feat: openclaw-style secrets (env.vars + \) and per-task model routing
- Replace python-dotenv with config.json env.vars block + \ substitution
- Add models section for per-task model routing (heartbeat, subagent, default)
- Heartbeat/subagent tasks can use different models/providers than main chat
- Remove python-dotenv from dependencies
- Update all docs to reflect new config approach
- Reorganize docs into project/ and research/ subdirectories
docs/research/Openclaw deep dive.md
@@ -0,0 +1,237 @@
# OpenClaw Architecture Deep Dive

## What is OpenClaw?

OpenClaw is an open-source AI assistant created by Peter Steinberger (founder of PSPDFKit) that gained 100,000 GitHub stars in 3 days, making it one of the fastest-growing repositories in GitHub history.

**Technical Definition:** An agent runtime with a gateway in front of it.

Despite viral stories of agents calling their owners at 3am, texting people's wives autonomously, and browsing Twitter overnight, OpenClaw isn't sentient. It's elegant event-driven engineering.

## Core Architecture

### The Gateway
- Long-running process on your machine
- Constantly accepts connections from messaging apps (WhatsApp, Telegram, Discord, iMessage, Slack)
- Routes messages to AI agents
- **Doesn't think, reason, or decide** - only accepts inputs and routes them

### The Agent Runtime
- Processes events from the queue
- Executes actions using available tools
- Has deep system access: shell commands, file operations, browser control

### State Persistence
- Memory stored as local markdown files
- Includes preferences, conversation history, and context from previous sessions
- Agent "remembers" by reading these files on each wake-up
- Not real-time learning - just file reading

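File-based memory of this kind can be sketched in a few lines. This is a minimal illustration, not OpenClaw's actual implementation: the `memory/` directory layout, the file names, and the helpers `load_memory` and `append_memory` are all assumptions made for the example.

```python
from pathlib import Path
import tempfile


def load_memory(memory_dir: Path) -> str:
    """Concatenate every markdown file in memory_dir into one context string."""
    parts = []
    for path in sorted(memory_dir.glob("*.md")):
        parts.append(f"## {path.name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def append_memory(memory_dir: Path, name: str, note: str) -> None:
    """Persist a note by appending a bullet to a markdown file (hypothetical layout)."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    with (memory_dir / name).open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")


# Demo in a throwaway directory: write in one "session", read on the next wake-up.
with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "memory"
    append_memory(memory, "preferences.md", "Prefers short replies")
    append_memory(memory, "history.md", "Asked about flight times")
    context = load_memory(memory)
```

Nothing here learns: `context` is just the text of files that a person could open in an editor, prepended to the next prompt.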
### The Event Loop
All events enter a queue → Queue gets processed → Agents execute → State persists → Loop continues

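The loop described above might look roughly like the following. It is a sketch under stated assumptions: `Event`, `run_agent`, and `drain` are hypothetical names, and `run_agent` is a stub standing in for the model call.

```python
import queue
from dataclasses import dataclass


@dataclass
class Event:
    source: str  # e.g. "message", "heartbeat", "cron", "hook", "webhook"
    prompt: str


def run_agent(event: Event, state: list) -> str:
    """Stand-in for the LLM call; a real runtime would send state and prompt to a model."""
    return f"handled {event.source}: {event.prompt}"


def drain(events: queue.Queue, state: list) -> list:
    """Process queued events one at a time, persisting each result before the next."""
    replies = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break  # a long-running process would block on get() instead of stopping
        reply = run_agent(event, state)
        state.append(reply)  # "state persists"
        replies.append(reply)
    return replies


events = queue.Queue()
events.put(Event("message", "what's on today?"))
events.put(Event("heartbeat", "check inbox"))
state = []
replies = drain(events, state)
```

Every input type in the next section reduces to something that calls `events.put(...)`; the loop itself does not care where an event came from.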
## The Five Input Types

### 1. Messages (Human Input)
**How it works:**
- You send text via WhatsApp, iMessage, or Slack
- Gateway receives it and routes it to the agent
- Agent responds

**Key details:**
- Sessions are per-channel (WhatsApp and Slack are separate contexts)
- Multiple requests queue up and are processed in order
- No jumbled responses - the agent finishes one thought before moving to the next

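Per-channel sessions with in-order processing can be illustrated as follows. The names `sessions`, `inbox`, `receive`, and `process_all` are invented for this sketch, and the reply is a placeholder rather than a model response.

```python
from collections import defaultdict, deque

# One history per channel, so WhatsApp and Slack never share context.
sessions = defaultdict(list)
inbox = deque()


def receive(channel: str, text: str) -> None:
    """Gateway side: enqueue only, no reasoning."""
    inbox.append((channel, text))


def process_all() -> list:
    """Agent side: handle messages strictly in arrival order, one at a time."""
    replies = []
    while inbox:
        channel, text = inbox.popleft()
        history = sessions[channel]
        history.append(text)
        # Placeholder reply; a real agent would call a model with this channel's history.
        replies.append(f"[{channel}] reply #{len(history)} to: {text}")
    return replies


receive("whatsapp", "remind me at 5")
receive("slack", "status?")
receive("whatsapp", "make it 6")
replies = process_all()
```

The second WhatsApp message sees the first one in its history, while the Slack message starts from an empty context.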
### 2. Heartbeats (Timer Events)
**How it works:**
- Timer fires at regular intervals (default: every 30 minutes)
- Gateway schedules an agent turn with a preconfigured prompt
- Agent responds to instructions like "Check inbox for urgent items" or "Review calendar"

**Key details:**
- Configurable interval, prompt, and active hours
- If nothing is urgent: the agent returns a `heartbeat_okay` token (suppressed from the user)
- If something is urgent: you get a ping
- **This is the secret sauce** - it makes OpenClaw feel proactive

**Example prompts:**
- "Check my inbox for anything urgent"
- "Review my calendar"
- "Look for overdue tasks"

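The two pieces of heartbeat logic, the active-hours window and the suppression of "nothing to report" replies, can be sketched like this. The token string is taken from the description above; the function names and the exact matching rule are assumptions, not OpenClaw's code.

```python
from datetime import time

HEARTBEAT_OK = "heartbeat_okay"  # token name as given in the text above


def within_active_hours(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); also handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def handle_heartbeat(agent_reply: str):
    """Return the text to show the user, or None when the reply should be suppressed."""
    if agent_reply.strip() == HEARTBEAT_OK:
        return None
    return agent_reply


quiet = handle_heartbeat("heartbeat_okay")
urgent = handle_heartbeat("Invoice from ACME is due today")
daytime = within_active_hours(time(14, 0), time(8, 0), time(22, 0))
night = within_active_hours(time(3, 0), time(8, 0), time(22, 0))
```

A scheduler would call `within_active_hours` before firing the timer event at all, and `handle_heartbeat` after the agent turn to decide whether the user is pinged.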
### 3. Cron Jobs (Scheduled Events)
**How it works:**
- More control than heartbeats
- Specify exact timing and custom instructions
- When the scheduled time arrives, the event fires and the prompt is sent to the agent

**Examples:**
- 9am daily: "Check email and flag anything urgent"
- Every Monday 3pm: "Review calendar for the week and remind me of conflicts"
- Midnight: "Browse my Twitter feed and save interesting posts"
- 8am: "Text wife good morning"
- 10pm: "Text wife good night"

**Real example:** The viral story of an agent texting someone's wife was just cron jobs firing at scheduled times. The agent wasn't deciding anything - it was responding to scheduled prompts.

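A scheduled prompt is little more than a time pattern plus a string. The sketch below uses a simplified, hypothetical `Job` record (hour, minute, optional weekday) rather than real cron syntax, and `due_prompts` is an invented helper that a scheduler would call once per minute.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Job:
    hour: int
    minute: int
    prompt: str
    weekday: Optional[int] = None  # 0 = Monday; None means every day


def due_prompts(jobs: list, now: datetime) -> list:
    """Return the prompts whose schedule matches the current minute."""
    return [
        job.prompt
        for job in jobs
        if (job.hour, job.minute) == (now.hour, now.minute)
        and (job.weekday is None or job.weekday == now.weekday())
    ]


jobs = [
    Job(9, 0, "Check email and flag anything urgent"),
    Job(15, 0, "Review calendar for the week and remind me of conflicts", weekday=0),
    Job(8, 0, "Text wife good morning"),
]

monday_3pm = due_prompts(jobs, datetime(2024, 1, 1, 15, 0))  # 1 Jan 2024 was a Monday
tuesday_3pm = due_prompts(jobs, datetime(2024, 1, 2, 15, 0))
```

Each returned prompt would then be placed on the same queue as any other event; the agent never chooses the time.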
### 4. Hooks (Internal State Changes)
**How it works:**
- System itself triggers these events
- Event-driven development pattern

**Types:**
- Gateway startup → fires hook
- Agent begins task → fires hook
- Stop command issued → fires hook

**Purpose:**
- Save memory on reset
- Run setup instructions on startup
- Modify context before agent runs
- Self-management

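Internal hooks follow the usual register-and-fire pattern. The following is a generic sketch of that pattern, not OpenClaw's hook API: the event names `gateway_startup` and `agent_start`, the `on` decorator, and the `fire` function are all hypothetical.

```python
from collections import defaultdict

_hooks = defaultdict(list)  # event name -> list of registered callables


def on(event_name: str):
    """Decorator that registers a function to run when event_name fires."""
    def register(fn):
        _hooks[event_name].append(fn)
        return fn
    return register


def fire(event_name: str, context: dict) -> dict:
    """Run every hook for event_name in registration order; hooks may mutate context."""
    for fn in _hooks[event_name]:
        fn(context)
    return context


@on("gateway_startup")
def load_setup(context):
    context["setup_done"] = True


@on("agent_start")
def add_date(context):
    # Modify the context before the agent runs.
    context["notes"] = context.get("notes", []) + ["today is Monday"]


startup = fire("gateway_startup", {})
before_run = fire("agent_start", {"notes": ["user prefers brevity"]})
```

A "save memory on reset" hook would be registered the same way, under whatever name the runtime uses for its reset event.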
### 5. Webhooks (External System Events)
**How it works:**
- External systems notify OpenClaw of events
- Agent can respond to events from across your entire digital life

**Examples:**
- Email hits inbox → webhook fires → agent processes
- Slack reaction → webhook fires → agent responds
- Jira ticket created → webhook fires → agent researches
- GitHub event → webhook fires → agent acts
- Calendar event approaches → webhook fires → agent reminds

**Supported integrations:** Slack, Discord, GitHub, and anything else that can send a webhook

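On the receiving side, a webhook is an HTTP request body that gets translated into a queueable event. The sketch below covers only that translation step and leaves out the HTTP server; the `PROMPTS` table, the `summary` field, and `webhook_to_event` are assumptions for illustration, since real payloads differ per provider.

```python
import json

# Hypothetical mapping from webhook source to an agent instruction template.
PROMPTS = {
    "github": "A GitHub event arrived: {summary}. Decide whether action is needed.",
    "jira": "A Jira ticket was created: {summary}. Research it.",
}


def webhook_to_event(source: str, body: bytes) -> dict:
    """Turn a raw webhook request body into an event for the agent queue."""
    payload = json.loads(body)
    template = PROMPTS.get(source, "An event arrived from {source}: {summary}.")
    summary = payload.get("summary", "(no summary)")
    return {
        "source": f"webhook:{source}",
        "prompt": template.format(source=source, summary=summary),
        "payload": payload,  # untrusted input: it ends up inside the prompt
    }


event = webhook_to_event("jira", b'{"summary": "Login page returns 500"}')
```

Because the payload text flows straight into the prompt, this is also the point where the prompt-injection risk discussed under Security Concerns enters the system.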
### Bonus: Agent-to-Agent Messaging
**How it works:**
- Multi-agent setups with isolated workspaces
- Agents pass messages between each other
- Each agent has a different profile/specialization

**Example:**
- Research Agent finishes gathering info
- Queues up work for Writing Agent
- Writing Agent processes and produces output

**Reality:** Looks like collaboration, but it's just messages entering queues

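The research-to-writing handoff can be reduced to one agent appending to another agent's inbox. This is a toy sketch with stubbed agents; the `inboxes` structure and both agent functions are illustrative names only.

```python
from collections import deque

# One inbox per agent; the agent names are illustrative.
inboxes = {"research": deque(), "writing": deque()}


def research_agent(task: str) -> None:
    """Gather notes (stubbed here), then queue work for the writing agent."""
    notes = f"notes on {task}"
    inboxes["writing"].append({"from": "research", "notes": notes})


def writing_agent() -> list:
    """Drain the writing inbox and produce one draft per message."""
    drafts = []
    while inboxes["writing"]:
        message = inboxes["writing"].popleft()
        drafts.append(f"Draft based on {message['notes']}")
    return drafts


inboxes["research"].append("event-driven agents")
research_agent(inboxes["research"].popleft())
drafts = writing_agent()
```

Neither function knows about the other; the only coupling is the message placed on the queue.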
## Why It Feels Alive

The combination creates an illusion of autonomy:

**Time** (heartbeats, crons) → **Events** → **Queue** → **Agent Execution** → **State Persistence** → **Loop**

### The 3am Phone Call Example

**What it looked like:**
- Agent autonomously decided to get a phone number
- Agent decided to call its owner
- Agent waited until 3am to execute

**What actually happened:**
1. Some event fired (cron or heartbeat) - exact configuration unknown
2. Event entered the queue
3. Agent processed it with the available tools and instructions
4. Agent acquired a Twilio phone number
5. Agent made the call
6. The owner didn't ask in the moment, but the behavior was enabled during setup

**Key insight:** Nothing was thinking overnight. Nothing was deciding. Time produced an event → the event kicked off the agent → the agent followed its instructions.

## The Complete Event Flow

**Event Sources:**
- Time creates events (heartbeats, crons)
- Humans create events (messages)
- External systems create events (webhooks)
- Internal state creates events (hooks)
- Agents create events for other agents

**Processing:**
All events → Enter queue → Queue processed → Agents execute → State persists → Loop continues

**Memory:**
- Stored in local markdown files
- Agent reads on wake-up
- Remembers previous conversations
- Not learning - just reading files you could open in a text editor

## Security Concerns

### The Analysis
Cisco's security team analyzed the OpenClaw ecosystem:
- 31,000 available skills examined
- 26% contain at least one vulnerability
- Called it "a security nightmare"

### Why It's Risky
OpenClaw has deep system access:
- Run shell commands
- Read and write files
- Execute scripts
- Control browser

### Specific Risks
1. **Prompt injection** through emails or documents
2. **Malicious skills** in the marketplace
3. **Credential exposure**
4. **Command misinterpretation** that deletes unintended files

### OpenClaw's Own Warning
Documentation states: "There's no perfectly secure setup"

### Mitigation Strategies
- Run on a secondary machine
- Use isolated accounts
- Limit enabled skills
- Monitor logs actively
- Use Railway's one-click deployment (runs in an isolated container)

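Two of the mitigations above, limiting enabled skills and monitoring logs, can be combined in a small allowlist wrapper. This is a generic sketch and not an OpenClaw feature: the skill names, `ENABLED_SKILLS`, `audit_log`, and `run_skill` are all hypothetical.

```python
ENABLED_SKILLS = frozenset({"calendar.read", "email.read"})  # hypothetical skill names
audit_log = []  # stand-in for a real log file that you would monitor


def run_skill(name, registry, *args):
    """Run a skill only if it is explicitly enabled; record every attempt."""
    allowed = name in ENABLED_SKILLS and name in registry
    audit_log.append((name, "allowed" if allowed else "blocked"))
    if not allowed:
        raise PermissionError(f"skill not enabled: {name}")
    return registry[name](*args)


registry = {
    "calendar.read": lambda: ["standup 09:30"],
    "shell.exec": lambda cmd: f"ran {cmd}",  # installed but deliberately not enabled
}

events_today = run_skill("calendar.read", registry)
try:
    run_skill("shell.exec", registry, "rm -rf ~/tmp")
    blocked = False
except PermissionError:
    blocked = True
```

Defaulting to "blocked" means a newly installed skill does nothing until someone adds it to the allowlist on purpose.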
## Key Architectural Takeaways

### The Four Components
1. **Time** that produces events
2. **Events** that trigger agents
3. **State** that persists across interactions
4. **Loop** that keeps processing

### Building Your Own
You don't need OpenClaw specifically. You need:
- Event scheduling mechanism
- Queue system
- LLM for processing
- State persistence layer

### The Pattern
This architecture will appear everywhere. Every AI agent framework that "feels alive" uses some version of:
- Heartbeats
- Cron jobs
- Webhooks
- Event loops
- Persistent state

### Understanding vs Hype
Understanding this architecture means you can:
- Evaluate agent tools intelligently
- Build your own implementations
- Avoid getting caught up in viral hype
- Recognize the pattern in new frameworks

## The Bottom Line

OpenClaw isn't magic. It's not sentient. It doesn't think or reason.

**It's inputs, queues, and a loop.**

The "alive" feeling comes from well-designed event-driven architecture that makes a reactive system appear proactive. Time becomes an input. External systems become inputs. Internal state becomes an input. All of them are processed through the same queue with persistent memory.

Elegant engineering, not artificial consciousness.

## Further Resources
- OpenClaw documentation
- Clairvo's original thread (inspiration for this breakdown)
- Cisco security research on the OpenClaw ecosystem