# OpenClaw Architecture Deep Dive

## What is OpenClaw?

OpenClaw is an open-source AI assistant created by Peter Steinberger (founder of PSPDFKit) that gained 100,000 GitHub stars in 3 days, making it one of the fastest-growing repositories in GitHub history.

**Technical Definition:** An agent runtime with a gateway in front of it.

Despite viral stories of agents calling owners at 3am, texting people's wives autonomously, and browsing Twitter overnight, OpenClaw isn't sentient. It's elegant event-driven engineering.

## Core Architecture

### The Gateway
- Long-running process on your machine
- Constantly accepts connections from messaging apps (WhatsApp, Telegram, Discord, iMessage, Slack)
- Routes messages to AI agents
- **Doesn't think, reason, or decide** - only accepts inputs and routes them

### The Agent Runtime
- Processes events from the queue
- Executes actions using available tools
- Has deep system access: shell commands, file operations, browser control

### State Persistence
- Memory stored as local markdown files
- Includes preferences, conversation history, context from previous sessions
- Agent "remembers" by reading these files on each wake-up
- Not real-time learning - just file reading
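
To make the "memory is just files" point concrete, here is a minimal sketch of file-based memory. The function names `load_memory` and `remember`, the directory layout, and the file name in the usage note are assumptions for illustration, not OpenClaw's actual implementation.

```python
from pathlib import Path


def load_memory(memory_dir: Path) -> str:
    """Concatenate every markdown file in memory_dir into one context string."""
    sections = []
    for path in sorted(memory_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {path.name}\n{text}")
    return "\n\n".join(sections)


def remember(memory_dir: Path, filename: str, note: str) -> None:
    """Append a note to a markdown memory file, creating it if needed."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    with (memory_dir / filename).open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")
```

On each wake-up, an agent built this way would call `load_memory(...)` and prepend the result to its prompt. Anything written with `remember(...)` is plain text you could open in an editor.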

### The Event Loop
All events enter a queue → Queue gets processed → Agents execute → State persists → Loop continues
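
The loop described above can be sketched in a few lines. This is an illustration under assumptions: `Event`, `run_loop`, and `echo_agent` are hypothetical names, the agent is a placeholder for an LLM call, and the list stands in for writing memory files.

```python
import queue
from dataclasses import dataclass


@dataclass
class Event:
    source: str  # e.g. "message", "heartbeat", "cron", "hook", "webhook"
    prompt: str


def run_loop(events: queue.Queue, agent, state: list) -> None:
    """Drain the queue in FIFO order: run the agent on each event, then persist."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break  # a long-running runtime would block and wait here instead
        result = agent(event)
        state.append((event.source, result))  # stand-in for saving to disk


def echo_agent(event: Event) -> str:
    # Placeholder for an LLM call plus tool execution.
    return f"handled: {event.prompt}"
```

Every input type in this document is just another producer that puts an `Event` on the same queue.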

## The Five Input Types

### 1. Messages (Human Input)
**How it works:**
- You send text via WhatsApp, iMessage, or Slack
- Gateway receives and routes to agent
- Agent responds

**Key details:**
- Sessions are per-channel (WhatsApp and Slack are separate contexts)
- Multiple requests queue up and process in order
- No jumbled responses - the agent finishes one request before moving to the next
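
A minimal sketch of per-channel sessions with in-order processing follows. `SessionRouter` and `counting_agent` are hypothetical names, and the agent is a stub standing in for a model call.

```python
from collections import defaultdict, deque


class SessionRouter:
    """Keeps one history per channel and handles messages in arrival order."""

    def __init__(self, agent):
        self.agent = agent
        self.pending = deque()             # single FIFO queue for all channels
        self.sessions = defaultdict(list)  # channel name -> conversation history

    def receive(self, channel: str, text: str) -> None:
        self.pending.append((channel, text))

    def process_all(self) -> list:
        replies = []
        while self.pending:
            channel, text = self.pending.popleft()
            history = self.sessions[channel]
            reply = self.agent(history, text)  # one request finishes before the next
            history.extend([("user", text), ("agent", reply)])
            replies.append((channel, reply))
        return replies


def counting_agent(history, text):
    # Placeholder for an LLM call; reports how many earlier turns this channel has.
    return f"{text!r} received after {len(history) // 2} earlier turn(s)"
```

Because each channel has its own history list, a WhatsApp conversation never leaks into the Slack context even though both share one queue.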

### 2. Heartbeats (Timer Events)
**How it works:**
- Timer fires at regular intervals (default: every 30 minutes)
- Gateway schedules an agent turn with a preconfigured prompt
- Agent responds to instructions like "Check inbox for urgent items" or "Review calendar"

**Key details:**
- Configurable interval, prompt, and active hours
- If nothing urgent: agent returns `heartbeat_okay` token (suppressed from user)
- If something urgent: you get a ping
- **This is the secret sauce** - makes OpenClaw feel proactive

**Example prompts:**
- "Check my inbox for anything urgent"
- "Review my calendar"
- "Look for overdue tasks"
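
The heartbeat behavior above (active hours plus a suppressed "all clear" token) might look roughly like this. The function names and the 08:00 to 22:00 default window are assumptions; the token string is taken from the text above.

```python
from datetime import datetime, time
from typing import Optional

HEARTBEAT_OKAY = "heartbeat_okay"  # token name as given in this document


def in_active_hours(now: datetime, start: time, end: time) -> bool:
    """True when now is inside the window; handles windows that cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def heartbeat_tick(now, agent, prompt,
                   start=time(8, 0), end=time(22, 0)) -> Optional[str]:
    """Run one heartbeat turn; return text for the user, or None to stay silent."""
    if not in_active_hours(now, start, end):
        return None  # outside active hours: skip the turn entirely
    reply = agent(prompt)
    if reply.strip() == HEARTBEAT_OKAY:
        return None  # nothing urgent: suppress the reply
    return reply
```

A scheduler would call `heartbeat_tick` once per interval (every 30 minutes by default, per the text) and forward any non-`None` result to the user's channel.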

### 3. Cron Jobs (Scheduled Events)
**How it works:**
- More control than heartbeats
- Specify exact timing and custom instructions
- When the scheduled time arrives, the event fires and the prompt is sent to the agent

**Examples:**
- 9am daily: "Check email and flag anything urgent"
- Every Monday 3pm: "Review calendar for the week and remind me of conflicts"
- Midnight: "Browse my Twitter feed and save interesting posts"
- 8am: "Text wife good morning"
- 10pm: "Text wife good night"

**Real example:** The viral story of an agent texting someone's wife was just cron jobs firing at scheduled times. The agent wasn't deciding anything - it was responding to scheduled prompts.
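
As a rough sketch of how scheduled prompts could be matched against the clock, consider the following. `CronJob`, `JOBS`, and `due_prompts` are hypothetical; a real deployment would more likely use cron expressions than this simplified hour/minute/weekday match.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CronJob:
    hour: int
    minute: int
    prompt: str
    weekday: Optional[int] = None  # 0 = Monday; None means every day

    def is_due(self, now: datetime) -> bool:
        if self.weekday is not None and now.weekday() != self.weekday:
            return False
        return (now.hour, now.minute) == (self.hour, self.minute)


JOBS = [
    CronJob(9, 0, "Check email and flag anything urgent"),
    CronJob(15, 0, "Review calendar for the week and remind me of conflicts", weekday=0),
]


def due_prompts(now: datetime, jobs=JOBS) -> list:
    """Return the prompts whose schedule matches the current minute."""
    return [job.prompt for job in jobs if job.is_due(now)]
```

Calling `due_prompts(datetime.now())` once a minute and queueing the results is all the "decision making" a cron-driven agent needs.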

### 4. Hooks (Internal State Changes)
**How it works:**
- System itself triggers these events
- Event-driven development pattern

**Types:**
- Gateway startup → fires hook
- Agent begins task → fires hook
- Stop command issued → fires hook

**Purpose:**
- Save memory on reset
- Run setup instructions on startup
- Modify context before agent runs
- Self-management
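
Hooks are a standard publish/subscribe pattern. The sketch below shows one way to register handlers that modify context before the agent runs. `HookRegistry` and the hook name `agent_start` are assumptions for illustration, not OpenClaw identifiers.

```python
from collections import defaultdict


class HookRegistry:
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, name):
        """Decorator that registers a handler for a named lifecycle event."""
        def register(fn):
            self._handlers[name].append(fn)
            return fn
        return register

    def fire(self, name, context: dict) -> dict:
        """Call each handler in registration order; handlers may modify context."""
        for handler in self._handlers[name]:
            handler(context)
        return context


hooks = HookRegistry()


@hooks.on("agent_start")  # hypothetical hook name
def add_setup_notes(context):
    context.setdefault("notes", []).append("loaded setup instructions")
```

The runtime would call `hooks.fire("agent_start", context)` just before handing the context to the model. Firing a hook with no handlers returns the context unchanged.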

### 5. Webhooks (External System Events)
**How it works:**
- External systems notify OpenClaw of events
- The agent can respond to events from across your digital life

**Examples:**
- Email hits inbox → webhook fires → agent processes
- Slack reaction → webhook fires → agent responds
- Jira ticket created → webhook fires → agent researches
- GitHub event → webhook fires → agent acts
- Calendar event approaches → webhook fires → agent reminds

**Supported integrations:** Slack, Discord, GitHub, and basically anything with webhook support
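
A webhook is ultimately an HTTP request whose body gets turned into an event. The sketch below covers only that conversion step and leaves out the HTTP server and signature verification. `enqueue_webhook` and the event dictionary shape are assumptions.

```python
import json
import queue


def enqueue_webhook(events: queue.Queue, source: str, body: bytes) -> bool:
    """Parse a JSON webhook body and queue it as an event; reject malformed input."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    if not isinstance(payload, dict):
        return False
    events.put({
        "source": f"webhook:{source}",
        "prompt": f"Handle this {source} event: {json.dumps(payload, sort_keys=True)}",
    })
    return True
```

Since webhook bodies come from outside systems, they are untrusted input. Anything placed into the prompt this way is a potential prompt-injection vector, which ties into the security concerns later in this document.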

### Bonus: Agent-to-Agent Messaging
**How it works:**
- Multi-agent setups with isolated workspaces
- Agents pass messages between each other
- Each agent has different profile/specialization

**Example:**
- Research Agent finishes gathering info
- Queues up work for Writing Agent
- Writing Agent processes and produces output

**Reality:** Looks like collaboration, but it's just messages entering queues
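
The research-to-writing handoff can be sketched as two agents with separate inboxes. The `Agent` class and the lambda "work" functions are placeholders for real model calls and are not taken from OpenClaw.

```python
from collections import deque


class Agent:
    def __init__(self, name, work):
        self.name = name
        self.work = work      # function: message text -> result text
        self.inbox = deque()  # each agent has its own isolated queue
        self.outputs = []

    def send(self, other: "Agent", text: str) -> None:
        other.inbox.append((self.name, text))

    def step(self, forward_to=None) -> None:
        """Process one queued message; optionally queue the result for another agent."""
        if not self.inbox:
            return
        _sender, text = self.inbox.popleft()
        result = self.work(text)
        self.outputs.append(result)
        if forward_to is not None:
            self.send(forward_to, result)


researcher = Agent("research", lambda topic: f"notes on {topic}")
writer = Agent("writing", lambda notes: f"draft based on {notes}")
```

Putting a topic in `researcher.inbox`, calling `researcher.step(forward_to=writer)`, and then `writer.step()` reproduces the example above: the "collaboration" is one agent appending to another agent's queue.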

## Why It Feels Alive

The combination creates an illusion of autonomy:

**Time** (heartbeats, crons) → **Events** → **Queue** → **Agent Execution** → **State Persistence** → **Loop**

### The 3am Phone Call Example

**What it looked like:**
- Agent autonomously decided to get phone number
- Agent decided to call owner
- Agent waited until 3am to execute

**What actually happened:**
1. Some event fired (cron or heartbeat) - exact configuration unknown
2. Event entered queue
3. Agent processed with available tools and instructions
4. Agent acquired Twilio phone number
5. Agent made the call
6. The owner didn't ask in the moment, but the behavior was enabled during setup

**Key insight:** Nothing was thinking overnight. Nothing was deciding. Time produced event → Event kicked off agent → Agent followed instructions.

## The Complete Event Flow

**Event Sources:**
- Time creates events (heartbeats, crons)
- Humans create events (messages)
- External systems create events (webhooks)
- Internal state creates events (hooks)
- Agents create events for other agents

**Processing:**
All events → Enter queue → Queue processed → Agents execute → State persists → Loop continues

**Memory:**
- Stored in local markdown files
- Agent reads on wake-up
- Remembers previous conversations
- Not learning - just reading files you could open in text editor

## Security Concerns

### The Analysis
Cisco's security team analyzed the OpenClaw ecosystem:
- 31,000 available skills examined
- 26% contain at least one vulnerability
- Called it "a security nightmare"

### Why It's Risky
OpenClaw has deep system access:
- Run shell commands
- Read and write files
- Execute scripts
- Control browser

### Specific Risks
1. **Prompt injection** through emails or documents
2. **Malicious skills** in marketplace
3. **Credential exposure**
4. **Command misinterpretation** that deletes unintended files

### OpenClaw's Own Warning
Documentation states: "There's no perfectly secure setup"

### Mitigation Strategies
- Run on secondary machine
- Use isolated accounts
- Limit enabled skills
- Monitor logs actively
- Use Railway's one-click deployment (runs in isolated container)

## Key Architectural Takeaways

### The Four Components
1. **Time** that produces events
2. **Events** that trigger agents
3. **State** that persists across interactions
4. **Loop** that keeps processing

### Building Your Own
You don't need OpenClaw specifically. You need:
- Event scheduling mechanism
- Queue system
- LLM for processing
- State persistence layer
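
To show how small the four pieces listed above can be, here is a toy version that wires them together. All names are hypothetical. `fake_llm` stands in for a real model call, the minute counter stands in for a real clock, and state is stored as JSON for brevity rather than the markdown files described earlier.

```python
import json
import queue
from pathlib import Path


def schedule(events: queue.Queue, minute: int, every: int, prompt: str) -> None:
    """Event scheduling: enqueue the prompt whenever the minute hits the interval."""
    if minute % every == 0:
        events.put(prompt)


def fake_llm(prompt: str, memory: list) -> str:
    # Placeholder for a real model call.
    return f"({len(memory)} memories) response to: {prompt}"


def run_once(events: queue.Queue, state_file: Path, llm=fake_llm) -> list:
    """Load state, process all queued events, save state, return the replies."""
    memory = json.loads(state_file.read_text()) if state_file.exists() else []
    replies = []
    while not events.empty():
        prompt = events.get()
        reply = llm(prompt, memory)
        memory.append({"prompt": prompt, "reply": reply})
        replies.append(reply)
    state_file.write_text(json.dumps(memory, indent=2))
    return replies
```

Because state is reloaded from disk on every `run_once` call, a second run "remembers" what happened in the first, even though nothing stayed in memory between them.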

### The Pattern
This architecture will appear everywhere. Every AI agent framework that "feels alive" uses some version of:
- Heartbeats
- Cron jobs
- Webhooks
- Event loops
- Persistent state

### Understanding vs Hype
Understanding this architecture means you can:
- Evaluate agent tools intelligently
- Build your own implementations
- Avoid getting caught up in viral hype
- Recognize the pattern in new frameworks

## The Bottom Line

OpenClaw isn't magic. It's not sentient. It doesn't think or reason.

**It's inputs, queues, and a loop.**

The "alive" feeling comes from well-designed event-driven architecture that makes a reactive system appear proactive. Time becomes an input. External systems become inputs. Internal state becomes an input. All are processed through the same queue with persistent memory.

Elegant engineering, not artificial consciousness.

## Further Resources
- OpenClaw documentation
- Clairvo's original thread (inspiration for this breakdown)
- Cisco security research on OpenClaw ecosystem