feat: openclaw-style secrets (env.vars + \) and per-task model routing
- Replace python-dotenv with config.json env.vars block + \ substitution
- Add models section for per-task model routing (heartbeat, subagent, default)
- Heartbeat/subagent tasks can use different models/providers than main chat
- Remove python-dotenv from dependencies
- Update all docs to reflect new config approach
- Reorganize docs into project/ and research/ subdirectories
291
docs/research/openclaw.md
Normal file
@@ -0,0 +1,291 @@
# 🦞 OpenClaw — Architecture & How It Works

> **Full-Featured Personal AI Assistant** — Massive TypeScript codebase with 15+ channels, companion apps, and enterprise-grade features.

## Overview

OpenClaw is the most feature-complete personal AI assistant in this space. It's a TypeScript monorepo with a WebSocket-based Gateway as the control plane, supporting 15+ messaging channels, companion macOS/iOS/Android apps, browser control, live canvas, voice wake, and extensive automation.

| Attribute | Value |
|-----------|-------|
| **Language** | TypeScript (Node.js ≥22) |
| **Codebase Size** | 430k+ lines, 50+ source modules |
| **Config** | `~/.openclaw/openclaw.json` (JSON5) |
| **AI Runtime** | Pi Agent (custom RPC), multi-model |
| **Channels** | 15+ (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, Zalo, WebChat, etc.) |
| **Package Mgr** | pnpm (monorepo) |

---

## Architecture Flowchart

```mermaid
graph TB
    subgraph Channels["📱 Messaging Channels (15+)"]
        WA["WhatsApp\n(Baileys)"]
        TG["Telegram\n(grammY)"]
        SL["Slack\n(Bolt)"]
        DC["Discord\n(discord.js)"]
        GC["Google Chat"]
        SIG["Signal\n(signal-cli)"]
        BB["BlueBubbles\n(iMessage)"]
        IM["iMessage\n(legacy)"]
        MST["MS Teams"]
        MTX["Matrix"]
        ZL["Zalo"]
        WC["WebChat"]
    end

    subgraph Gateway["🌐 Gateway (Control Plane)"]
        WS["WebSocket Server\nws://127.0.0.1:18789"]
        SES["Session Manager"]
        RTE["Channel Router"]
        PRES["Presence System"]
        Q["Message Queue"]
        CFG["Config Manager"]
        AUTH["Auth / Pairing"]
    end

    subgraph Agent["🧠 Pi Agent (RPC)"]
        AGENT["Agent Runtime"]
        TOOLS["Tool Registry"]
        STREAM["Block Streaming"]
        PROV["Provider Router\n(multi-model)"]
    end

    subgraph Apps["📲 Companion Apps"]
        MAC["macOS Menu Bar"]
        IOS["iOS Node"]
        ANDR["Android Node"]
    end

    subgraph ToolSet["🔧 Tools & Automation"]
        BROWSER["Browser Control\n(CDP/Chromium)"]
        CANVAS["Live Canvas\n(A2UI)"]
        CRON["Cron Jobs"]
        WEBHOOK["Webhooks"]
        GMAIL["Gmail Pub/Sub"]
        NODES["Nodes\n(camera, screen, location)"]
        SKILLS_T["Skills Platform"]
        SESS_T["Session Tools\n(agent-to-agent)"]
    end

    subgraph Workspace["💾 Workspace"]
        AGENTS_MD["AGENTS.md"]
        SOUL_MD["SOUL.md"]
        USER_MD["USER.md"]
        TOOLS_MD["TOOLS.md"]
        SKILLS_W["Skills/"]
    end

    Channels --> Gateway
    Apps --> Gateway
    Gateway --> Agent
    Agent --> ToolSet
    Agent --> Workspace
    AGENT --> PROV
```

---

## Message Flow

```mermaid
sequenceDiagram
    participant User
    participant Channel as Channel (WA/TG/Slack/etc.)
    participant GW as Gateway (WS)
    participant Session as Session Manager
    participant Agent as Pi Agent (RPC)
    participant LLM as LLM Provider
    participant Tools as Tools

    User->>Channel: Send message
    Channel->>GW: Forward via channel adapter
    GW->>Session: Route to session (main/group)
    GW->>GW: Check auth (pairing/allowlist)
    Session->>Agent: Invoke agent (RPC)
    Agent->>Agent: Build prompt (AGENTS.md, SOUL.md, tools)
    Agent->>LLM: Stream request (with tool definitions)

    loop Tool Use Loop
        LLM-->>Agent: Tool call (block stream)
        Agent->>Tools: Execute tool
        Tools-->>Agent: Tool result
        Agent->>LLM: Continue with result
    end

    LLM-->>Agent: Final response (block stream)
    Agent-->>Session: Return response
    Session->>GW: Add to outbound queue
    GW->>GW: Chunk if needed (per-channel limits)
    GW->>Channel: Send chunked replies
    Channel->>User: Display response

    Note over GW: Typing indicators, presence updates
```
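
The "Tool Use Loop" in the diagram can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's agent code: the `Llm`, `Tool`, `LlmTurn`, and `runToolLoop` names are hypothetical, streaming is omitted, and the `maxSteps` cap is an assumed safety limit rather than a documented setting.

```typescript
// Hypothetical types for illustration; none of these names come from OpenClaw.
type ToolCall = { name: string; args: Record<string, unknown> };
type LlmTurn =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };

type Llm = (history: Message[]) => Promise<LlmTurn>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runToolLoop(
  llm: Llm,
  tools: Map<string, Tool>,
  userText: string,
  maxSteps = 8, // assumed cap so a misbehaving model cannot loop forever
): Promise<string> {
  const history: Message[] = [{ role: "user", content: userText }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await llm(history);
    // A final answer ends the loop and goes back to the session.
    if (turn.kind === "final") return turn.text;

    // Otherwise execute the requested tool and feed the result back.
    const tool = tools.get(turn.call.name);
    const result = tool
      ? await tool(turn.call.args)
      : `error: unknown tool ${turn.call.name}`;
    history.push({ role: "assistant", content: JSON.stringify(turn.call) });
    history.push({ role: "tool", content: result });
  }
  throw new Error("tool loop exceeded maxSteps");
}
```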

---

## Key Components

### 1. Gateway (`src/gateway/`)
The central control plane — everything connects through it:
- **WebSocket server** on `ws://127.0.0.1:18789`
- Session management (main, group, per-channel)
- Multi-agent routing (different agents for different channels)
- Presence tracking and typing indicators
- Config management and hot-reload
- Health checks, doctor diagnostics
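
To make the session and multi-agent routing bullets concrete, here is a small sketch under stated assumptions. The `Inbound` shape, the `main` / `group:<channel>:<id>` key format, and the `pickAgent` lookup are all hypothetical; the real gateway's session identifiers and routing config may look quite different.

```typescript
// Hypothetical inbound message shape, for illustration only.
type Inbound = { channel: string; senderId: string; groupId?: string };

// Assumed key format: direct messages share one "main" session,
// while each group chat gets its own session per channel.
function sessionKey(msg: Inbound): string {
  return msg.groupId ? `group:${msg.channel}:${msg.groupId}` : "main";
}

// Multi-agent routing: pick an agent id by channel, else fall back.
function pickAgent(
  msg: Inbound,
  agentsByChannel: Map<string, string>,
  fallback = "default",
): string {
  return agentsByChannel.get(msg.channel) ?? fallback;
}
```

With this sketch, a Telegram DM maps to `main`, while a message in Telegram group `g9` maps to `group:telegram:g9`.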

### 2. Pi Agent (`src/agents/`)
Custom RPC-based agent runtime:
- Tool streaming and block streaming
- Multi-model support with failover
- Session pruning for long conversations
- Usage tracking (tokens, cost)
- Thinking level control (off → xhigh)
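
"Multi-model support with failover" could work along these lines. This is only a sketch of the general pattern (try providers in priority order, fall through on error); the `Provider` interface and `completeWithFailover` function are hypothetical and do not reflect Pi Agent's actual API.

```typescript
// Hypothetical provider interface, for illustration only.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

// Try each provider in order; return the first successful completion.
async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      // Record the failure and move on to the next provider.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${p.name}: ${reason}`);
    }
  }
  throw new Error(`all providers failed (${errors.join("; ")})`);
}
```

A production version would likely distinguish retryable errors (rate limits, timeouts) from permanent ones (bad credentials) before falling through.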

### 3. Channel System (`src/channels/` + per-channel dirs)
15+ channel adapters, each with:
- Auth handling (pairing codes, allowlists, OAuth)
- Message format conversion
- Media pipeline (images, audio, video)
- Group routing with mention gating
- Per-channel chunking (character limits differ)
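
Per-channel chunking can be illustrated with a small splitter that takes the channel's character limit as a parameter. The `chunkMessage` function below is a hypothetical sketch, not OpenClaw's implementation; it prefers to break at whitespace and falls back to a hard cut when a single word exceeds the limit.

```typescript
// Split text into pieces of at most `limit` characters,
// preferring to break at the last space or newline that fits.
function chunkMessage(text: string, limit: number): string[] {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    // Look one character past the limit so a break exactly at the limit counts.
    const head = rest.slice(0, limit + 1);
    const breakAt = Math.max(head.lastIndexOf("\n"), head.lastIndexOf(" "));
    const cut = breakAt > 0 ? breakAt : limit; // hard cut if no whitespace
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\s+/, ""); // drop the separator
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

For example, `chunkMessage("hello world foo", 11)` returns `["hello world", "foo"]`. Each adapter would pass its own platform limit.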

### 4. Security System (`src/security/`)
Multi-layered security:
- **DM Pairing** — unknown senders get a pairing code, must be approved
- **Allowlists** — per-channel user whitelists
- **Docker Sandbox** — non-main sessions can run in per-session Docker containers
- **Tool denylist** — block dangerous tools in sandbox mode
- **Elevated bash** — per-session toggle for host-level access
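
The interplay of DM pairing and allowlists might look like the following sketch. The `DmGate` class, its method names, and the code format are assumptions made for illustration; the code generator is injected so the example stays deterministic, whereas a real system would use a cryptographically random source.

```typescript
type Decision = { action: "allow" } | { action: "pair"; code: string };

// Hypothetical gate combining an allowlist with pairing codes.
class DmGate {
  private readonly allowed: Set<string>;
  private readonly pending = new Map<string, string>(); // pairing code -> sender id
  private readonly makeCode: () => string;

  constructor(allowlist: Iterable<string>, makeCode: () => string) {
    this.allowed = new Set(allowlist);
    this.makeCode = makeCode;
  }

  // Known senders pass; unknown senders receive a pairing code.
  check(senderId: string): Decision {
    if (this.allowed.has(senderId)) return { action: "allow" };
    const code = this.makeCode();
    this.pending.set(code, senderId);
    return { action: "pair", code };
  }

  // The owner approves a code, which adds the sender to the allowlist.
  approve(code: string): boolean {
    const sender = this.pending.get(code);
    if (sender === undefined) return false;
    this.pending.delete(code);
    this.allowed.add(sender);
    return true;
  }
}
```

In this sketch, `approve` plays the role that the `openclaw pairing approve` command plays for the owner.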

### 5. Browser Control (`src/browser/`)
- Dedicated OpenClaw-managed Chrome/Chromium instance
- CDP (Chrome DevTools Protocol) control
- Snapshots, actions, uploads, profiles
- Full web automation capabilities

### 6. Canvas & A2UI (`src/canvas-host/`)
- Agent-driven visual workspace
- A2UI (Agent-to-UI) — push HTML/JS to canvas
- Canvas eval, snapshot, reset
- Available on macOS, iOS, Android

### 7. Voice System
- **Voice Wake** — always-on speech detection
- **Talk Mode** — continuous conversation overlay
- ElevenLabs TTS integration
- Available on macOS, iOS, Android

### 8. Companion Apps
- **macOS app**: Menu bar, Voice Wake/PTT, WebChat, debug tools
- **iOS node**: Canvas, Voice Wake, Talk Mode, camera, Bonjour pairing
- **Android node**: Canvas, Talk Mode, camera, screen recording, SMS

### 9. Session Tools (Agent-to-Agent)
- `sessions_list` — discover active sessions
- `sessions_history` — fetch transcript logs
- `sessions_send` — message another session with reply-back

### 10. Skills Platform (`src/plugins/`, `skills/`)
- **Bundled skills** — pre-installed capabilities
- **Managed skills** — installed from ClawHub registry
- **Workspace skills** — user-created in workspace
- Install gating and UI
- ClawHub registry for community skills

### 11. Automation
- **Cron jobs** — scheduled recurring tasks
- **Webhooks** — external trigger surface
- **Gmail Pub/Sub** — email-triggered actions

### 12. Ops & Deployment
- Docker support with compose
- Tailscale Serve/Funnel for remote access
- SSH tunnels with token/password auth
- `openclaw doctor` for diagnostics
- Nix mode for declarative config

---

## Project Structure (Simplified)

```
openclaw/
├── src/
│   ├── agents/          # Pi agent runtime
│   ├── gateway/         # WebSocket gateway
│   ├── channels/        # Channel adapter base
│   ├── whatsapp/        # WhatsApp adapter
│   ├── telegram/        # Telegram adapter
│   ├── slack/           # Slack adapter
│   ├── discord/         # Discord adapter
│   ├── signal/          # Signal adapter
│   ├── imessage/        # iMessage adapters
│   ├── browser/         # Browser control (CDP)
│   ├── canvas-host/     # Canvas & A2UI
│   ├── sessions/        # Session management
│   ├── routing/         # Message routing
│   ├── security/        # Auth, pairing, sandbox
│   ├── cron/            # Scheduled jobs
│   ├── memory/          # Memory system
│   ├── providers/       # LLM providers
│   ├── plugins/         # Plugin/skill system
│   ├── media/           # Media pipeline
│   ├── tts/             # Text-to-speech
│   ├── web/             # Control UI + WebChat
│   ├── wizard/          # Onboarding wizard
│   └── cli/             # CLI commands
├── apps/                # Companion app sources
├── packages/            # Shared packages
├── extensions/          # Extension channels
├── skills/              # Bundled skills
├── ui/                  # Web UI source
└── Swabble/             # macOS/iOS Swift source
```

---

## CLI Commands

| Command | Description |
|---------|-------------|
| `openclaw onboard` | Guided setup wizard |
| `openclaw gateway` | Start the gateway |
| `openclaw agent --message "..."` | Chat with agent |
| `openclaw message send` | Send to any channel |
| `openclaw doctor` | Diagnostics & migration |
| `openclaw pairing approve` | Approve DM pairing |
| `openclaw update` | Update to latest version |
| `openclaw channels login` | Link WhatsApp |

---

## Chat Commands (In-Channel)

| Command | Description |
|---------|-------------|
| `/status` | Session status (model, tokens, cost) |
| `/new` / `/reset` | Reset session |
| `/compact` | Compact session context |
| `/think <level>` | Set thinking level |
| `/verbose on\|off` | Toggle verbose mode |
| `/usage off\|tokens\|full` | Usage footer |
| `/restart` | Restart gateway |
| `/activation mention\|always` | Group activation mode |
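
A parser for the commands in the table above could be sketched as follows. The `ChatCommand` type and `parseChatCommand` function are hypothetical; the sketch treats `/new` and `/reset` as the same action (as the table does) and accepts any non-empty `<level>` for `/think`, since the full list of levels is not given here. Under these assumptions, `parseChatCommand("/usage tokens")` yields `{ name: "usage", mode: "tokens" }`, and ordinary chat text yields `null`.

```typescript
// Hypothetical parsed-command shape, for illustration only.
type ChatCommand =
  | { name: "status" | "reset" | "compact" | "restart" }
  | { name: "think"; level: string }
  | { name: "verbose"; on: boolean }
  | { name: "usage"; mode: "off" | "tokens" | "full" }
  | { name: "activation"; mode: "mention" | "always" };

// Returns null for ordinary messages and for malformed commands.
function parseChatCommand(text: string): ChatCommand | null {
  const [head = "", arg = ""] = text.trim().split(/\s+/, 2);
  switch (head) {
    case "/status":
      return { name: "status" };
    case "/new":
    case "/reset":
      return { name: "reset" };
    case "/compact":
      return { name: "compact" };
    case "/restart":
      return { name: "restart" };
    case "/think":
      return arg ? { name: "think", level: arg } : null;
    case "/verbose":
      return arg === "on" || arg === "off"
        ? { name: "verbose", on: arg === "on" }
        : null;
    case "/usage":
      return arg === "off" || arg === "tokens" || arg === "full"
        ? { name: "usage", mode: arg }
        : null;
    case "/activation":
      return arg === "mention" || arg === "always"
        ? { name: "activation", mode: arg }
        : null;
    default:
      return null;
  }
}
```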

---

## Key Design Decisions

1. **Gateway as control plane** — Single WebSocket server everything connects to
2. **Multi-agent routing** — Different agents for different channels/groups
3. **Pairing-based security** — Unknown DMs get pairing codes by default
4. **Docker sandboxing** — Non-main sessions can be isolated
5. **Block streaming** — Responses streamed as structured blocks
6. **Extension-based channels** — MS Teams, Matrix, Zalo are extensions
7. **Companion apps** — Native macOS/iOS/Android for device-level features
8. **ClawHub** — Community skill registry