latest updates

This commit is contained in:
Tanmay Karande
2026-02-15 15:02:58 -05:00
parent 438bb80416
commit 41b2f9a593
24 changed files with 3883 additions and 388 deletions

docs/openclaw.md Normal file

@@ -0,0 +1,291 @@
# 🦞 OpenClaw — Architecture & How It Works
> **Full-Featured Personal AI Assistant**: a 430k+ line TypeScript codebase with 15+ messaging channels, companion apps, pairing-based access control, and Docker sandboxing.
## Overview
OpenClaw is among the most feature-complete personal AI assistants in this space. It is a TypeScript monorepo built around a WebSocket-based Gateway that acts as the control plane. It supports 15+ messaging channels, companion macOS/iOS/Android apps, browser control, a live canvas, voice wake, and automation through cron jobs, webhooks, and Gmail Pub/Sub.
| Attribute | Value |
|-----------|-------|
| **Language** | TypeScript (Node.js ≥22) |
| **Codebase Size** | 430k+ lines, 50+ source modules |
| **Config** | `~/.openclaw/openclaw.json` (JSON5) |
| **AI Runtime** | Pi Agent (custom RPC), multi-model |
| **Channels** | 15+ (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, Zalo, WebChat, etc.) |
| **Package Mgr** | pnpm (monorepo) |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph Channels["📱 Messaging Channels (15+)"]
WA["WhatsApp\n(Baileys)"]
TG["Telegram\n(grammY)"]
SL["Slack\n(Bolt)"]
DC["Discord\n(discord.js)"]
GC["Google Chat"]
SIG["Signal\n(signal-cli)"]
BB["BlueBubbles\n(iMessage)"]
IM["iMessage\n(legacy)"]
MST["MS Teams"]
MTX["Matrix"]
ZL["Zalo"]
WC["WebChat"]
end
subgraph Gateway["🌐 Gateway (Control Plane)"]
WS["WebSocket Server\nws://127.0.0.1:18789"]
SES["Session Manager"]
RTE["Channel Router"]
PRES["Presence System"]
Q["Message Queue"]
CFG["Config Manager"]
AUTH["Auth / Pairing"]
end
subgraph Agent["🧠 Pi Agent (RPC)"]
AGENT["Agent Runtime"]
TOOLS["Tool Registry"]
STREAM["Block Streaming"]
PROV["Provider Router\n(multi-model)"]
end
subgraph Apps["📲 Companion Apps"]
MAC["macOS Menu Bar"]
IOS["iOS Node"]
ANDR["Android Node"]
end
subgraph ToolSet["🔧 Tools & Automation"]
BROWSER["Browser Control\n(CDP/Chromium)"]
CANVAS["Live Canvas\n(A2UI)"]
CRON["Cron Jobs"]
WEBHOOK["Webhooks"]
GMAIL["Gmail Pub/Sub"]
NODES["Nodes\n(camera, screen, location)"]
SKILLS_T["Skills Platform"]
SESS_T["Session Tools\n(agent-to-agent)"]
end
subgraph Workspace["💾 Workspace"]
AGENTS_MD["AGENTS.md"]
SOUL_MD["SOUL.md"]
USER_MD["USER.md"]
TOOLS_MD["TOOLS.md"]
SKILLS_W["Skills/"]
end
Channels --> Gateway
Apps --> Gateway
Gateway --> Agent
Agent --> ToolSet
Agent --> Workspace
AGENT --> PROV
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant Channel as Channel (WA/TG/Slack/etc.)
participant GW as Gateway (WS)
participant Session as Session Manager
participant Agent as Pi Agent (RPC)
participant LLM as LLM Provider
participant Tools as Tools
User->>Channel: Send message
Channel->>GW: Forward via channel adapter
GW->>GW: Check auth (pairing/allowlist)
GW->>Session: Route to session (main/group)
Session->>Agent: Invoke agent (RPC)
Agent->>Agent: Build prompt (AGENTS.md, SOUL.md, tools)
Agent->>LLM: Stream request (with tool definitions)
loop Tool Use Loop
LLM-->>Agent: Tool call (block stream)
Agent->>Tools: Execute tool
Tools-->>Agent: Tool result
Agent->>LLM: Continue with result
end
LLM-->>Agent: Final response (block stream)
Agent-->>Session: Return response
Session->>GW: Add to outbound queue
GW->>GW: Chunk if needed (per-channel limits)
GW->>Channel: Send chunked replies
Channel->>User: Display response
Note over GW: Typing indicators, presence updates
```
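The tool use loop in the diagram can be sketched in a few lines of TypeScript. This is a minimal illustration and not OpenClaw's implementation: the `Model`, `Tool`, `ModelTurn`, and `runAgentTurn` names are hypothetical, streaming is collapsed into a single awaited turn, and the `maxSteps` cap is an assumed safeguard that the source does not mention.

```typescript
// Hypothetical types for illustration; none of these names come from OpenClaw.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

// Ask the model, run any tool it requests, feed the result back, and repeat
// until the model produces a final response.
async function runAgentTurn(
  model: Model,
  tools: Map<string, Tool>,
  userText: string,
  maxSteps = 8, // assumed cap so a misbehaving model cannot loop forever
): Promise<string> {
  const history: Message[] = [{ role: "user", content: userText }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.kind === "final") return turn.text;

    const tool = tools.get(turn.call.name);
    const result = tool
      ? await tool(turn.call.args)
      : `error: unknown tool "${turn.call.name}"`;
    history.push({ role: "assistant", content: `call ${turn.call.name}` });
    history.push({ role: "tool", content: result });
  }
  throw new Error(`no final response after ${maxSteps} tool steps`);
}

// Usage with a scripted fake model: one tool call, then a final answer.
const fakeModel: Model = async (history): Promise<ModelTurn> => {
  const toolResult = history.find((m) => m.role === "tool");
  return toolResult
    ? { kind: "final", text: `It is ${toolResult.content}.` }
    : { kind: "tool_call", call: { name: "clock", args: {} } };
};
const demoTools = new Map<string, Tool>([["clock", async () => "12:00"]]);
const demoReply = runAgentTurn(fakeModel, demoTools, "What time is it?");
```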
---
## Key Components
### 1. Gateway (`src/gateway/`)
The central control plane — everything connects through it:
- **WebSocket server** on `ws://127.0.0.1:18789`
- Session management (main, group, per-channel)
- Multi-agent routing (different agents for different channels)
- Presence tracking and typing indicators
- Config management and hot-reload
- Health checks, doctor diagnostics
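
As a rough illustration of session management and multi-agent routing, the sketch below maps an inbound message to a session key and an agent. The `Inbound` and `RouteRule` shapes, the key format, and the rule that all direct messages share one `main` session are assumptions made for this example. They are not taken from the OpenClaw source.

```typescript
// Hypothetical shapes; the real session and routing types are not shown in this document.
type Inbound = { channel: string; senderId: string; groupId?: string };
type RouteRule = { channel?: string; groupId?: string; agentId: string };

// Assumption: direct messages share one "main" session, each group gets its own.
function sessionKey(msg: Inbound): string {
  return msg.groupId ? `${msg.channel}:group:${msg.groupId}` : "main";
}

// First matching rule wins; a rule that omits a field matches any value for it.
function pickAgent(msg: Inbound, rules: RouteRule[], fallback = "default"): string {
  const hit = rules.find(
    (r) =>
      (r.channel === undefined || r.channel === msg.channel) &&
      (r.groupId === undefined || r.groupId === msg.groupId),
  );
  return hit ? hit.agentId : fallback;
}

const routeRules: RouteRule[] = [
  { channel: "slack", agentId: "work" },
  { channel: "telegram", groupId: "family", agentId: "home" },
];
const slackDm: Inbound = { channel: "slack", senderId: "u1" };
const familyGroup: Inbound = { channel: "telegram", senderId: "u2", groupId: "family" };

const dmAgent = pickAgent(slackDm, routeRules); // "work"
const groupAgent = pickAgent(familyGroup, routeRules); // "home"
```

Ordering the rules from most to least specific keeps the first-match policy predictable.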
### 2. Pi Agent (`src/agents/`)
Custom RPC-based agent runtime:
- Tool streaming and block streaming
- Multi-model support with failover
- Session pruning for long conversations
- Usage tracking (tokens, cost)
- Thinking level control (off → xhigh)
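
Multi-model failover can be pictured as trying providers in order until one succeeds. The sketch below assumes a hypothetical `Provider` shape and a `completeWithFailover` helper. The real provider interface, retry policy, and error classification are not described in this document.

```typescript
// Hypothetical provider shape for illustration only.
type Provider = { id: string; complete: (prompt: string) => Promise<string> };

// Try providers in order; return the first success along with who served it.
async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ providerId: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      return { providerId: provider.id, text: await provider.complete(prompt) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.id}: ${reason}`);
    }
  }
  throw new Error(`all providers failed (${failures.join("; ")})`);
}

// Usage: the primary provider fails, so the backup answers.
const primary: Provider = {
  id: "primary",
  complete: async () => {
    throw new Error("rate limited");
  },
};
const backup: Provider = { id: "backup", complete: async (p) => `echo: ${p}` };
const failoverDemo = completeWithFailover([primary, backup], "hello");
```

A production version would likely distinguish retryable errors such as rate limits from permanent ones such as invalid credentials. This sketch treats every error the same way.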
### 3. Channel System (`src/channels/` + per-channel dirs)
15+ channel adapters, each with:
- Auth handling (pairing codes, allowlists, OAuth)
- Message format conversion
- Media pipeline (images, audio, video)
- Group routing with mention gating
- Per-channel chunking (character limits differ)
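
Per-channel chunking can be sketched as a function that splits a reply into pieces no longer than a channel's limit. The `chunkText` helper and the `CHANNEL_LIMITS` table are hypothetical names for this example, and the limits shown are illustrative values to verify against each platform's documentation.

```typescript
// Illustrative limits only; confirm against each platform's documentation.
const CHANNEL_LIMITS: Record<string, number> = { telegram: 4096, discord: 2000 };

// Split text into pieces of at most `limit` characters, preferring to break at
// a newline or space so words stay intact. Hard-split only when no break exists.
function chunkText(text: string, limit: number): string[] {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    // Look one character past the limit so a break exactly at the limit counts.
    const head = rest.slice(0, limit + 1);
    let cut = Math.max(head.lastIndexOf("\n"), head.lastIndexOf(" "));
    if (cut <= 0) cut = limit; // no usable break point: split mid-word
    const piece = rest.slice(0, cut).trimEnd();
    if (piece) chunks.push(piece);
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

const wordChunks = chunkText("hello world foo", 11); // ["hello world", "foo"]
const hardChunks = chunkText("abcdefghij", 5); // ["abcde", "fghij"]
```

The sketch counts UTF-16 code units, so a hard split can land inside an emoji or other surrogate pair. A real adapter would also need to avoid breaking inside code fences or other markup.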
### 4. Security System (`src/security/`)
Multi-layered security:
- **DM Pairing** — unknown senders get a pairing code, must be approved
- **Allowlists** — per-channel user whitelists
- **Docker Sandbox** — non-main sessions can run in per-session Docker containers
- **Tool denylist** — block dangerous tools in sandbox mode
- **Elevated bash** — per-session toggle for host-level access
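
The interaction between DM pairing and allowlists can be sketched as a small gate: approved senders pass, unknown senders receive a pairing code, and approving the code adds the sender to the allowlist. The `DmGate` class and its method names are hypothetical, and the code generator is injected so the example stays deterministic. A real system would need unpredictable codes and probably an expiry, neither of which this document specifies.

```typescript
type GateDecision = { action: "allow" } | { action: "pair"; code: string };

// Hypothetical gate; only the pairing-then-approve flow comes from the document.
class DmGate {
  private allowlist = new Set<string>();
  private pending = new Map<string, string>(); // pairing code -> sender id
  private makeCode: () => string;

  constructor(makeCode: () => string) {
    this.makeCode = makeCode;
  }

  check(senderId: string): GateDecision {
    if (this.allowlist.has(senderId)) return { action: "allow" };
    // Reuse an outstanding code so repeated messages do not mint new ones.
    for (const [code, id] of Array.from(this.pending)) {
      if (id === senderId) return { action: "pair", code };
    }
    const code = this.makeCode();
    this.pending.set(code, senderId);
    return { action: "pair", code };
  }

  // Called when the owner approves a code, for example through the CLI.
  approve(code: string): boolean {
    const senderId = this.pending.get(code);
    if (senderId === undefined) return false;
    this.pending.delete(code);
    this.allowlist.add(senderId);
    return true;
  }
}

let codeCounter = 0;
const gate = new DmGate(() => `CODE-${++codeCounter}`);
const firstContact = gate.check("alice"); // { action: "pair", code: "CODE-1" }
const approved = gate.approve("CODE-1"); // true
const afterApproval = gate.check("alice"); // { action: "allow" }
```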
### 5. Browser Control (`src/browser/`)
- Dedicated OpenClaw-managed Chrome/Chromium instance
- CDP (Chrome DevTools Protocol) control
- Snapshots, actions, uploads, profiles
- Full web automation capabilities
### 6. Canvas & A2UI (`src/canvas-host/`)
- Agent-driven visual workspace
- A2UI (Agent-to-UI) — push HTML/JS to canvas
- Canvas eval, snapshot, reset
- Available on macOS, iOS, Android
### 7. Voice System
- **Voice Wake** — always-on speech detection
- **Talk Mode** — continuous conversation overlay
- ElevenLabs TTS integration
- Available on macOS, iOS, Android
### 8. Companion Apps
- **macOS app**: Menu bar, Voice Wake/PTT, WebChat, debug tools
- **iOS node**: Canvas, Voice Wake, Talk Mode, camera, Bonjour pairing
- **Android node**: Canvas, Talk Mode, camera, screen recording, SMS
### 9. Session Tools (Agent-to-Agent)
- `sessions_list` — discover active sessions
- `sessions_history` — fetch transcript logs
- `sessions_send` — message another session with reply-back
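
To make the three session tools concrete, here is a toy in-memory model of how they could fit together. Only the tool names come from the list above. The `SessionHub` class, its method signatures, and the synchronous reply are assumptions for illustration, not OpenClaw's actual tool schemas.

```typescript
// Hypothetical in-memory model; only the three tool names come from the document.
type TranscriptEntry = { from: string; text: string };
type Responder = (text: string, from: string) => string;

class SessionHub {
  private responders = new Map<string, Responder>();
  private transcripts = new Map<string, TranscriptEntry[]>();

  register(id: string, responder: Responder): void {
    this.responders.set(id, responder);
    this.transcripts.set(id, []);
  }

  // sessions_list: discover active sessions.
  sessionsList(): string[] {
    return Array.from(this.responders.keys());
  }

  // sessions_history: fetch a copy of a session's transcript.
  sessionsHistory(id: string): TranscriptEntry[] {
    return (this.transcripts.get(id) ?? []).slice();
  }

  // sessions_send: deliver a message and hand the target's reply back to the caller.
  sessionsSend(from: string, to: string, text: string): string {
    const responder = this.responders.get(to);
    const log = this.transcripts.get(to);
    if (!responder || !log) throw new Error(`unknown session "${to}"`);
    log.push({ from, text });
    const reply = responder(text, from);
    log.push({ from: to, text: reply });
    return reply;
  }
}

const hub = new SessionHub();
hub.register("main", (text) => `main got: ${text}`);
hub.register("research", (text) => `summary of ${text}`);
const hubReply = hub.sessionsSend("main", "research", "the Q3 report");
```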
### 10. Skills Platform (`src/plugins/`, `skills/`)
- **Bundled skills** — pre-installed capabilities
- **Managed skills** — installed from ClawHub registry
- **Workspace skills** — user-created in workspace
- Install gating and UI
- ClawHub registry for community skills
### 11. Automation
- **Cron jobs** — scheduled recurring tasks
- **Webhooks** — external trigger surface
- **Gmail Pub/Sub** — email-triggered actions
### 12. Ops & Deployment
- Docker support with compose
- Tailscale Serve/Funnel for remote access
- SSH tunnels with token/password auth
- `openclaw doctor` for diagnostics
- Nix mode for declarative config
---
## Project Structure (Simplified)
```
openclaw/
├── src/
│ ├── agents/ # Pi agent runtime
│ ├── gateway/ # WebSocket gateway
│ ├── channels/ # Channel adapter base
│ ├── whatsapp/ # WhatsApp adapter
│ ├── telegram/ # Telegram adapter
│ ├── slack/ # Slack adapter
│ ├── discord/ # Discord adapter
│ ├── signal/ # Signal adapter
│ ├── imessage/ # iMessage adapters
│ ├── browser/ # Browser control (CDP)
│ ├── canvas-host/ # Canvas & A2UI
│ ├── sessions/ # Session management
│ ├── routing/ # Message routing
│ ├── security/ # Auth, pairing, sandbox
│ ├── cron/ # Scheduled jobs
│ ├── memory/ # Memory system
│ ├── providers/ # LLM providers
│ ├── plugins/ # Plugin/skill system
│ ├── media/ # Media pipeline
│ ├── tts/ # Text-to-speech
│ ├── web/ # Control UI + WebChat
│ ├── wizard/ # Onboarding wizard
│ └── cli/ # CLI commands
├── apps/ # Companion app sources
├── packages/ # Shared packages
├── extensions/ # Extension channels
├── skills/ # Bundled skills
├── ui/ # Web UI source
└── Swabble/ # macOS/iOS Swift source
```
---
## CLI Commands
| Command | Description |
|---------|-------------|
| `openclaw onboard` | Guided setup wizard |
| `openclaw gateway` | Start the gateway |
| `openclaw agent --message "..."` | Chat with agent |
| `openclaw message send` | Send to any channel |
| `openclaw doctor` | Diagnostics & migration |
| `openclaw pairing approve` | Approve DM pairing |
| `openclaw update` | Update to latest version |
| `openclaw channels login` | Link WhatsApp |
---
## Chat Commands (In-Channel)
| Command | Description |
|---------|-------------|
| `/status` | Session status (model, tokens, cost) |
| `/new` / `/reset` | Reset session |
| `/compact` | Compact session context |
| `/think <level>` | Set thinking level |
| `/verbose on\|off` | Toggle verbose mode |
| `/usage off\|tokens\|full` | Usage footer |
| `/restart` | Restart gateway |
| `/activation mention\|always` | Group activation mode |
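
A parser for the commands in the table above might look like the following sketch. The command names and arguments come from the table, while the `ChatCommand` type and `parseChatCommand` function are hypothetical names. Thinking levels are passed through unvalidated because this document names only the endpoints of the range (`off` and `xhigh`).

```typescript
const SIMPLE_COMMANDS = ["status", "new", "reset", "compact", "restart"] as const;
const USAGE_MODES = ["off", "tokens", "full"] as const;
const ACTIVATION_MODES = ["mention", "always"] as const;

// Hypothetical result type; one variant per row of the command table.
type ChatCommand =
  | { name: (typeof SIMPLE_COMMANDS)[number] }
  | { name: "think"; level: string }
  | { name: "verbose"; on: boolean }
  | { name: "usage"; mode: (typeof USAGE_MODES)[number] }
  | { name: "activation"; mode: (typeof ACTIVATION_MODES)[number] };

// Returns null for anything that is not a well-formed command, so ordinary
// chat text can fall through to the agent.
function parseChatCommand(input: string): ChatCommand | null {
  const [head, arg, ...extra] = input.trim().split(/\s+/);
  if (!head || !head.startsWith("/") || extra.length > 0) return null;
  const word = head.slice(1).toLowerCase();

  const simple = SIMPLE_COMMANDS.find((c) => c === word);
  if (simple) return arg === undefined ? { name: simple } : null;

  switch (word) {
    case "think":
      return arg ? { name: "think", level: arg } : null;
    case "verbose":
      return arg === "on" || arg === "off" ? { name: "verbose", on: arg === "on" } : null;
    case "usage": {
      const mode = USAGE_MODES.find((m) => m === arg);
      return mode ? { name: "usage", mode } : null;
    }
    case "activation": {
      const mode = ACTIVATION_MODES.find((m) => m === arg);
      return mode ? { name: "activation", mode } : null;
    }
    default:
      return null;
  }
}

const thinkCommand = parseChatCommand("/think xhigh");
const notACommand = parseChatCommand("hello there");
```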
---
## Key Design Decisions
1. **Gateway as control plane** — Single WebSocket server everything connects to
2. **Multi-agent routing** — Different agents for different channels/groups
3. **Pairing-based security** — Unknown DMs get pairing codes by default
4. **Docker sandboxing** — Non-main sessions can be isolated
5. **Block streaming** — Responses streamed as structured blocks
6. **Extension-based channels** — MS Teams, Matrix, Zalo are extensions
7. **Companion apps** — Native macOS/iOS/Android for device-level features
8. **ClawHub** — Community skill registry
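
Block streaming (decision 5) can be illustrated with a generator that buffers small text deltas and emits only completed blocks. This sketch treats a blank line as the block boundary, which is an assumption made for the example. The source does not define what constitutes a block, and `toBlocks` is a hypothetical name.

```typescript
// Buffer incoming text deltas and yield a block each time a blank line
// (assumed boundary) completes one. Whatever remains is flushed at the end.
function* toBlocks(deltas: Iterable<string>): Generator<string> {
  let buffer = "";
  for (const delta of deltas) {
    buffer += delta;
    let split = buffer.indexOf("\n\n");
    while (split !== -1) {
      const block = buffer.slice(0, split).trim();
      if (block) yield block;
      buffer = buffer.slice(split + 2);
      split = buffer.indexOf("\n\n");
    }
  }
  const tail = buffer.trim();
  if (tail) yield tail;
}

// The boundary may arrive split across deltas, as it does here.
const streamedBlocks = Array.from(toBlocks(["Hel", "lo.\n", "\nSecond ", "para."]));
// streamedBlocks is ["Hello.", "Second para."]
```

Emitting whole blocks rather than individual tokens suits messaging channels, where each send becomes a separate message or edit.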