Aetheel/docs/research/openclaw.md
tanmay11k 82c2640481 feat: openclaw-style secrets (env.vars + \) and per-task model routing
- Replace python-dotenv with config.json env.vars block + \ substitution
- Add models section for per-task model routing (heartbeat, subagent, default)
- Heartbeat/subagent tasks can use different models/providers than main chat
- Remove python-dotenv from dependencies
- Update all docs to reflect new config approach
- Reorganize docs into project/ and research/ subdirectories
2026-02-20 23:49:05 -05:00

🦞 OpenClaw — Architecture & How It Works

Full-Featured Personal AI Assistant — Massive TypeScript codebase with 15+ channels, companion apps, and enterprise-grade features.

Overview

OpenClaw is one of the most feature-complete personal AI assistants in this space. It is a TypeScript monorepo built around a WebSocket-based Gateway that acts as the control plane, and it supports 15+ messaging channels, companion macOS/iOS/Android apps, browser control, a live canvas, voice wake, and extensive automation.

| Attribute | Value |
| --- | --- |
| Language | TypeScript (Node.js ≥22) |
| Codebase Size | 430k+ lines, 50+ source modules |
| Config | ~/.openclaw/openclaw.json (JSON5) |
| AI Runtime | Pi Agent (custom RPC), multi-model |
| Channels | 15+ (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, Zalo, WebChat, etc.) |
| Package Mgr | pnpm (monorepo) |

Architecture Flowchart

graph TB
    subgraph Channels["📱 Messaging Channels (15+)"]
        WA["WhatsApp\n(Baileys)"]
        TG["Telegram\n(grammY)"]
        SL["Slack\n(Bolt)"]
        DC["Discord\n(discord.js)"]
        GC["Google Chat"]
        SIG["Signal\n(signal-cli)"]
        BB["BlueBubbles\n(iMessage)"]
        IM["iMessage\n(legacy)"]
        MST["MS Teams"]
        MTX["Matrix"]
        ZL["Zalo"]
        WC["WebChat"]
    end

    subgraph Gateway["🌐 Gateway (Control Plane)"]
        WS["WebSocket Server\nws://127.0.0.1:18789"]
        SES["Session Manager"]
        RTE["Channel Router"]
        PRES["Presence System"]
        Q["Message Queue"]
        CFG["Config Manager"]
        AUTH["Auth / Pairing"]
    end

    subgraph Agent["🧠 Pi Agent (RPC)"]
        AGENT["Agent Runtime"]
        TOOLS["Tool Registry"]
        STREAM["Block Streaming"]
        PROV["Provider Router\n(multi-model)"]
    end

    subgraph Apps["📲 Companion Apps"]
        MAC["macOS Menu Bar"]
        IOS["iOS Node"]
        ANDR["Android Node"]
    end

    subgraph ToolSet["🔧 Tools & Automation"]
        BROWSER["Browser Control\n(CDP/Chromium)"]
        CANVAS["Live Canvas\n(A2UI)"]
        CRON["Cron Jobs"]
        WEBHOOK["Webhooks"]
        GMAIL["Gmail Pub/Sub"]
        NODES["Nodes\n(camera, screen, location)"]
        SKILLS_T["Skills Platform"]
        SESS_T["Session Tools\n(agent-to-agent)"]
    end

    subgraph Workspace["💾 Workspace"]
        AGENTS_MD["AGENTS.md"]
        SOUL_MD["SOUL.md"]
        USER_MD["USER.md"]
        TOOLS_MD["TOOLS.md"]
        SKILLS_W["Skills/"]
    end

    Channels --> Gateway
    Apps --> Gateway
    Gateway --> Agent
    Agent --> ToolSet
    Agent --> Workspace
    AGENT --> PROV

Message Flow

sequenceDiagram
    participant User
    participant Channel as Channel (WA/TG/Slack/etc.)
    participant GW as Gateway (WS)
    participant Session as Session Manager
    participant Agent as Pi Agent (RPC)
    participant LLM as LLM Provider
    participant Tools as Tools

    User->>Channel: Send message
    Channel->>GW: Forward via channel adapter
    GW->>GW: Check auth (pairing/allowlist)
    GW->>Session: Route to session (main/group)
    Session->>Agent: Invoke agent (RPC)
    Agent->>Agent: Build prompt (AGENTS.md, SOUL.md, tools)
    Agent->>LLM: Stream request (with tool definitions)
    
    loop Tool Use Loop
        LLM-->>Agent: Tool call (block stream)
        Agent->>Tools: Execute tool
        Tools-->>Agent: Tool result
        Agent->>LLM: Continue with result
    end
    
    LLM-->>Agent: Final response (block stream)
    Agent-->>Session: Return response
    Session->>GW: Add to outbound queue
    GW->>GW: Chunk if needed (per-channel limits)
    GW->>Channel: Send chunked replies
    Channel->>User: Display response

    Note over GW: Typing indicators, presence updates
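
The tool-use loop in the diagram above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenClaw's actual code: the `ModelTurn`, `Model`, `Tool`, and `runAgentTurn` names are hypothetical, the model is a scripted stand-in, and everything is synchronous for brevity where a real runtime would stream blocks asynchronously.

```typescript
// Hypothetical types for illustration; not taken from the OpenClaw codebase.
type ToolCall = { kind: "tool_call"; tool: string; input: string };
type FinalText = { kind: "final"; text: string };
type ModelTurn = ToolCall | FinalText;

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Tool = (input: string) => string;
type Model = (history: Message[]) => ModelTurn;

// Run one agent turn: call the model, execute any requested tool,
// feed the result back, and stop when the model returns final text.
function runAgentTurn(
  userText: string,
  model: Model,
  tools: Record<string, Tool>,
  maxSteps = 8, // guard against a model that never stops calling tools
): string {
  const history: Message[] = [{ role: "user", content: userText }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history);
    if (turn.kind === "final") return turn.text;
    const tool = tools[turn.tool];
    const result = tool ? tool(turn.input) : `error: unknown tool ${turn.tool}`;
    history.push({ role: "assistant", content: `call ${turn.tool}(${turn.input})` });
    history.push({ role: "tool", content: result });
  }
  return "error: tool loop exceeded maxSteps";
}

// A scripted stand-in for an LLM: asks for the clock once, then answers.
const fakeModel: Model = (history) => {
  const last = history[history.length - 1];
  if (last.role === "tool") return { kind: "final", text: `It is ${last.content}.` };
  return { kind: "tool_call", tool: "clock", input: "" };
};

const reply = runAgentTurn("What time is it?", fakeModel, { clock: () => "12:00" });
console.log(reply); // It is 12:00.
```

The `maxSteps` cap is a design choice of this sketch: without some bound, a model that keeps requesting tools would never return a reply to the session.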

Key Components

1. Gateway (src/gateway/)

The central control plane — everything connects through it:

  • WebSocket server on ws://127.0.0.1:18789
  • Session management (main, group, per-channel)
  • Multi-agent routing (different agents for different channels)
  • Presence tracking and typing indicators
  • Config management and hot-reload
  • Health checks, doctor diagnostics
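
To make the session and multi-agent routing bullets concrete, here is a small sketch of how a gateway might pick an agent per channel and derive a session key. The `Inbound` and `Route` shapes, the key format, and the rule that direct messages share a "main" session are all assumptions made for illustration; the real routing configuration is not shown in this document.

```typescript
// Hypothetical shapes; not taken from the OpenClaw codebase.
type Inbound = { channel: string; senderId: string; groupId?: string };
type Route = { channel: string; agentId: string };

// Pick the agent configured for a channel, falling back to a default agent.
function resolveAgent(msg: Inbound, routes: Route[], defaultAgent: string): string {
  for (const route of routes) {
    if (route.channel === msg.channel) return route.agentId;
  }
  return defaultAgent;
}

// Derive a session key: group chats get their own session,
// direct messages share the agent's "main" session (an assumption here).
function sessionKey(msg: Inbound, agentId: string): string {
  return msg.groupId
    ? `${agentId}:${msg.channel}:group:${msg.groupId}`
    : `${agentId}:main`;
}

const routes: Route[] = [{ channel: "slack", agentId: "work" }];
const dm: Inbound = { channel: "telegram", senderId: "u1" };
const group: Inbound = { channel: "slack", senderId: "u2", groupId: "C42" };

console.log(sessionKey(dm, resolveAgent(dm, routes, "personal")));       // personal:main
console.log(sessionKey(group, resolveAgent(group, routes, "personal"))); // work:slack:group:C42
```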

2. Pi Agent (src/agents/)

Custom RPC-based agent runtime:

  • Tool streaming and block streaming
  • Multi-model support with failover
  • Session pruning for long conversations
  • Usage tracking (tokens, cost)
  • Thinking level control (off → xhigh)
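
Multi-model failover can be illustrated with an ordered list of providers that is tried until one succeeds. This is only a sketch: the `Provider` interface and `completeWithFailover` function are hypothetical names, and the actual Pi Agent may choose, retry, or back off between providers differently.

```typescript
// Hypothetical provider interface for illustration only.
type Provider = { name: string; complete: (prompt: string) => string };
type FailoverResult = { provider: string; text: string; failed: string[] };

// Try providers in order; return the first success and record which ones failed.
function completeWithFailover(prompt: string, providers: Provider[]): FailoverResult {
  const failed: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, text: p.complete(prompt), failed };
    } catch {
      failed.push(p.name); // a real runtime would likely log the error and back off
    }
  }
  throw new Error(`all providers failed: ${failed.join(", ")}`);
}

// Stand-in providers: the first always fails, the second echoes the prompt.
const primary: Provider = {
  name: "primary",
  complete: () => {
    throw new Error("rate limited");
  },
};
const backup: Provider = { name: "backup", complete: (p) => `echo: ${p}` };

const result = completeWithFailover("hi", [primary, backup]);
console.log(result.provider, result.failed.join(",")); // backup primary
```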

3. Channel System (src/channels/ + per-channel dirs)

15+ channel adapters, each with:

  • Auth handling (pairing codes, allowlists, OAuth)
  • Message format conversion
  • Media pipeline (images, audio, video)
  • Group routing with mention gating
  • Per-channel chunking (character limits differ)
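
Per-channel chunking amounts to splitting a long reply into pieces that fit the channel's character limit. The following sketch shows one way to do that while preferring whitespace boundaries. The `chunkMessage` name is hypothetical, and the limit is passed in rather than hard-coded because the values OpenClaw applies per channel are not stated here.

```typescript
// Split text into chunks of at most `limit` characters, preferring to break
// at a newline or space so that words stay intact.
function chunkMessage(text: string, limit: number): string[] {
  if (limit <= 0) throw new Error("limit must be positive");
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    // Look one character past the limit so a break exactly at the limit counts.
    const head = rest.slice(0, limit + 1);
    let cut = Math.max(head.lastIndexOf("\n"), head.lastIndexOf(" "));
    if (cut <= 0) cut = limit; // no break point found: hard split
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

console.log(chunkMessage("hello world foo", 11)); // two chunks: "hello world", "foo"
console.log(chunkMessage("abcdefghij", 4));       // three chunks: "abcd", "efgh", "ij"
```

A production adapter would also need to avoid splitting inside code fences or formatting markers, which this sketch does not attempt.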

4. Security System (src/security/)

Multi-layered security:

  • DM Pairing — unknown senders get a pairing code, must be approved
  • Allowlists — per-channel user whitelists
  • Docker Sandbox — non-main sessions can run in per-session Docker containers
  • Tool denylist — block dangerous tools in sandbox mode
  • Elevated bash — per-session toggle for host-level access
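
The DM pairing flow described above can be sketched as a small gate in front of message delivery. This is an in-memory illustration only: `PairingGate`, its method names, and the code format are hypothetical, and the real system presumably persists approvals and expires codes.

```typescript
// Hypothetical pairing gate; names and code format are assumptions.
type PairingDecision =
  | { action: "deliver" }
  | { action: "challenge"; code: string };

class PairingGate {
  private allowed = new Set<string>();
  private pending = new Map<string, string>(); // pairing code -> senderId
  private makeCode: () => string;

  constructor(makeCode: () => string) {
    this.makeCode = makeCode;
  }

  // Known senders pass through; unknown senders receive a pairing code.
  check(senderId: string): PairingDecision {
    if (this.allowed.has(senderId)) return { action: "deliver" };
    const code = this.makeCode();
    this.pending.set(code, senderId);
    return { action: "challenge", code };
  }

  // The owner approves a code, which adds the sender to the allowlist.
  approve(code: string): boolean {
    const senderId = this.pending.get(code);
    if (senderId === undefined) return false;
    this.pending.delete(code);
    this.allowed.add(senderId);
    return true;
  }
}

// Deterministic code generator for the example; real codes should be random.
let counter = 0;
const gate = new PairingGate(() => `CODE-${++counter}`);

const first = gate.check("stranger"); // challenge carrying CODE-1
const approved = gate.approve("CODE-1");
const second = gate.check("stranger"); // deliver
console.log(first.action, approved, second.action); // challenge true deliver
```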

5. Browser Control (src/browser/)

  • Dedicated OpenClaw-managed Chrome/Chromium instance
  • CDP (Chrome DevTools Protocol) control
  • Snapshots, actions, uploads, profiles
  • Full web automation capabilities

6. Canvas & A2UI (src/canvas-host/)

  • Agent-driven visual workspace
  • A2UI (Agent-to-UI) — push HTML/JS to canvas
  • Canvas eval, snapshot, reset
  • Available on macOS, iOS, Android

7. Voice System

  • Voice Wake — always-on speech detection
  • Talk Mode — continuous conversation overlay
  • ElevenLabs TTS integration
  • Available on macOS, iOS, Android

8. Companion Apps

  • macOS app: Menu bar, Voice Wake/PTT, WebChat, debug tools
  • iOS node: Canvas, Voice Wake, Talk Mode, camera, Bonjour pairing
  • Android node: Canvas, Talk Mode, camera, screen recording, SMS

9. Session Tools (Agent-to-Agent)

  • sessions_list — discover active sessions
  • sessions_history — fetch transcript logs
  • sessions_send — message another session with reply-back
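
As a rough mental model of these three tools, the sketch below keeps sessions in memory and lets one session message another and receive a reply. Only the tool names come from the list above; the `SessionRegistry` class, its method signatures, and the synchronous reply handler are assumptions made for illustration.

```typescript
// Hypothetical in-memory model of the session tools; not OpenClaw's implementation.
type Entry = { from: string; text: string };
type Handler = (text: string) => string;

class SessionRegistry {
  private sessions = new Map<string, { handler: Handler; log: Entry[] }>();

  open(id: string, handler: Handler): void {
    this.sessions.set(id, { handler, log: [] });
  }

  // sessions_list: discover active sessions.
  sessionsList(): string[] {
    return Array.from(this.sessions.keys());
  }

  // sessions_history: fetch a copy of one session's transcript.
  sessionsHistory(id: string): Entry[] {
    const session = this.sessions.get(id);
    return session ? session.log.slice() : [];
  }

  // sessions_send: deliver a message to another session and record its reply
  // in the sender's transcript (the "reply-back").
  sessionsSend(from: string, to: string, text: string): string | undefined {
    const sender = this.sessions.get(from);
    const target = this.sessions.get(to);
    if (!sender || !target) return undefined;
    target.log.push({ from, text });
    const replyText = target.handler(text);
    sender.log.push({ from: to, text: replyText });
    return replyText;
  }
}

const registry = new SessionRegistry();
registry.open("main", (t) => `main got: ${t}`);
registry.open("research", (t) => `summary of ${t}`);

console.log(registry.sessionsSend("main", "research", "topic X")); // summary of topic X
console.log(registry.sessionsList().join(", "));                   // main, research
```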

10. Skills Platform (src/plugins/, skills/)

  • Bundled skills — pre-installed capabilities
  • Managed skills — installed from ClawHub registry
  • Workspace skills — user-created in workspace
  • Install gating and UI
  • ClawHub registry for community skills

11. Automation

  • Cron jobs — scheduled recurring tasks
  • Webhooks — external trigger surface
  • Gmail Pub/Sub — email-triggered actions

12. Ops & Deployment

  • Docker support with compose
  • Tailscale Serve/Funnel for remote access
  • SSH tunnels with token/password auth
  • openclaw doctor for diagnostics
  • Nix mode for declarative config

Project Structure (Simplified)

openclaw/
├── src/
│   ├── agents/          # Pi agent runtime
│   ├── gateway/         # WebSocket gateway
│   ├── channels/        # Channel adapter base
│   ├── whatsapp/        # WhatsApp adapter
│   ├── telegram/        # Telegram adapter
│   ├── slack/           # Slack adapter
│   ├── discord/         # Discord adapter
│   ├── signal/          # Signal adapter
│   ├── imessage/        # iMessage adapters
│   ├── browser/         # Browser control (CDP)
│   ├── canvas-host/     # Canvas & A2UI
│   ├── sessions/        # Session management
│   ├── routing/         # Message routing
│   ├── security/        # Auth, pairing, sandbox
│   ├── cron/            # Scheduled jobs
│   ├── memory/          # Memory system
│   ├── providers/       # LLM providers
│   ├── plugins/         # Plugin/skill system
│   ├── media/           # Media pipeline
│   ├── tts/             # Text-to-speech
│   ├── web/             # Control UI + WebChat
│   ├── wizard/          # Onboarding wizard
│   └── cli/             # CLI commands
├── apps/                # Companion app sources
├── packages/            # Shared packages
├── extensions/          # Extension channels
├── skills/              # Bundled skills
├── ui/                  # Web UI source
└── Swabble/             # macOS/iOS Swift source

CLI Commands

| Command | Description |
| --- | --- |
| `openclaw onboard` | Guided setup wizard |
| `openclaw gateway` | Start the gateway |
| `openclaw agent --message "..."` | Chat with agent |
| `openclaw message send` | Send to any channel |
| `openclaw doctor` | Diagnostics & migration |
| `openclaw pairing approve` | Approve DM pairing |
| `openclaw update` | Update to latest version |
| `openclaw channels login` | Link WhatsApp |

Chat Commands (In-Channel)

| Command | Description |
| --- | --- |
| `/status` | Session status (model, tokens, cost) |
| `/new` / `/reset` | Reset session |
| `/compact` | Compact session context |
| `/think <level>` | Set thinking level |
| `/verbose on\|off` | Toggle verbose mode |
| `/usage off\|tokens\|full` | Usage footer |
| `/restart` | Restart gateway |
| `/activation mention\|always` | Group activation mode |
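
The two `/activation` modes correspond to the mention gating used for group routing. A minimal sketch of that decision follows, assuming a plain-text `@name` mention. The `shouldRespond` function and message shape are hypothetical, and real channel adapters typically expose structured mention metadata rather than relying on text matching.

```typescript
// Hypothetical gating check for group chats; not OpenClaw's implementation.
type ActivationMode = "mention" | "always";
type ChatMessage = { text: string; isGroup: boolean };

function shouldRespond(msg: ChatMessage, mode: ActivationMode, botName: string): boolean {
  if (!msg.isGroup) return true;      // this sketch does not gate direct messages
  if (mode === "always") return true; // respond to every group message
  // "mention" mode: respond only when the bot is addressed by name.
  return msg.text.toLowerCase().indexOf("@" + botName.toLowerCase()) !== -1;
}

console.log(shouldRespond({ text: "lunch?", isGroup: true }, "mention", "claw"));       // false
console.log(shouldRespond({ text: "@Claw lunch?", isGroup: true }, "mention", "claw")); // true
console.log(shouldRespond({ text: "lunch?", isGroup: true }, "always", "claw"));        // true
```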

Key Design Decisions

  1. Gateway as control plane — Single WebSocket server everything connects to
  2. Multi-agent routing — Different agents for different channels/groups
  3. Pairing-based security — Unknown DMs get pairing codes by default
  4. Docker sandboxing — Non-main sessions can be isolated
  5. Block streaming — Responses streamed as structured blocks
  6. Extension-based channels — MS Teams, Matrix, Zalo are extensions
  7. Companion apps — Native macOS/iOS/Android for device-level features
  8. ClawHub — Community skill registry