feat: openclaw-style secrets (env.vars + ${VAR}) and per-task model routing

- Replace python-dotenv with config.json env.vars block + ${VAR} substitution
- Add models section for per-task model routing (heartbeat, subagent, default)
- Heartbeat/subagent tasks can use different models/providers than main chat
- Remove python-dotenv from dependencies
- Update all docs to reflect new config approach
- Reorganize docs into project/ and research/ subdirectories
2026-02-20 23:49:05 -05:00
parent 55c6767e69
commit 82c2640481
35 changed files with 2904 additions and 422 deletions

@@ -0,0 +1,237 @@
# OpenClaw Architecture Deep Dive
## What is OpenClaw?
OpenClaw is an open-source AI assistant created by Peter Steinberger (founder of PSPDFKit) that gained 100,000 GitHub stars in three days - one of the fastest-growing repositories in GitHub history.
**Technical Definition:** An agent runtime with a gateway in front of it.
Despite viral stories of agents calling owners at 3am, texting people's wives autonomously, and browsing Twitter overnight, OpenClaw isn't sentient. It's elegant event-driven engineering.
## Core Architecture
### The Gateway
- Long-running process on your machine
- Constantly accepts connections from messaging apps (WhatsApp, Telegram, Discord, iMessage, Slack)
- Routes messages to AI agents
- **Doesn't think, reason, or decide** - only accepts inputs and routes them
### The Agent Runtime
- Processes events from the queue
- Executes actions using available tools
- Has deep system access: shell commands, file operations, browser control
### State Persistence
- Memory stored as local markdown files
- Includes preferences, conversation history, context from previous sessions
- Agent "remembers" by reading these files on each wake-up
- Not real-time learning - just file reading
### The Event Loop
All events enter a queue → Queue gets processed → Agents execute → State persists → Loop continues
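That loop can be sketched in a few lines of Python. This is an illustration of the pattern, not OpenClaw's actual code; `process_event` and the state shape are invented:

```python
import queue

# All input types funnel into one queue.
events = queue.Queue()

def process_event(event, state):
    # Placeholder for an agent turn: a real runtime would prompt an LLM
    # with the event plus persisted state, then execute any tool calls.
    state["history"].append(event)
    return f"handled {event['type']}"

def run_loop(state):
    # Drain the queue one event at a time: a turn finishes before the next starts.
    while not events.empty():
        event = events.get()
        state["last_result"] = process_event(event, state)  # state persists across turns

state = {"history": [], "last_result": None}
events.put({"type": "message", "text": "hello"})
events.put({"type": "heartbeat"})
run_loop(state)
```

Everything that follows (messages, heartbeats, crons, hooks, webhooks) is just a different way of putting an item on that queue.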
## The Five Input Types
### 1. Messages (Human Input)
**How it works:**
- You send text via WhatsApp, iMessage, or Slack
- Gateway receives and routes to agent
- Agent responds
**Key details:**
- Sessions are per-channel (WhatsApp and Slack are separate contexts)
- Multiple requests queue up and process in order
- No jumbled responses - the agent finishes one thought before moving to the next
### 2. Heartbeats (Timer Events)
**How it works:**
- Timer fires at regular intervals (default: every 30 minutes)
- Gateway schedules an agent turn with a preconfigured prompt
- Agent responds to instructions like "Check inbox for urgent items" or "Review calendar"
**Key details:**
- Configurable interval, prompt, and active hours
- If nothing urgent: agent returns `heartbeat_okay` token (suppressed from user)
- If something urgent: you get a ping
- **This is the secret sauce** - makes OpenClaw feel proactive
**Example prompts:**
- "Check my inbox for anything urgent"
- "Review my calendar"
- "Look for overdue tasks"
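The suppression behavior can be sketched as follows. This is a hedged illustration of the mechanism described above, not OpenClaw source; `agent_turn` and `notify` are invented callables:

```python
HEARTBEAT_PROMPT = "Check my inbox for anything urgent"
HEARTBEAT_OK = "heartbeat_okay"  # sentinel reply meaning "nothing to report"

def run_heartbeat(agent_turn, notify):
    # agent_turn: runs one agent turn on the heartbeat prompt, returns reply text.
    # notify: delivers a message to the user (e.g. a WhatsApp send).
    reply = agent_turn(HEARTBEAT_PROMPT)
    if reply.strip() == HEARTBEAT_OK:
        return None  # suppressed: routine check-ins never reach the user
    notify(reply)    # something urgent: the user gets a ping
    return reply
```

The key design choice is that the agent runs on every tick, but the user only hears about exceptions.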
### 3. Cron Jobs (Scheduled Events)
**How it works:**
- More control than heartbeats
- Specify exact timing and custom instructions
- When the scheduled time arrives, the event fires and the prompt is sent to the agent
**Examples:**
- 9am daily: "Check email and flag anything urgent"
- Every Monday 3pm: "Review calendar for the week and remind me of conflicts"
- Midnight: "Browse my Twitter feed and save interesting posts"
- 8am: "Text wife good morning"
- 10pm: "Text wife good night"
**Real example:** The viral story of agent texting someone's wife was just cron jobs firing at scheduled times. Agent wasn't deciding - it was responding to scheduled prompts.
### 4. Hooks (Internal State Changes)
**How it works:**
- System itself triggers these events
- Event-driven development pattern
**Types:**
- Gateway startup → fires hook
- Agent begins task → fires hook
- Stop command issued → fires hook
**Purpose:**
- Save memory on reset
- Run setup instructions on startup
- Modify context before agent runs
- Self-management
### 5. Webhooks (External System Events)
**How it works:**
- External systems notify OpenClaw of events
- Agent responds to entire digital life
**Examples:**
- Email hits inbox → webhook fires → agent processes
- Slack reaction → webhook fires → agent responds
- Jira ticket created → webhook fires → agent researches
- GitHub event → webhook fires → agent acts
- Calendar event approaches → webhook fires → agent reminds
**Supported integrations:** Slack, Discord, GitHub, and basically anything with webhook support
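The important detail is that webhooks become ordinary queue events, so the agent never needs source-specific handling. A minimal sketch of that normalization (field names like `summary` are invented; real GitHub/Slack payloads each have their own schema):

```python
def webhook_to_event(source, payload):
    # Normalize an external webhook payload into the same generic event
    # shape that messages, heartbeats, and crons use.
    return {
        "type": "webhook",
        "source": source,                       # e.g. "github", "slack"
        "summary": payload.get("summary", ""),  # invented field, for illustration
        "raw": payload,                         # keep the original for the agent
    }
```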
### Bonus: Agent-to-Agent Messaging
**How it works:**
- Multi-agent setups with isolated workspaces
- Agents pass messages between each other
- Each agent has different profile/specialization
**Example:**
- Research Agent finishes gathering info
- Queues up work for Writing Agent
- Writing Agent processes and produces output
**Reality:** Looks like collaboration, but it's just messages entering queues
## Why It Feels Alive
The combination creates an illusion of autonomy:
**Time** (heartbeats, crons) → **Events** → **Queue** → **Agent Execution** → **State Persistence** → **Loop**
### The 3am Phone Call Example
**What it looked like:**
- Agent autonomously decided to get phone number
- Agent decided to call owner
- Agent waited until 3am to execute
**What actually happened:**
1. Some event fired (cron or heartbeat) - exact configuration unknown
2. Event entered queue
3. Agent processed with available tools and instructions
4. Agent acquired Twilio phone number
5. Agent made the call
6. The owner didn't ask in the moment, but the behavior was enabled during setup
**Key insight:** Nothing was thinking overnight. Nothing was deciding. Time produced event → Event kicked off agent → Agent followed instructions.
## The Complete Event Flow
**Event Sources:**
- Time creates events (heartbeats, crons)
- Humans create events (messages)
- External systems create events (webhooks)
- Internal state creates events (hooks)
- Agents create events for other agents
**Processing:**
All events → Enter queue → Queue processed → Agents execute → State persists → Loop continues
**Memory:**
- Stored in local markdown files
- Agent reads on wake-up
- Remembers previous conversations
- Not learning - just reading files you could open in text editor
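"Reading files on wake-up" is literally just this (a minimal sketch, assuming markdown memory files in a workspace directory; `load_memory` is an invented name):

```python
import tempfile
from pathlib import Path

def load_memory(workspace):
    # Concatenate every markdown file in the workspace into one context
    # string. This is all the "remembering" is: file reads at wake-up,
    # not weight updates or real-time learning.
    parts = []
    for md_file in sorted(Path(workspace).glob("*.md")):
        parts.append(f"## {md_file.name}\n{md_file.read_text()}")
    return "\n\n".join(parts)

# Demo: two memory files, read back in sorted filename order.
demo = Path(tempfile.mkdtemp())
(demo / "SOUL.md").write_text("Be concise.")
(demo / "USER.md").write_text("Name: Sam")
context = load_memory(demo)
```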
## Security Concerns
### The Analysis
Cisco's security team analyzed the OpenClaw ecosystem:
- 31,000 available skills examined
- 26% contain at least one vulnerability
- Called it "a security nightmare"
### Why It's Risky
OpenClaw has deep system access:
- Run shell commands
- Read and write files
- Execute scripts
- Control browser
### Specific Risks
1. **Prompt injection** through emails or documents
2. **Malicious skills** in marketplace
3. **Credential exposure**
4. **Command misinterpretation** that deletes unintended files
### OpenClaw's Own Warning
Documentation states: "There's no perfectly secure setup"
### Mitigation Strategies
- Run on secondary machine
- Use isolated accounts
- Limit enabled skills
- Monitor logs actively
- Use Railway's one-click deployment (runs in isolated container)
## Key Architectural Takeaways
### The Four Components
1. **Time** that produces events
2. **Events** that trigger agents
3. **State** that persists across interactions
4. **Loop** that keeps processing
### Building Your Own
You don't need OpenClaw specifically. You need:
- Event scheduling mechanism
- Queue system
- LLM for processing
- State persistence layer
### The Pattern
This architecture will appear everywhere. Every AI agent framework that "feels alive" uses some version of:
- Heartbeats
- Cron jobs
- Webhooks
- Event loops
- Persistent state
### Understanding vs Hype
Understanding this architecture means you can:
- Evaluate agent tools intelligently
- Build your own implementations
- Avoid getting caught up in viral hype
- Recognize the pattern in new frameworks
## The Bottom Line
OpenClaw isn't magic. It's not sentient. It doesn't think or reason.
**It's inputs, queues, and a loop.**
The "alive" feeling comes from well-designed event-driven architecture that makes a reactive system appear proactive. Time becomes an input. External systems become inputs. Internal state becomes inputs. All processed through the same queue with persistent memory.
Elegant engineering, not artificial consciousness.
## Further Resources
- OpenClaw documentation
- Clairvo's original thread (inspiration for this breakdown)
- Cisco security research on OpenClaw ecosystem

@@ -0,0 +1,140 @@
# Aetheel vs Nanoclaw: Feature Comparison & OpenCode Assessment
Aetheel is a solid reimplementation of the core nanoclaw concept in Python, but there are meaningful gaps. Here's what maps, what's missing, and where the opencode integration could be improved.
---
## What Aetheel Has (Maps Well to Nanoclaw)
| Feature | Nanoclaw | Aetheel | Status |
|---|---|---|---|
| Multi-channel adapters | WhatsApp (baileys) | Slack + Telegram | ✅ Good — cleaner abstraction via `BaseAdapter` |
| Session isolation | Per-group sessions | Per-thread sessions via `SessionStore` | ✅ Good |
| Dual runtime support | Claude Code SDK only | OpenCode (CLI+SDK) + Claude Code CLI | ✅ Good — more flexible |
| Scheduled tasks | Cron + interval + once via MCP tool | Cron + one-shot via APScheduler | ✅ Good |
| Subagent spawning | SDK `Task`/`TeamCreate` tools | Background threads via `SubagentManager` | ✅ Basic |
| Memory system | CLAUDE.md files per group | SOUL.md + USER.md + MEMORY.md + hybrid search | ✅ Better — vector + BM25 search |
| Skills system | `.claude/skills/` with SKILL.md | `skills/<name>/SKILL.md` with trigger matching | ✅ Good |
| Action tags | MCP tools (send_message, schedule_task) | Regex-parsed `[ACTION:remind\|...]` tags | ✅ Different approach, works |
---
## What's Missing from Aetheel
### 1. Container Isolation
Nanoclaw's biggest architectural feature. Every agent runs in an isolated Apple Container (or Docker) with controlled volume mounts, secret injection via stdin, and per-group IPC namespaces. Aetheel runs everything in the same process. This means:
- No sandboxing of agent tool use (bash, file writes)
- No mount-based security boundaries between groups
- Secrets are in the process environment, not isolated
### 2. MCP Server Integration
Nanoclaw runs a custom MCP server (`ipc-mcp-stdio.ts`) inside the container that gives the agent tools like `send_message`, `schedule_task`, `register_group`. Aetheel uses regex-parsed action tags instead, which is fragile — the AI has to format tags perfectly, and there's no validation or structured tool calling.
### 3. Multi-Group Support
Nanoclaw has per-group folders, per-group memory (CLAUDE.md), per-group IPC, and a global memory layer. Aetheel has a single workspace with shared memory files. No group isolation.
### 4. Persistent Conversation Sessions on Disk
Nanoclaw stores sessions as JSONL files in `data/sessions/{group}/.claude/` and can resume at a specific assistant message UUID. Aetheel's `SessionStore` is in-memory only — sessions are lost on restart.
### 5. IPC Message Streaming
Nanoclaw's agent runner uses a `MessageStream` (AsyncIterable) to pipe follow-up messages into an active agent query. The host can send new messages to a running agent via IPC files. Aetheel's runtime is request-response only — one message in, one response out.
### 6. Transcript Archiving
Nanoclaw archives full conversation transcripts to markdown before context compaction via a `PreCompact` hook. Aetheel logs sessions to daily files but doesn't handle compaction.
### 7. Group Registration
Nanoclaw lets the main agent register new groups dynamically via an MCP tool. Aetheel has no equivalent.
### 8. Idle Timeout / Session Lifecycle
Nanoclaw has a 30-minute idle timeout that closes the container stdin, ending the session gracefully. Aetheel has session TTL cleanup but no active lifecycle management.
---
## OpenCode Integration Assessment
The opencode runtime implementation in `agent/opencode_runtime.py` is well-structured. Here's what's correct and what needs attention.
### What's Done Well
- Dual mode (CLI + SDK) with graceful fallback from SDK to CLI
- Binary auto-discovery across common install paths
- JSONL event parsing for `opencode run --format json` output
- Session ID extraction from event stream
- System prompt injection via XML tags (correct workaround since `opencode run` doesn't have `--system-prompt`)
- Config from environment variables
### Issues / Improvements Needed
#### 1. SDK Client API Mismatch
The code calls `self._sdk_client.session.chat(session_id, **chat_kwargs)` but the opencode Python SDK uses `client.session.prompt()` not `.chat()`. The correct call is:
```python
response = self._sdk_client.session.prompt(
path={"id": session_id},
body={"parts": parts, "model": model_config}
)
```
#### 2. SDK Client Initialization
The code uses `from opencode_ai import Opencode` but the actual SDK package is `@opencode-ai/sdk` (JS/TS) or `opencode-sdk-python` (Python). The Python SDK uses `createOpencodeClient` pattern. Verify the actual Python SDK import path — it may be `from opencode import Client` or similar depending on the package version.
#### 3. No `--continue` Flag Validation
The CLI mode passes `--continue` and `--session` for session continuity, but `opencode run` may not support `--continue` the same way as the TUI. The `opencode run` command is designed for single-shot execution. For session continuity in CLI mode, you'd need to use the SDK mode with `opencode serve`.
#### 4. Missing `--system` Flag
The code injects system prompts as XML in the message body. This works but is a workaround. The SDK mode's `client.session.prompt()` supports a `system` parameter in the body, which would be cleaner.
#### 5. No Structured Output Support
Opencode's SDK supports `format: { type: "json_schema", schema: {...} }` for structured responses. This could replace the fragile `[ACTION:...]` regex parsing with proper tool calls.
#### 6. No Plugin/Hook Integration
Opencode has a plugin system (`tool.execute.before`, `tool.execute.after`, `experimental.session.compacting`) that could replace the action tag parsing. You could create an opencode plugin that exposes `send_message` and `schedule_task` as custom tools, similar to nanoclaw's MCP approach.
#### 7. Session Persistence
`SessionStore` is in-memory. Opencode's server persists sessions natively, so in SDK mode you could rely on the server's session storage and just map `conversation_id → opencode_session_id` in a SQLite table.
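That mapping table is small enough to sketch with stdlib SQLite (table and function names are invented; this only shows the persistence pattern, not Aetheel's actual `SessionStore`):

```python
import sqlite3

def open_session_map(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS session_map ("
        "conversation_id TEXT PRIMARY KEY, "
        "opencode_session_id TEXT NOT NULL)"
    )
    return db

def bind_session(db, conversation_id, session_id):
    # INSERT OR REPLACE keeps the latest opencode session per conversation.
    db.execute("INSERT OR REPLACE INTO session_map VALUES (?, ?)",
               (conversation_id, session_id))
    db.commit()

def lookup_session(db, conversation_id):
    row = db.execute(
        "SELECT opencode_session_id FROM session_map WHERE conversation_id = ?",
        (conversation_id,)).fetchone()
    return row[0] if row else None
```

On restart, the adapter looks up the existing opencode session for a thread instead of starting a fresh one.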
---
## Architectural Gap Summary
The biggest architectural gap isn't about opencode specifically — it's that Aetheel runs the agent in-process without isolation, while nanoclaw's container model is what makes it safe to give the agent bash access and file write tools.
To close that gap, options include:
- **Containerize the opencode runtime** — run `opencode serve` inside a Docker container with controlled mounts
- **Use opencode's permission system** — configure all dangerous tools to `"ask"` or `"deny"` per agent
- **Add an MCP server** — replace action tag regex parsing with proper MCP tools for `send_message`, `schedule_task`, etc.
- **Persist sessions to SQLite** — survive restarts and enable resume-at-message functionality
---
## Nanoclaw Features → Opencode Equivalents
| Nanoclaw (Claude Code SDK) | Opencode Equivalent | Gap Level |
|---|---|---|
| `query()` async iterable | HTTP server + SDK `client.session.prompt()` | 🔴 Architecture change needed |
| `resume` + `resumeSessionAt` | `POST /session/:id/message` | 🟡 No resume-at-UUID equivalent |
| Streaming message types (system/init, assistant, result) | SSE events via `GET /event` | 🟡 Different event schema |
| `PreCompact` hook | `experimental.session.compacting` plugin | 🟢 Similar concept, different API |
| `PreToolUse` hook (bash sanitization) | `tool.execute.before` plugin | 🟢 Similar concept, different API |
| `bypassPermissions` | Per-tool permission config set to `"allow"` | 🟢 Direct mapping |
| `isSingleUserTurn: false` via AsyncIterable | `prompt_async` endpoint | 🟡 Needs verification |
| CLAUDE.md auto-loading via `settingSources` | AGENTS.md convention | 🟢 Rename files |
| Secrets via `env` param on `query()` | `shell.env` plugin hook | 🟡 Different isolation model |
| MCP servers in `query()` config | `opencode.json` mcp config or `POST /mcp` | 🟢 Direct mapping |

docs/research/comparison.md

@@ -0,0 +1,243 @@
# ⚔️ Aetheel vs. Inspiration Repos — Comparison & Missing Features
> A detailed comparison of Aetheel with Nanobot, NanoClaw, OpenClaw, and PicoClaw — highlighting what's different, what's missing, and what can be added.
---
## Feature Comparison Matrix
| Feature | Aetheel | Nanobot | NanoClaw | OpenClaw | PicoClaw |
|---------|---------|---------|----------|----------|----------|
| **Language** | Python | Python | TypeScript | TypeScript | Go |
| **Channels** | Slack only | 9 channels | WhatsApp only | 15+ channels | 5 channels |
| **LLM Runtime** | OpenCode / Claude Code (subprocess) | LiteLLM (multi-provider) | Claude Agent SDK | Pi Agent (custom RPC) | Go-native agent |
| **Memory** | Hybrid (vector + BM25) | Simple file-based | Per-group CLAUDE.md | Workspace files | MEMORY.md + sessions |
| **Config** | `config.json` with `env.vars` + `${VAR}` | `config.json` | Code changes (no config) | JSON5 config | `config.json` |
| **Skills** | ❌ None | ✅ Bundled + custom | ✅ Code skills (transform) | ✅ Bundled + managed + workspace | ✅ Custom skills |
| **Scheduled Tasks** | ⚠️ Action tags (remind only) | ✅ Full cron system | ✅ Task scheduler | ✅ Cron + webhooks + Gmail | ✅ Cron + heartbeat |
| **Security** | ❌ No sandbox | ⚠️ Workspace restriction | ✅ Container isolation | ✅ Docker sandbox + pairing | ✅ Workspace sandbox |
| **MCP Support** | ❌ No | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Web Search** | ❌ No | ✅ Brave Search | ✅ Via Claude tools | ✅ Browser control | ✅ Brave + DuckDuckGo |
| **Voice** | ❌ No | ✅ Via Groq Whisper | ❌ No | ✅ Voice Wake + Talk Mode | ✅ Via Groq Whisper |
| **Browser Control** | ❌ No | ❌ No | ❌ No | ✅ Full CDP control | ❌ No |
| **Companion Apps** | ❌ No | ❌ No | ❌ No | ✅ macOS + iOS + Android | ❌ No |
| **Session Management** | ✅ Thread-based (Slack) | ✅ Session-based | ✅ Per-group isolated | ✅ Full sessions + agent-to-agent | ✅ Session-based |
| **Docker Support** | ❌ No | ✅ Yes | ❌ (uses Apple Container) | ✅ Full compose setup | ✅ Yes |
| **Install Script** | ✅ Yes | ✅ pip/uv install | ✅ Claude guides setup | ✅ npm + wizard | ✅ Binary / make |
| **Identity Files** | ✅ SOUL.md, USER.md, MEMORY.md | ✅ AGENTS.md, SOUL.md, USER.md, etc. | ✅ CLAUDE.md per group | ✅ AGENTS.md, SOUL.md, USER.md, TOOLS.md | ✅ Full set (AGENTS, SOUL, IDENTITY, USER, TOOLS) |
| **Subagents** | ❌ No | ✅ Spawn subagent | ✅ Agent Swarms | ✅ sessions_send / sessions_spawn | ✅ Spawn subagent |
| **Heartbeat/Proactive** | ❌ No | ✅ Heartbeat | ❌ No | ✅ Cron + wakeups | ✅ HEARTBEAT.md |
| **Multi-provider** | ⚠️ Via OpenCode/Claude | ✅ 12+ providers | ❌ Claude only | ✅ Multi-model + failover | ✅ 7+ providers |
| **WebChat** | ❌ No | ❌ No | ❌ No | ✅ Built-in WebChat | ❌ No |
---
## What Aetheel Does Well
### ✅ Strengths
1. **Advanced Memory System** — Aetheel has the most sophisticated memory system with **hybrid search (0.7 vector + 0.3 BM25)**, local embeddings via `fastembed`, and SQLite FTS5. None of the other repos have this level of memory sophistication.
2. **Local-First Embeddings** — Zero API calls for memory search. Uses ONNX-based local model (BAAI/bge-small-en-v1.5).
3. **Dual Runtime Support** — Clean abstraction allowing switching between OpenCode and Claude Code with the same `AgentResponse` interface.
4. **Thread Isolation in Slack** — Each Slack thread gets its own AI session, providing natural conversation isolation.
5. **Action Tags** — Inline `[ACTION:remind|minutes|message]` tags are elegant for in-response scheduling.
6. **File Watching** — Memory auto-reindexes when `.md` files are edited.
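The action-tag parsing from point 5 can be sketched as a single regex pass (a hedged illustration of the `[ACTION:verb|arg|arg]` shape shown above; Aetheel's real parser may differ in names and edge-case handling):

```python
import re

# Matches [ACTION:verb|arg1|arg2] tags embedded in a model response.
ACTION_RE = re.compile(r"\[ACTION:(\w+)\|([^|\]]*)\|([^\]]*)\]")

def extract_actions(text):
    # Returns (clean_text, actions): the reply with tags stripped out,
    # plus a structured list of the actions to schedule.
    actions = [
        {"verb": m.group(1), "args": [m.group(2), m.group(3)]}
        for m in ACTION_RE.finditer(text)
    ]
    clean = ACTION_RE.sub("", text).strip()
    return clean, actions
```

This also illustrates the fragility noted elsewhere in these docs: one malformed pipe and the action silently fails to match.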
---
## What Aetheel Is Missing
### 🔴 Critical Gaps (High Priority)
#### 1. Multi-Channel Support
**Current:** Slack only
**All others:** Multiple channels (3-15+)
Aetheel is locked to Slack. Adding at least **Telegram** and **Discord** would significantly increase usability. All four inspiration repos treat multi-channel as essential.
> **Recommendation:** Follow Nanobot's pattern — each channel is a module in `channels/` with a common interface. Start with Telegram (easiest — just a token).
#### 2. Skills System
**Current:** None
**Others:** All have skills/plugins
Aetheel has no way to extend agent capabilities beyond its hardcoded memory and runtime setup. A skills system would allow:
- Bundled skills (GitHub, weather, web search)
- User-created skills in workspace
- Community-contributed skills
> **Recommendation:** Create a `skills/` directory in the workspace. Skills are markdown files (`SKILL.md`) that get injected into the agent's context.
#### 3. Scheduled Tasks (Cron)
**Current:** Only `[ACTION:remind]` (one-time, simple)
**Others:** Full cron systems with persistent storage
The action tag system is clever but limited. A proper cron system would support:
- Recurring cron expressions (`0 9 * * *`)
- Interval-based scheduling
- Persistent job storage
- CLI management
> **Recommendation:** Add a `cron/` module with SQLite-backed job storage and an APScheduler-based execution engine.
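The persistence half of that recommendation can be sketched with stdlib SQLite; the execution engine (e.g. APScheduler) would poll `due_jobs` on a timer. Table and function names here are invented, and real cron-expression support would need a parser on top:

```python
import sqlite3
import time

def open_jobs(path=":memory:"):
    db = sqlite3.connect(path)
    # interval_s is NULL for one-shot jobs.
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, "
        "prompt TEXT NOT NULL, "
        "interval_s INTEGER, "
        "next_run REAL NOT NULL)"
    )
    return db

def add_job(db, prompt, interval_s=None, first_run=None):
    next_run = time.time() if first_run is None else first_run
    db.execute("INSERT INTO jobs (prompt, interval_s, next_run) VALUES (?, ?, ?)",
               (prompt, interval_s, next_run))
    db.commit()

def due_jobs(db, now=None):
    # Return due jobs; reschedule recurring ones, delete one-shots.
    now = time.time() if now is None else now
    rows = db.execute(
        "SELECT id, prompt, interval_s FROM jobs WHERE next_run <= ?", (now,)
    ).fetchall()
    for job_id, _, interval_s in rows:
        if interval_s:
            db.execute("UPDATE jobs SET next_run = ? WHERE id = ?",
                       (now + interval_s, job_id))
        else:
            db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
    db.commit()
    return [(prompt, interval_s) for _, prompt, interval_s in rows]
```

Because jobs live in SQLite rather than memory, they survive restarts, which is the main gap the action-tag approach leaves open.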
#### 4. Security Sandbox
**Current:** No sandboxing
**Others:** Container isolation (NanoClaw), workspace restriction (PicoClaw), Docker sandbox (OpenClaw)
The AI runtime has unrestricted system access. At minimum, workspace-level restrictions should be added.
> **Recommendation:** Follow PicoClaw's approach — restrict tool access to workspace directory by default. Block dangerous shell commands.
---
### 🟡 Important Gaps (Medium Priority)
#### 5. Config File System (JSON with env.vars — ✅ Done)
**Current:** `config.json` with `env.vars` block and `${VAR}` substitution for secrets
**Others:** JSON/JSON5 config files
Aetheel now uses a single `config.json` with an `env.vars` block for secrets and `${VAR}` references, matching OpenClaw's approach.
> **Status:** ✅ Implemented — no separate `.env` file needed.
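The substitution pattern is simple enough to sketch. This is a minimal illustration assuming the `env.vars` block shape described above; Aetheel's actual loader may differ (e.g. by also falling back to the process environment):

```python
import json
import re

VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def load_config(text):
    # Parse config.json, then recursively expand ${VAR} references
    # in string values against the env.vars block. Unknown variables
    # are left as-is rather than raising.
    cfg = json.loads(text)
    env = cfg.get("env", {}).get("vars", {})

    def expand(value):
        if isinstance(value, str):
            return VAR_RE.sub(lambda m: env.get(m.group(1), m.group(0)), value)
        if isinstance(value, dict):
            return {k: expand(v) for k, v in value.items()}
        if isinstance(value, list):
            return [expand(v) for v in value]
        return value

    return expand(cfg)
```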
#### 6. Web Search Tool
**Current:** No web search
**Others:** Brave Search, DuckDuckGo, or full browser control
The agent can't search the web. This is a significant limitation for a personal assistant.
> **Recommendation:** Add Brave Search API integration (free tier: 2000 queries/month) with DuckDuckGo as fallback.
#### 7. Subagent / Spawn Capability
**Current:** No subagents
**Others:** All have spawn/subagent systems
For long-running tasks, the main agent should be able to spawn background sub-tasks that work independently and report back.
> **Recommendation:** Add a `spawn` tool that creates a background thread/process running a separate agent session.
#### 8. Heartbeat / Proactive System
**Current:** No proactive capabilities
**Others:** Nanobot and PicoClaw have heartbeat systems
The agent only responds to messages. A heartbeat system would allow periodic check-ins, proactive notifications, and scheduled intelligence.
> **Recommendation:** Add `HEARTBEAT.md` file + periodic timer that triggers agent with heartbeat tasks.
#### 9. CLI Interface
**Current:** Only `python main.py` with flags
**Others:** Full CLI with subcommands (`nanobot agent`, `picoclaw cron`, etc.)
> **Recommendation:** Add a CLI using `click` or `argparse` with subcommands: `aetheel chat`, `aetheel status`, `aetheel cron`, etc.
#### 10. Tool System
**Current:** No explicit tool system (AI handles everything via runtime)
**Others:** Shell exec, file R/W, web search, spawn, message, etc.
Aetheel delegates all tool use to the AI runtime (OpenCode/Claude Code). While this works, having explicit tools gives more control and allows sandboxing.
> **Recommendation:** Define a tool interface and implement core tools (file ops, shell, web search) that run through the aetheel process with sandboxing.
---
### 🟢 Nice-to-Have (Lower Priority)
#### 11. MCP Server Support
Only Nanobot supports MCP. Would allow connecting external tool servers.
#### 12. Multi-Provider Support
Currently relies on OpenCode/Claude Code for provider handling. Direct multi-provider support (like Nanobot's 12+ providers via LiteLLM) would add flexibility.
#### 13. Docker / Container Support
No Docker compose or containerized deployment option.
#### 14. Agent-to-Agent Communication
OpenClaw's `sessions_send` allows agents to message each other. Useful for multi-agent workflows.
#### 15. Gateway Architecture
Moving from a direct Slack adapter to a gateway pattern would make adding channels much easier.
#### 16. Onboarding Wizard
OpenClaw's `onboard --install-daemon` provides a guided setup. Aetheel's install script is good but could be more interactive.
#### 17. Voice Support
Voice Wake / Talk Mode (OpenClaw) or Whisper transcription (Nanobot, PicoClaw).
#### 18. WebChat Interface
A browser-based chat UI connected to the gateway.
#### 19. TOOLS.md File
A `TOOLS.md` file describing available tools to the agent, used by PicoClaw and OpenClaw.
#### 20. Self-Modification
From `additions.txt`: "edit its own files and config as well as add skills" — the agent should be able to modify its own configuration and add new skills.
---
## Architecture Comparison
```mermaid
graph LR
subgraph Aetheel["⚔️ Aetheel (Current)"]
A_SLACK["Slack\n(only channel)"]
A_MAIN["main.py"]
A_MEM["Memory\n(hybrid search)"]
A_RT["OpenCode / Claude\n(subprocess)"]
end
subgraph Target["🎯 Target Architecture"]
T_CHAN["Multi-Channel\nGateway"]
T_CORE["Core Agent\n+ Tool System"]
T_MEM["Memory\n(hybrid search)"]
T_SK["Skills"]
T_CRON["Cron"]
T_PROV["Multi-Provider"]
T_SEC["Security\nSandbox"]
end
A_SLACK --> A_MAIN
A_MAIN --> A_MEM
A_MAIN --> A_RT
T_CHAN --> T_CORE
T_CORE --> T_MEM
T_CORE --> T_SK
T_CORE --> T_CRON
T_CORE --> T_PROV
T_CORE --> T_SEC
```
---
## Prioritized Roadmap Suggestion
Based on the analysis, here's a suggested implementation order:
### Phase 1: Foundation (Essentials)
1. **Config system** — ✅ Done: `config.json` with `env.vars` + `${VAR}` substitution
2. **Skills system** — `skills/` directory with `SKILL.md` loading
3. **Tool system** — Core tools (shell, file, web search) with sandbox
4. **Security sandbox** — Workspace-restricted tool execution
### Phase 2: Channels & Scheduling
5. **Channel abstraction** — Extract adapter interface from Slack adapter
6. **Telegram channel** — First new channel
7. **Cron system** — Full scheduled task management
8. **CLI** — Proper CLI with subcommands
### Phase 3: Advanced Features
9. **Heartbeat** — Proactive agent capabilities
10. **Subagents** — Spawn background tasks
11. **Discord channel** — Second new channel
12. **Web search** — Brave Search + DuckDuckGo
### Phase 4: Polish
13. **Self-modification** — Agent can edit config and add skills
14. **Docker support** — Dockerfile + compose
15. **MCP support** — External tool servers
16. **WebChat** — Browser-based chat UI

@@ -0,0 +1,59 @@
# Discord Integration: OpenClaw vs. Aetheel

OpenClaw's Discord integration is a massive, enterprise-grade system compared to Aetheel's. Here are the key differences.

**What Aetheel has and does fine:**
- Basic gateway connection via discord.py
- DM + @mention handling
- Message chunking (2000-char limit)
- Listen channels (respond without @mention in specific channels)
- Background thread support
- Token from env var
**What OpenClaw has that Aetheel is missing:**
- **Multi-account support** — OpenClaw can run multiple Discord bot accounts simultaneously, each with its own token, config, and identity. Aetheel supports exactly one bot token.
- **DM access policies** — OpenClaw has pairing, allowlist, open, and disabled DM policies. Pairing mode requires users to get a code approved before they can DM the bot. Aetheel lets anyone DM the bot with zero access control.
- **Guild access policies** — OpenClaw has open, allowlist, and disabled guild policies with per-guild and per-channel allowlists. You can restrict which servers, which channels within a server, and which users/roles can trigger the bot. Aetheel has no guild-level access control at all.
- **Role-based routing** — OpenClaw can route Discord users to different AI agents based on their Discord roles. Aetheel has no concept of this.
- **Interactive components (v2)** — OpenClaw supports Discord buttons, select menus, modal forms, and media galleries. The AI can send rich interactive messages. Aetheel sends plain text only.
- **Native slash commands** — OpenClaw registers and handles Discord slash commands natively. Aetheel has no slash command support.
- **Reply threading** — OpenClaw supports `replyToMode` (off, first, all) and explicit `[[reply_to:<id>]]` tags so the bot can reply to specific messages. Aetheel doesn't use Discord's reply feature at all.
- **History context** — OpenClaw injects configurable message history (`historyLimit`, default 20) from the Discord channel into the AI context. Aetheel doesn't read channel history.
- **Reaction handling** — OpenClaw can receive and send reactions, with configurable notification modes (off, own, all, allowlist). Aetheel ignores reactions entirely.
- **Ack reactions** — OpenClaw sends an acknowledgement emoji (e.g. 👀) while processing a message, so users know the bot is working. Aetheel gives no processing feedback.
- **Typing indicators** — OpenClaw shows typing indicators while the agent processes. Aetheel doesn't.
- **Media/file handling** — OpenClaw can send and receive files, images, and voice messages (with ffmpeg conversion). Aetheel ignores attachments.
- **Voice messages** — OpenClaw can send voice messages with auto-generated waveforms. Aetheel has no voice support.
- **Exec approvals** — OpenClaw can post button-based approval prompts in Discord for dangerous operations (like shell commands). Aetheel has no human-in-the-loop approval flow.
- **Polls** — OpenClaw can create Discord polls. Aetheel can't.
- **Moderation tools** — OpenClaw exposes timeout, kick, ban, and role management as AI-accessible actions with configurable gates. Aetheel has none.
- **Channel management** — OpenClaw can create, edit, delete, and move channels. Aetheel can't.
- **PluralKit support** — OpenClaw resolves proxied messages from PluralKit systems. Niche, but it shows the depth.
- **Presence/status** — OpenClaw can set the bot's online status, activity, and streaming status. Aetheel's bot just shows as "online" with no custom status.
- **Gateway proxy** — OpenClaw supports routing Discord traffic through an HTTP proxy. Aetheel doesn't.
- **Retry/resilience** — OpenClaw has configurable retry policies for Discord API calls. Aetheel has no retry logic.
- **Config writes from chat** — OpenClaw lets users modify bot config via Discord commands. Aetheel's `/config set` works but isn't Discord-specific.
- **Session isolation model** — OpenClaw has sophisticated session keys: DMs share a main session by default, guild channels get isolated sessions (`agent:<agentId>:discord:channel:<channelId>`), and slash commands get their own sessions. Aetheel uses `channel_id` as the conversation ID for everything, which is simpler but less flexible.
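That session-key scheme can be sketched as a small helper. The guild-channel format follows the example key above; the DM key format is an assumption for illustration, not taken from OpenClaw source:

```python
def session_key(agent_id, channel_type, channel_id=None, is_dm=False):
    # DMs share one main session per agent (key format assumed here);
    # guild channels each get an isolated session keyed by channel.
    if is_dm:
        return f"agent:{agent_id}:main"
    return f"agent:{agent_id}:{channel_type}:channel:{channel_id}"
```

Aetheel's equivalent would collapse both branches into `channel_id`, which is exactly the flexibility gap described above.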
**Bottom line:** Aetheel's Discord adapter is a functional but minimal "receive messages, send text back" integration. OpenClaw's is a full Discord platform with interactive UI, access control, moderation, media, threading, multi-account, and agent routing. The biggest practical gaps for Aetheel are probably: access control (DM/guild policies), typing/ack indicators, reply threading, history context injection, and interactive components.

docs/research/nanobot.md

@@ -0,0 +1,207 @@
# 🐈 Nanobot — Architecture & How It Works
> **Ultra-Lightweight Personal AI Assistant** — ~4,000 lines of Python, 99% smaller than OpenClaw.
## Overview
Nanobot is a minimalist personal AI assistant written in Python that focuses on delivering core agent functionality with the smallest possible codebase. It uses LiteLLM for multi-provider LLM routing, supports 9+ chat channels, and includes memory, skills, scheduled tasks, and MCP tool integration.
| Attribute | Value |
|-----------|-------|
| **Language** | Python 3.11+ |
| **Lines of Code** | ~4,000 (core agent) |
| **Config** | `~/.nanobot/config.json` |
| **Package** | `pip install nanobot-ai` |
| **LLM Routing** | LiteLLM (multi-provider) |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph Channels["📱 Chat Channels"]
TG["Telegram"]
DC["Discord"]
WA["WhatsApp"]
FS["Feishu"]
MC["Mochat"]
DT["DingTalk"]
SL["Slack"]
EM["Email"]
QQ["QQ"]
end
subgraph Gateway["🌐 Gateway (nanobot gateway)"]
CH["Channel Manager"]
MQ["Message Queue"]
end
subgraph Agent["🧠 Core Agent"]
LOOP["Agent Loop\n(loop.py)"]
CTX["Context Builder\n(context.py)"]
MEM["Memory System\n(memory.py)"]
SK["Skills Loader\n(skills.py)"]
SA["Subagent\n(subagent.py)"]
end
subgraph Tools["🔧 Built-in Tools"]
SHELL["Shell Exec"]
FILE["File R/W/Edit"]
WEB["Web Search"]
SPAWN["Spawn Subagent"]
MCP["MCP Servers"]
end
subgraph Providers["☁️ LLM Providers (LiteLLM)"]
OR["OpenRouter"]
AN["Anthropic"]
OA["OpenAI"]
DS["DeepSeek"]
GR["Groq"]
GE["Gemini"]
VL["vLLM (local)"]
end
Channels --> Gateway
Gateway --> Agent
CTX --> LOOP
MEM --> CTX
SK --> CTX
LOOP --> Tools
LOOP --> Providers
SA --> LOOP
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant Channel as Chat Channel
participant GW as Gateway
participant Agent as Agent Loop
participant LLM as LLM Provider
participant Tools as Tools
User->>Channel: Send message
Channel->>GW: Forward message
GW->>Agent: Route to agent
Agent->>Agent: Build context (memory, skills, identity)
Agent->>LLM: Send prompt + tools
LLM-->>Agent: Response (text or tool call)
alt Tool Call
Agent->>Tools: Execute tool
Tools-->>Agent: Tool result
Agent->>LLM: Send tool result
LLM-->>Agent: Final response
end
Agent->>Agent: Update memory
Agent-->>GW: Return response
GW-->>Channel: Send reply
Channel-->>User: Display response
```
---
## Key Components
### 1. Agent Loop (`agent/loop.py`)
The core loop that manages the LLM ↔ tool execution cycle:
- Builds a prompt using context (memory, skills, identity files)
- Sends to LLM via LiteLLM
- If LLM returns a tool call → executes it → sends result back
- Continues until LLM returns a text response (no more tool calls)
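In outline, that cycle reduces to a few lines. This is a minimal sketch with made-up names, not the actual `loop.py`:

```python
def run_agent_loop(llm, tools, messages, max_steps=10):
    """Minimal LLM <-> tool cycle: call the model, execute any tool call
    it returns, feed the result back, and stop on a plain text reply."""
    for _ in range(max_steps):
        reply = llm(messages)          # {"text": ...} or {"tool": ..., "args": {...}}
        if "tool" not in reply:
            return reply["text"]       # final answer, no more tool calls
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "name": reply["tool"], "content": str(result)})
    raise RuntimeError("agent loop exceeded max_steps")
```

The `max_steps` cap is a common safeguard so a confused model can't spin on tool calls forever.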
### 2. Context Builder (`agent/context.py`)
Assembles the system prompt from:
- **Identity files**: `AGENTS.md`, `SOUL.md`, `USER.md`, `TOOLS.md`, `IDENTITY.md`
- **Memory**: Persistent `MEMORY.md` with recall
- **Skills**: Loaded from `~/.nanobot/workspace/skills/`
- **Conversation history**: Session-based context
### 3. Memory System (`agent/memory.py`)
- Persistent memory stored in `MEMORY.md` in the workspace
- Agent can read and write memories
- Survives across sessions
### 4. Provider Registry (`providers/registry.py`)
- Single-source-of-truth for all LLM providers
- Adding a new provider = 2 steps (add `ProviderSpec` + config field)
- Auto-prefixes model names for LiteLLM routing
- Supports 12+ providers including local vLLM
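A minimal sketch of the registry pattern (hypothetical names; the real `ProviderSpec` carries more fields):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProviderSpec:
    name: str      # config field name, e.g. "openrouter"
    prefix: str    # LiteLLM routing prefix

# Single source of truth: adding a provider is one new entry here
# plus one config field — no if-elif chains anywhere else.
REGISTRY = {
    "openrouter": ProviderSpec("openrouter", "openrouter/"),
    "deepseek": ProviderSpec("deepseek", "deepseek/"),
    "groq": ProviderSpec("groq", "groq/"),
}

def route_model(provider: str, model: str) -> str:
    """Auto-prefix a bare model name so LiteLLM routes it correctly."""
    spec = REGISTRY[provider]
    return model if model.startswith(spec.prefix) else spec.prefix + model
```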
### 5. Channel System (`channels/`)
- 9 chat platforms supported (Telegram, Discord, WhatsApp, Feishu, Mochat, DingTalk, Slack, Email, QQ)
- Each channel handles auth, message parsing, and response delivery
- Allowlist-based security (`allowFrom`)
- Started via `nanobot gateway`
### 6. Skills (`skills/`)
- Bundled skills: GitHub, weather, tmux, etc.
- Custom skills loaded from workspace
- Skills are injected into the agent's context
### 7. Scheduled Tasks (Cron)
- Add jobs via `nanobot cron add`
- Supports cron expressions and interval-based scheduling
- Jobs stored persistently
### 8. MCP Integration
- Supports Model Context Protocol servers
- Stdio and HTTP transport modes
- Compatible with Claude Desktop / Cursor MCP configs
- Tools auto-discovered and registered at startup
---
## Project Structure
```
nanobot/
├── agent/ # 🧠 Core agent logic
│ ├── loop.py # Agent loop (LLM ↔ tool execution)
│ ├── context.py # Prompt builder
│ ├── memory.py # Persistent memory
│ ├── skills.py # Skills loader
│ ├── subagent.py # Background task execution
│ └── tools/ # Built-in tools (incl. spawn)
├── skills/ # 🎯 Bundled skills (github, weather, tmux...)
├── channels/ # 📱 Chat channel integrations
├── providers/ # ☁️ LLM provider registry
├── config/ # ⚙️ Configuration schema
├── cron/ # ⏰ Scheduled tasks
├── heartbeat/ # 💓 Heartbeat system
├── session/ # 📝 Session management
├── bus/ # 📨 Internal event bus
├── cli/ # 🖥️ CLI commands
└── utils/ # 🔧 Utilities
```
---
## CLI Commands
| Command | Description |
|---------|-------------|
| `nanobot onboard` | Initialize config & workspace |
| `nanobot agent -m "..."` | Chat with the agent |
| `nanobot agent` | Interactive chat mode |
| `nanobot gateway` | Start all channels |
| `nanobot status` | Show status |
| `nanobot cron add/list/remove` | Manage scheduled tasks |
| `nanobot channels login` | Link WhatsApp device |
---
## Key Design Decisions
1. **LiteLLM for provider abstraction** — One interface for all LLM providers
2. **JSON config over env vars** — Single `config.json` file for all settings
3. **Skills-based extensibility** — Modular skill system for adding capabilities
4. **Provider Registry pattern** — Adding providers is 2-step, zero if-elif chains
5. **Agent social network** — Can join MoltBook, ClawdChat communities
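Design decision 2 in practice: all settings live in one JSON file instead of scattered environment variables. A minimal `~/.nanobot/config.json` might look like this (illustrative only — the real schema is defined in `config/` and the keys may differ; `allowFrom` is the allowlist mentioned above):

```json
{
  "providers": {
    "openrouter": { "apiKey": "sk-or-..." }
  },
  "channels": {
    "telegram": {
      "token": "123456:ABC...",
      "allowFrom": ["123456789"]
    }
  }
}
```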

---
# Aetheel vs NanoClaw — Feature Gap Analysis
Deep comparison of Aetheel (Python, multi-channel AI assistant) and NanoClaw (TypeScript, container-isolated personal AI assistant). Focus: what NanoClaw has that Aetheel is missing.
---
## Architecture Differences
| Aspect | Aetheel | NanoClaw |
|--------|---------|----------|
| Language | Python | TypeScript |
| Agent execution | In-process (shared memory) | Container-isolated (Apple Container / Docker) |
| Identity model | Shared across all channels (SOUL.md, USER.md, MEMORY.md) | Per-group (each group has its own CLAUDE.md) |
| Security model | Application-level checks | OS-level container isolation |
| Config approach | Config-driven (`config.json` with `env.vars` + `${VAR}`) | Code-first (Claude modifies your fork) |
| Philosophy | Feature-rich framework | Minimal, understandable in 8 minutes |
---
## Features Aetheel Is Missing
### 1. Container Isolation (Critical)
NanoClaw runs every agent invocation inside a Linux container (Apple Container on macOS, Docker on Linux). Each container:
- Gets only explicitly mounted directories
- Runs as non-root (uid 1000)
- Is ephemeral (`--rm` flag, fresh per invocation)
- Cannot access other groups' files or sessions
- Cannot access host filesystem beyond mounts
Aetheel runs everything in-process with no sandboxing. The security audit already flagged path traversal, arbitrary code execution via hooks, and unvalidated action tags as critical issues.
**What to build:**
- Docker-based agent execution (spawn a container per AI request)
- Mount only the relevant group's workspace directory
- Pass secrets via stdin, not mounted files
- Add a `/convert-to-docker` skill or built-in Docker mode
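A sketch of what that spawn could look like (the image name and workspace layout are hypothetical; only the `docker run` flags are standard):

```python
import json
import subprocess

def container_cmd(group: str, prompt: str) -> list[str]:
    """Build the docker invocation: ephemeral, non-root, with only the
    group's own workspace mounted."""
    return [
        "docker", "run", "--rm", "-i",
        "-u", "1000:1000",                     # non-root inside the container
        "-v", f"./groups/{group}:/workspace",  # only this group's directory
        "aetheel-agent",                       # hypothetical image name
        "agent", prompt,
    ]

def run_agent_in_container(group: str, prompt: str, secrets: dict) -> str:
    """Secrets go in over stdin as JSON, never as mounted files."""
    proc = subprocess.run(
        container_cmd(group, prompt),
        input=json.dumps({"secrets": secrets}),
        capture_output=True, text=True,
    )
    return proc.stdout
```

Because the container is started with `--rm`, nothing an agent writes outside the mounted workspace survives the invocation.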
---
### 2. Per-Group Isolation
NanoClaw gives each chat group its own:
- Filesystem folder (`groups/{name}/`)
- Memory file (`CLAUDE.md` per group)
- Session history (isolated `.claude/` directory)
- IPC namespace (prevents cross-group privilege escalation)
- Container mounts (only own folder + read-only global)
Aetheel shares SOUL.md, USER.md, and MEMORY.md across all channels and conversations. A Slack channel, Discord server, and Telegram group all see the same memory and identity.
**What to build:**
- Per-channel or per-group workspace directories
- Isolated session storage per group
- A `global/` shared memory that all groups can read but only the main channel can write
- Group registration system (like NanoClaw's `registerGroup()`)
---
### 3. Working Agent Teams / Swarms
NanoClaw has working agent teams today via Claude Code's experimental `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`:
- Lead agent creates teammates using Claude's native `TeamCreate` / `SendMessage` tools
- Each teammate runs in its own container
- On Telegram, each agent gets a dedicated bot identity (pool of pre-created bots renamed dynamically via `setMyName`)
- The lead agent coordinates but doesn't relay every message — users see teammate messages directly
- `<internal>` tags let agents communicate without spamming the user
Aetheel has the tools in the allowed list (`TeamCreate`, `TeamDelete`, `SendMessage`) but no actual orchestration, no per-agent identity, and no way for teammates to appear as separate entities in chat.
**What to build:**
- Enable `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` when using Claude runtime
- Bot pool for Telegram/Discord (multiple bot tokens, one per agent role)
- IPC routing that respects `sender` field to route messages through the right bot
- Per-agent CLAUDE.md / SOUL.md files
- `<internal>` tag stripping in outbound messages
---
### 4. Mount Security / Allowlist
NanoClaw has a tamper-proof mount allowlist at `~/.config/nanoclaw/mount-allowlist.json` (outside the project root, never mounted into containers):
- Defines which host directories can be mounted
- Default blocked patterns: `.ssh`, `.gnupg`, `.aws`, `.env`, `private_key`, etc.
- Symlink resolution before validation (prevents traversal)
- `nonMainReadOnly` forces read-only for non-main groups
- Per-root `allowReadWrite` control
Aetheel has no filesystem access control. The AI can read/write anywhere the process has permissions.
**What to build:**
- External allowlist config (outside workspace, not modifiable by the AI)
- Blocked path patterns for sensitive directories
- Symlink resolution and path validation
- Read-only enforcement for non-primary channels
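A minimal sketch of the validation step (the blocked patterns are taken from NanoClaw's defaults above; the function name is made up):

```python
import os

# Patterns that should never be mountable, per NanoClaw's defaults
BLOCKED = (".ssh", ".gnupg", ".aws", ".env", "private_key")

def is_mount_allowed(path: str, allowed_roots: list[str]) -> bool:
    """Resolve symlinks first, then check the real path against the
    allowlist roots and the blocked patterns."""
    real = os.path.realpath(path)  # defeats symlink-based traversal
    if any(part in real for part in BLOCKED):
        return False
    return any(
        real == root or real.startswith(root.rstrip("/") + "/")
        for root in allowed_roots
    )
```

Resolving before comparing matters: a symlink `./safe -> ~/.ssh` would pass a naive string check but fails here.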
---
### 5. IPC-Based Communication
NanoClaw uses file-based IPC for all agent-to-host communication:
- Agents write JSON files to `data/ipc/{group}/messages/` and `data/ipc/{group}/tasks/`
- Host polls IPC directories and processes files
- Per-group IPC namespaces prevent cross-group message injection
- Authorization checks: non-main groups can only send to their own chat, schedule tasks for themselves
- Error files moved to `data/ipc/errors/` for debugging
Aetheel uses in-memory action tags parsed from AI response text (`[ACTION:remind|...]`, `[ACTION:cron|...]`). No authorization, no isolation, no audit trail.
**What to build:**
- File-based or queue-based IPC for agent communication
- Per-group namespaces with authorization
- Audit trail for all IPC operations
- Error handling with failed message preservation
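A minimal sketch of the file-based pattern (directory layout follows NanoClaw's `data/ipc/{group}/messages/`; function names are illustrative). The atomic rename is the important detail — the host poller never sees a half-written JSON file:

```python
import json
import os
import time
import uuid

def ipc_send(root: str, group: str, payload: dict) -> str:
    """Agent side: write a message atomically (temp file, then rename)."""
    outdir = os.path.join(root, group, "messages")
    os.makedirs(outdir, exist_ok=True)
    name = f"{int(time.time() * 1000)}-{uuid.uuid4().hex}.json"
    tmp = os.path.join(outdir, "." + name)     # dot-prefix marks in-progress
    with open(tmp, "w") as f:
        json.dump(payload, f)
    final = os.path.join(outdir, name)
    os.rename(tmp, final)                      # atomic on the same filesystem
    return final

def ipc_poll(root: str, group: str) -> list[dict]:
    """Host side: pick up and consume pending messages for one group."""
    outdir = os.path.join(root, group, "messages")
    if not os.path.isdir(outdir):
        return []
    out = []
    for name in sorted(os.listdir(outdir)):
        if name.startswith("."):               # skip in-progress temp files
            continue
        path = os.path.join(outdir, name)
        with open(path) as f:
            out.append(json.load(f))
        os.remove(path)
    return out
```

Per-group authorization then becomes a check on the `group` path segment before the host acts on a message.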
---
### 6. Group Queue with Concurrency Control
NanoClaw has a `GroupQueue` class that manages container execution:
- Max concurrent containers limit (`MAX_CONCURRENT_CONTAINERS`, default 5)
- Per-group queuing (messages and tasks queue while container is active)
- Follow-up messages sent to active containers via IPC input files
- Idle timeout with `_close` sentinel to wind down containers
- Exponential backoff retry (5s base, max 5 retries)
- Graceful shutdown (detaches containers, doesn't kill them)
- Task priority over messages in drain order
Aetheel has a simple concurrent limit of 3 subagents but no queuing, no retry logic, no follow-up message support, and no graceful shutdown.
**What to build:**
- Proper execution queue with configurable concurrency
- Per-channel message queuing when agent is busy
- Follow-up message injection into active sessions
- Exponential backoff retry on failures
- Graceful shutdown that lets active agents finish
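The retry policy described above (5s base, doubling, max 5 retries) can be sketched as follows; the injectable `sleep` is just to keep it testable:

```python
import random
import time

def retry_with_backoff(fn, base=5.0, max_retries=5, sleep=time.sleep):
    """Exponential backoff: 5s, 10s, 20s, ... doubling per attempt,
    plus a little jitter; re-raises after the final failure."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base * (2 ** attempt) + random.uniform(0, 1))
```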
---
### 7. Task Context Modes
NanoClaw scheduled tasks support two context modes:
- `group` — uses the group's existing session (shared conversation history)
- `isolated` — fresh session per task run (no prior context)
Aetheel scheduled tasks always run in a fresh context with no option to share the group's conversation history.
**What to build:**
- `context_mode` field on scheduled jobs (`group` vs `isolated`)
- Session ID passthrough for `group` mode tasks
---
### 8. Task Run Logging
NanoClaw logs every task execution:
- `task_run_logs` table with: task_id, run_at, duration_ms, status, result, error
- `last_result` summary stored on the task itself
- Tasks auto-complete after `once` schedule runs
Aetheel's scheduler persists jobs but doesn't log execution history or results.
**What to build:**
- Task run log table (when it ran, how long, success/error, result summary)
- Queryable task history (`task history <id>`)
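A sketch of the equivalent table and logger (the schema mirrors the fields NanoClaw records — task_id, run_at, duration_ms, status, result, error — but the code is illustrative):

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS task_run_logs (
    id          INTEGER PRIMARY KEY,
    task_id     TEXT NOT NULL,
    run_at      REAL NOT NULL,
    duration_ms INTEGER,
    status      TEXT CHECK (status IN ('ok', 'error')),
    result      TEXT,
    error       TEXT
)
"""

def log_run(conn, task_id, started, status, result=None, error=None):
    """Record one execution so a `task history <id>` command can query it."""
    conn.execute(
        "INSERT INTO task_run_logs (task_id, run_at, duration_ms, status, result, error) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (task_id, started, int((time.time() - started) * 1000), status, result, error),
    )
    conn.commit()
```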
---
### 9. Streaming Output with Idle Timeout
NanoClaw streams agent output in real-time:
- Container output is parsed as it arrives (sentinel markers for robust parsing)
- Results are forwarded to the user immediately via `sendMessage`
- Idle timeout (default 30 min) closes the container if no output for too long
- Prevents hanging containers from blocking the queue
Aetheel waits for the full AI response before sending anything back.
**What to build:**
- Streaming response support (send partial results as they arrive)
- Idle timeout for long-running agent sessions
- Typing indicators while agent is processing
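A minimal sketch of the idle-timeout idea (cooperative: the check fires when the next chunk arrives, so a production version would add a watchdog for a producer that never wakes; the injectable `clock` keeps it testable):

```python
import time

def stream_with_idle_timeout(chunks, idle_timeout=1800, clock=time.monotonic):
    """Forward chunks as they arrive; give up if the agent went quiet
    for longer than idle_timeout seconds (default 30 min)."""
    last = clock()
    for chunk in chunks:
        now = clock()
        if now - last > idle_timeout:
            raise TimeoutError(f"agent idle for {now - last:.0f}s")
        last = now
        yield chunk
```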
---
### 10. Skills as Code Transformations
NanoClaw's skills are fundamentally different from Aetheel's:
- Skills are SKILL.md files that teach Claude Code how to modify the codebase
- A deterministic skills engine applies code changes (three-way merge, file additions)
- Skills have state tracking (`.nanoclaw/state.yaml`), backups, and rollback
- Examples: `/add-telegram`, `/add-discord`, `/add-gmail`, `/add-voice-transcription`, `/convert-to-docker`, `/add-parallel`
- Each skill is a complete guide: pre-flight checks, code changes, setup, verification, troubleshooting
Aetheel's skills are runtime context injections (markdown instructions added to the system prompt when trigger words match). They don't modify code.
**What to build:**
- Skills engine that can apply code transformations
- State tracking for applied skills
- Rollback support
- Template skills for common integrations
---
### 11. Voice Message Transcription
NanoClaw has a skill (`/add-voice-transcription`) that:
- Detects WhatsApp voice notes (`audioMessage.ptt === true`)
- Downloads audio via Baileys
- Transcribes using OpenAI Whisper API
- Stores transcribed content as `[Voice: <text>]` in the database
- Configurable provider, fallback message, enable/disable
Aetheel has no voice message handling.
**What to build:**
- Voice message detection per adapter (Telegram, Discord, Slack all support voice)
- Whisper API integration for transcription
- Transcribed content injection into the conversation
---
### 12. Gmail / Email Integration
NanoClaw has a skill (`/add-gmail`) with two modes:
- Tool mode: agent can read/send emails when triggered from chat
- Channel mode: emails trigger the agent, agent replies via email
- GCP OAuth setup guide
- Email polling with deduplication
- Per-thread or per-sender context isolation
Aetheel has no email integration.
**What to build:**
- Gmail MCP integration (or direct API)
- Email as a channel adapter
- OAuth credential management
---
### 13. WhatsApp Support
NanoClaw's primary channel is WhatsApp via the Baileys library:
- QR code and pairing code authentication
- Group metadata sync
- Message history storage per registered group
- Bot message filtering (prevents echo loops)
Aetheel supports Slack, Discord, Telegram, and WebChat but not WhatsApp.
**What to build:**
- WhatsApp adapter using a library like Baileys or the WhatsApp Business API
- QR code authentication flow
- Group registration and metadata sync
---
### 14. Structured Message Routing
NanoClaw has a clean channel abstraction:
- `Channel` interface: `connect()`, `sendMessage()`, `isConnected()`, `ownsJid()`, `disconnect()`, `setTyping?()`
- `findChannel()` routes outbound messages to the right channel by JID prefix (`tg:`, `dc:`, WhatsApp JIDs)
- `formatOutbound()` strips `<internal>` tags before sending
- XML-escaped message formatting for agent input
Aetheel's adapters work but lack JID-based routing, `<internal>` tag support, and typing indicators across all adapters.
**What to build:**
- JID-based message routing (prefix per channel)
- `<internal>` tag stripping for agent-to-agent communication
- Typing indicators for all adapters
- Unified channel interface with `ownsJid()` pattern
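A minimal sketch of the routing and tag-stripping pieces (the `tg:`/`dc:` prefixes and `findChannel`/`formatOutbound` names come from the text above; this Python version is illustrative):

```python
import re

# JID prefixes per channel; bare WhatsApp JIDs have no prefix
CHANNEL_PREFIXES = {"tg:": "telegram", "dc:": "discord"}

def find_channel(jid: str) -> str:
    """Route an outbound message to the right channel by JID prefix."""
    for prefix, channel in CHANNEL_PREFIXES.items():
        if jid.startswith(prefix):
            return channel
    return "whatsapp"

def format_outbound(text: str) -> str:
    """Strip <internal>...</internal> agent-to-agent chatter before sending."""
    return re.sub(r"<internal>.*?</internal>", "", text, flags=re.DOTALL).strip()
```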
---
## Priority Recommendations
### High Priority (Security + Core Gaps)
1. Container isolation for agent execution
2. Fix the 10 critical/high security issues from the security audit
3. Per-group isolation (memory, sessions, filesystem)
4. Mount security allowlist
### Medium Priority (Feature Parity)
5. Working agent teams with per-agent identity
6. Group queue with concurrency control and retry
7. Task context modes and run logging
8. Streaming output with idle timeout
9. IPC-based communication with authorization
### Lower Priority (Nice to Have)
10. Voice message transcription
11. WhatsApp adapter
12. Gmail/email integration
13. Skills as code transformations
14. Structured message routing with JID prefixes
---
## What Aetheel Has That NanoClaw Doesn't
For reference, these are Aetheel strengths to preserve:
- Dual runtime support (OpenCode + Claude Code) with live switching
- Auto-failover on rate limits
- Per-request cost tracking and usage stats
- Local vector search (hybrid: 0.7 vector + 0.3 BM25) with fastembed
- Built-in multi-channel (Slack, Discord, Telegram, WebChat, Webhooks)
- WebChat browser UI
- Heartbeat / proactive task system
- Lifecycle hooks (gateway:startup, command:reload, agent:response, etc.)
- Comprehensive CLI (`aetheel start/stop/restart/logs/doctor/config/cron/memory`)
- Config-driven setup (no code changes needed for basic customization)
- Self-modification (AI can edit its own config, skills, identity files)
- Hot reload (`/reload` command)
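The hybrid search weighting above (0.7 vector + 0.3 BM25) reduces to a weighted sum, assuming both scores have already been normalized to [0, 1] (fastembed and BM25 produce different raw scales, so normalization is the assumption here):

```python
def hybrid_score(vec_score: float, bm25_score: float,
                 vec_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Blend a normalized vector-similarity score with a normalized BM25 score."""
    return vec_weight * vec_score + bm25_weight * bm25_score

def rank(results):
    """results: iterable of (doc_id, vec_score, bm25_score); best first."""
    return sorted(results, key=lambda r: hybrid_score(r[1], r[2]), reverse=True)
```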

---
*File: `docs/research/nanoclaw.md`*
# 🦀 NanoClaw — Architecture & How It Works
> **Minimal, Security-First Personal AI Assistant** — built on Claude Agent SDK with container isolation.
## Overview
NanoClaw is a minimalist personal AI assistant that prioritizes **security through container isolation** and **understandability through small codebase size**. It runs on Claude Agent SDK (Claude Code) and uses WhatsApp as its primary channel. Each group chat runs in its own isolated Linux container.
| Attribute | Value |
|-----------|-------|
| **Language** | TypeScript (Node.js 20+) |
| **Codebase Size** | ~34.9k tokens (~17% of Claude context window) |
| **Config** | No config files — code changes only |
| **AI Runtime** | Claude Agent SDK (Claude Code) |
| **Primary Channel** | WhatsApp (Baileys) |
| **Isolation** | Apple Container (macOS) / Docker (Linux) |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph WhatsApp["📱 WhatsApp"]
WA["WhatsApp Client\n(Baileys)"]
end
subgraph Core["🧠 Core Process (Single Node.js)"]
IDX["Orchestrator\n(index.ts)"]
DB["SQLite DB\n(db.ts)"]
GQ["Group Queue\n(group-queue.ts)"]
TS["Task Scheduler\n(task-scheduler.ts)"]
IPC["IPC Watcher\n(ipc.ts)"]
RT["Router\n(router.ts)"]
end
subgraph Containers["🐳 Isolated Containers"]
C1["Container 1\nGroup A\n(CLAUDE.md)"]
C2["Container 2\nGroup B\n(CLAUDE.md)"]
C3["Container 3\nMain Channel\n(CLAUDE.md)"]
end
subgraph Memory["💾 Per-Group Memory"]
M1["groups/A/CLAUDE.md"]
M2["groups/B/CLAUDE.md"]
M3["groups/main/CLAUDE.md"]
end
WA --> IDX
IDX --> DB
IDX --> GQ
GQ --> Containers
TS --> Containers
Containers --> IPC
IPC --> RT
RT --> WA
C1 --- M1
C2 --- M2
C3 --- M3
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant WA as WhatsApp (Baileys)
participant IDX as Orchestrator
participant DB as SQLite
participant GQ as Group Queue
participant Container as Container (Claude SDK)
participant IPC as IPC Watcher
User->>WA: Send message with @Andy
WA->>IDX: New message event
IDX->>DB: Store message
IDX->>GQ: Enqueue (per-group, concurrency limited)
GQ->>Container: Spawn Claude agent container
Note over Container: Mounts only group's filesystem
Note over Container: Reads group-specific CLAUDE.md
Container->>Container: Claude processes with tools
Container->>IPC: Write response to filesystem
IPC->>IDX: Detect new response file
IDX->>WA: Send reply
WA->>User: Display response
```
---
## Key Components
### 1. Orchestrator (`src/index.ts`)
The single entry point that manages:
- WhatsApp connection state
- Message polling loop
- Agent invocation decisions
- State management for groups and sessions
### 2. WhatsApp Channel (`src/channels/whatsapp.ts`)
- Uses **Baileys** library for WhatsApp Web connection
- Handles authentication via QR code scan
- Manages send/receive of messages
- Supports media messages
### 3. Container Runner (`src/container-runner.ts`)
The security core of NanoClaw:
- Spawns **streaming Claude Agent SDK** containers
- Each group runs in its own Linux container
- **Apple Container** on macOS, **Docker** on Linux
- Only explicitly mounted directories are accessible
- Bash commands run INSIDE the container, not on host
### 4. SQLite Database (`src/db.ts`)
- Stores messages, groups, sessions, and state
- Per-group message history
- Session continuity tracking
### 5. Group Queue (`src/group-queue.ts`)
- Per-group message queue
- Global concurrency limit
- Ensures one agent invocation per group at a time
### 6. IPC System (`src/ipc.ts`)
- Filesystem-based inter-process communication
- Container writes response to mounted directory
- IPC watcher detects and processes response files
- Handles task results from scheduled jobs
### 7. Task Scheduler (`src/task-scheduler.ts`)
- Recurring jobs that run Claude in containers
- Jobs can message the user back
- Managed from the main channel (self-chat)
### 8. Router (`src/router.ts`)
- Formats outbound messages
- Routes responses to correct WhatsApp recipient
### 9. Per-Group Memory (`groups/*/CLAUDE.md`)
- Each group has its own `CLAUDE.md` memory file
- Mounted into the group's container
- Complete filesystem isolation between groups
---
## Security Model
```mermaid
graph LR
subgraph Host["🖥️ Host System"]
NanoClaw["NanoClaw Process"]
end
subgraph Container1["🐳 Container (Group A)"]
Agent1["Claude Agent"]
FS1["Mounted: groups/A/"]
end
subgraph Container2["🐳 Container (Group B)"]
Agent2["Claude Agent"]
FS2["Mounted: groups/B/"]
end
NanoClaw -->|"Spawns"| Container1
NanoClaw -->|"Spawns"| Container2
style Container1 fill:#e8f5e9
style Container2 fill:#e8f5e9
```
- **OS-level isolation** vs. application-level permission checks
- Agents can only see what's explicitly mounted
- Bash commands run in container, not on host
- No shared memory between groups
---
## Philosophy & Design Decisions
1. **Small enough to understand** — Read the entire codebase in ~8 minutes
2. **Secure by isolation** — Linux containers, not permission checks
3. **Built for one user** — Not a framework, working software for personal use
4. **Customization = code changes** — No config sprawl, modify the code directly
5. **AI-native** — Claude Code handles setup (`/setup`), debugging, customization
6. **Skills over features** — Don't add features to codebase, add skills that transform forks
7. **Best harness, best model** — Claude Agent SDK gives Claude Code superpowers
---
## Agent Swarms (Unique Feature)
NanoClaw is the **first personal AI assistant** to support Agent Swarms:
- Spin up teams of specialized agents
- Agents collaborate within your chat
- Each agent runs in its own container
---
## Usage
```bash
# Setup (Claude Code handles everything)
git clone https://github.com/gavrielc/nanoclaw.git
cd nanoclaw
claude
# Then run /setup
# Talk to your assistant
@Andy send me a daily summary every morning at 9am
@Andy review the git history and update the README
```
Trigger word: `@Andy` (customizable via code changes)

---
*File: `docs/research/openclaw.md`*
# 🦞 OpenClaw — Architecture & How It Works
> **Full-Featured Personal AI Assistant** — Massive TypeScript codebase with 15+ channels, companion apps, and enterprise-grade features.
## Overview
OpenClaw is the most feature-complete personal AI assistant in this space. It's a TypeScript monorepo with a WebSocket-based Gateway as the control plane, supporting 15+ messaging channels, companion macOS/iOS/Android apps, browser control, live canvas, voice wake, and extensive automation.
| Attribute | Value |
|-----------|-------|
| **Language** | TypeScript (Node.js ≥22) |
| **Codebase Size** | 430k+ lines, 50+ source modules |
| **Config** | `~/.openclaw/openclaw.json` (JSON5) |
| **AI Runtime** | Pi Agent (custom RPC), multi-model |
| **Channels** | 15+ (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, Zalo, WebChat, etc.) |
| **Package Mgr** | pnpm (monorepo) |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph Channels["📱 Messaging Channels (15+)"]
WA["WhatsApp\n(Baileys)"]
TG["Telegram\n(grammY)"]
SL["Slack\n(Bolt)"]
DC["Discord\n(discord.js)"]
GC["Google Chat"]
SIG["Signal\n(signal-cli)"]
BB["BlueBubbles\n(iMessage)"]
IM["iMessage\n(legacy)"]
MST["MS Teams"]
MTX["Matrix"]
ZL["Zalo"]
WC["WebChat"]
end
subgraph Gateway["🌐 Gateway (Control Plane)"]
WS["WebSocket Server\nws://127.0.0.1:18789"]
SES["Session Manager"]
RTE["Channel Router"]
PRES["Presence System"]
Q["Message Queue"]
CFG["Config Manager"]
AUTH["Auth / Pairing"]
end
subgraph Agent["🧠 Pi Agent (RPC)"]
AGENT["Agent Runtime"]
TOOLS["Tool Registry"]
STREAM["Block Streaming"]
PROV["Provider Router\n(multi-model)"]
end
subgraph Apps["📲 Companion Apps"]
MAC["macOS Menu Bar"]
IOS["iOS Node"]
ANDR["Android Node"]
end
subgraph ToolSet["🔧 Tools & Automation"]
BROWSER["Browser Control\n(CDP/Chromium)"]
CANVAS["Live Canvas\n(A2UI)"]
CRON["Cron Jobs"]
WEBHOOK["Webhooks"]
GMAIL["Gmail Pub/Sub"]
NODES["Nodes\n(camera, screen, location)"]
SKILLS_T["Skills Platform"]
SESS_T["Session Tools\n(agent-to-agent)"]
end
subgraph Workspace["💾 Workspace"]
AGENTS_MD["AGENTS.md"]
SOUL_MD["SOUL.md"]
USER_MD["USER.md"]
TOOLS_MD["TOOLS.md"]
SKILLS_W["Skills/"]
end
Channels --> Gateway
Apps --> Gateway
Gateway --> Agent
Agent --> ToolSet
Agent --> Workspace
Agent --> PROV
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant Channel as Channel (WA/TG/Slack/etc.)
participant GW as Gateway (WS)
participant Session as Session Manager
participant Agent as Pi Agent (RPC)
participant LLM as LLM Provider
participant Tools as Tools
User->>Channel: Send message
Channel->>GW: Forward via channel adapter
GW->>Session: Route to session (main/group)
GW->>GW: Check auth (pairing/allowlist)
Session->>Agent: Invoke agent (RPC)
Agent->>Agent: Build prompt (AGENTS.md, SOUL.md, tools)
Agent->>LLM: Stream request (with tool definitions)
loop Tool Use Loop
LLM-->>Agent: Tool call (block stream)
Agent->>Tools: Execute tool
Tools-->>Agent: Tool result
Agent->>LLM: Continue with result
end
LLM-->>Agent: Final response (block stream)
Agent-->>Session: Return response
Session->>GW: Add to outbound queue
GW->>GW: Chunk if needed (per-channel limits)
GW->>Channel: Send chunked replies
Channel->>User: Display response
Note over GW: Typing indicators, presence updates
```
---
## Key Components
### 1. Gateway (`src/gateway/`)
The central control plane — everything connects through it:
- **WebSocket server** on `ws://127.0.0.1:18789`
- Session management (main, group, per-channel)
- Multi-agent routing (different agents for different channels)
- Presence tracking and typing indicators
- Config management and hot-reload
- Health checks, doctor diagnostics
### 2. Pi Agent (`src/agents/`)
Custom RPC-based agent runtime:
- Tool streaming and block streaming
- Multi-model support with failover
- Session pruning for long conversations
- Usage tracking (tokens, cost)
- Thinking level control (off → xhigh)
### 3. Channel System (`src/channels/` + per-channel dirs)
15+ channel adapters, each with:
- Auth handling (pairing codes, allowlists, OAuth)
- Message format conversion
- Media pipeline (images, audio, video)
- Group routing with mention gating
- Per-channel chunking (character limits differ)
### 4. Security System (`src/security/`)
Multi-layered security:
- **DM Pairing** — unknown senders get a pairing code, must be approved
- **Allowlists** — per-channel user whitelists
- **Docker Sandbox** — non-main sessions can run in per-session Docker containers
- **Tool denylist** — block dangerous tools in sandbox mode
- **Elevated bash** — per-session toggle for host-level access
### 5. Browser Control (`src/browser/`)
- Dedicated OpenClaw-managed Chrome/Chromium instance
- CDP (Chrome DevTools Protocol) control
- Snapshots, actions, uploads, profiles
- Full web automation capabilities
### 6. Canvas & A2UI (`src/canvas-host/`)
- Agent-driven visual workspace
- A2UI (Agent-to-UI) — push HTML/JS to canvas
- Canvas eval, snapshot, reset
- Available on macOS, iOS, Android
### 7. Voice System
- **Voice Wake** — always-on speech detection
- **Talk Mode** — continuous conversation overlay
- ElevenLabs TTS integration
- Available on macOS, iOS, Android
### 8. Companion Apps
- **macOS app**: Menu bar, Voice Wake/PTT, WebChat, debug tools
- **iOS node**: Canvas, Voice Wake, Talk Mode, camera, Bonjour pairing
- **Android node**: Canvas, Talk Mode, camera, screen recording, SMS
### 9. Session Tools (Agent-to-Agent)
- `sessions_list` — discover active sessions
- `sessions_history` — fetch transcript logs
- `sessions_send` — message another session with reply-back
### 10. Skills Platform (`src/plugins/`, `skills/`)
- **Bundled skills** — pre-installed capabilities
- **Managed skills** — installed from ClawHub registry
- **Workspace skills** — user-created in workspace
- Install gating and UI
- ClawHub registry for community skills
### 11. Automation
- **Cron jobs** — scheduled recurring tasks
- **Webhooks** — external trigger surface
- **Gmail Pub/Sub** — email-triggered actions
### 12. Ops & Deployment
- Docker support with compose
- Tailscale Serve/Funnel for remote access
- SSH tunnels with token/password auth
- `openclaw doctor` for diagnostics
- Nix mode for declarative config
---
## Project Structure (Simplified)
```
openclaw/
├── src/
│ ├── agents/ # Pi agent runtime
│ ├── gateway/ # WebSocket gateway
│ ├── channels/ # Channel adapter base
│ ├── whatsapp/ # WhatsApp adapter
│ ├── telegram/ # Telegram adapter
│ ├── slack/ # Slack adapter
│ ├── discord/ # Discord adapter
│ ├── signal/ # Signal adapter
│ ├── imessage/ # iMessage adapters
│ ├── browser/ # Browser control (CDP)
│ ├── canvas-host/ # Canvas & A2UI
│ ├── sessions/ # Session management
│ ├── routing/ # Message routing
│ ├── security/ # Auth, pairing, sandbox
│ ├── cron/ # Scheduled jobs
│ ├── memory/ # Memory system
│ ├── providers/ # LLM providers
│ ├── plugins/ # Plugin/skill system
│ ├── media/ # Media pipeline
│ ├── tts/ # Text-to-speech
│ ├── web/ # Control UI + WebChat
│ ├── wizard/ # Onboarding wizard
│ └── cli/ # CLI commands
├── apps/ # Companion app sources
├── packages/ # Shared packages
├── extensions/ # Extension channels
├── skills/ # Bundled skills
├── ui/ # Web UI source
└── Swabble/ # macOS/iOS Swift source
```
---
## CLI Commands
| Command | Description |
|---------|-------------|
| `openclaw onboard` | Guided setup wizard |
| `openclaw gateway` | Start the gateway |
| `openclaw agent --message "..."` | Chat with agent |
| `openclaw message send` | Send to any channel |
| `openclaw doctor` | Diagnostics & migration |
| `openclaw pairing approve` | Approve DM pairing |
| `openclaw update` | Update to latest version |
| `openclaw channels login` | Link WhatsApp |
---
## Chat Commands (In-Channel)
| Command | Description |
|---------|-------------|
| `/status` | Session status (model, tokens, cost) |
| `/new` / `/reset` | Reset session |
| `/compact` | Compact session context |
| `/think <level>` | Set thinking level |
| `/verbose on\|off` | Toggle verbose mode |
| `/usage off\|tokens\|full` | Usage footer |
| `/restart` | Restart gateway |
| `/activation mention\|always` | Group activation mode |
---
## Key Design Decisions
1. **Gateway as control plane** — Single WebSocket server everything connects to
2. **Multi-agent routing** — Different agents for different channels/groups
3. **Pairing-based security** — Unknown DMs get pairing codes by default
4. **Docker sandboxing** — Non-main sessions can be isolated
5. **Block streaming** — Responses streamed as structured blocks
6. **Extension-based channels** — MS Teams, Matrix, Zalo are extensions
7. **Companion apps** — Native macOS/iOS/Android for device-level features
8. **ClawHub** — Community skill registry
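
The "gateway as control plane" decision can be sketched minimally: one dispatcher accepts events from every channel and routes them onto per-agent queues, without doing any reasoning itself. This is an illustrative Python sketch, not OpenClaw's actual TypeScript code; the `Gateway` class and `route` method are hypothetical names.

```python
# Illustrative sketch of the gateway-as-control-plane idea: a single
# process accepts events from every channel and routes them to per-agent
# queues. Class and method names are hypothetical, not OpenClaw's real API.
from collections import defaultdict, deque

class Gateway:
    def __init__(self, routing_rules):
        self.routing_rules = routing_rules   # e.g. {"whatsapp": "main-agent"}
        self.queues = defaultdict(deque)     # agent id -> pending events

    def route(self, channel, message):
        # The gateway doesn't think or decide; it only looks up a rule.
        agent = self.routing_rules.get(channel, "main-agent")
        self.queues[agent].append({"channel": channel, "text": message})
        return agent

gw = Gateway({"whatsapp": "main-agent", "discord": "dev-agent"})
gw.route("discord", "run the tests")
gw.route("telegram", "good morning")   # unknown channel falls back to the default
print(gw.route("whatsapp", "call me at 9"))  # main-agent
```

The agent runtime would then drain `gw.queues` independently, which is what makes the event loop above possible.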

---
# Aetheel vs. OpenCode CLI: Gap Analysis

Comparing the OpenCode CLI docs against Aetheel's `opencode_runtime.py`, here are the gaps.

**What Aetheel uses today:**

- `opencode run` with `--model`, `--continue`, `--session`, `--format`
- SDK mode via the `opencode serve` API (session create + chat)
- Session persistence in SQLite
- System prompt injection via XML tags (CLI) or the `system` param (SDK)
- Rate limit detection from error text
- Live session tracking with idle timeout

**What Aetheel is missing from the OpenCode CLI:**

- **`--agent` flag** — OpenCode supports custom agents (`opencode agent create/list`). Aetheel has no concept of selecting different OpenCode agents per request. This would be useful for the planned agent-teams feature: you could have a "programmer" agent and a "researcher" agent defined in OpenCode.
- **`--file` / `-f` flag** — OpenCode can attach files to a prompt (`opencode run -f image.png "describe this"`). Aetheel doesn't pass file attachments from chat adapters through to the runtime. Discord/Telegram/Slack all support file uploads.
- **`--attach` flag** — You can run `opencode run --attach http://localhost:4096` to connect to a running server, avoiding MCP cold boot on every request. Aetheel's SDK mode connects to the server, but CLI mode spawns a fresh process each time. Using `--attach` in CLI mode would give you the speed of SDK mode without needing the Python SDK.
- **`--fork` flag** — Fork a session when continuing, creating a branch. Aetheel always continues sessions linearly. Forking would be useful for "what if" scenarios or spawning subagent tasks from a shared context.
- **`--title` flag** — Name sessions for easier identification. Aetheel's sessions are tracked by conversation ID but have no human-readable title.
- **`--share` flag** — Share sessions via URL. Aetheel has no session sharing.
- **`opencode session list/export/import`** — Full session management. Aetheel can list sessions internally but doesn't expose export/import or the full session lifecycle.
- **`opencode stats`** — Token usage and cost statistics with `--days`, `--tools`, `--models` filters. Aetheel tracks basic usage stats in memory but doesn't query OpenCode's built-in stats.
- **`opencode models`** — List available models from configured providers. Aetheel has no way to discover available models; you have to know the model name.
- **`opencode auth` management** — Login/logout/list for providers. Aetheel relies on env vars for auth and has no way to manage OpenCode's credential store.
- **`opencode mcp auth/logout/debug`** — OAuth-based MCP server auth and debugging. Aetheel can add/remove MCP servers but can't handle OAuth flows or debug MCP connections.
- **`opencode github agent`** — GitHub Actions integration for repo automation. Aetheel has no CI/CD agent support.
- **`opencode web`** — Built-in web UI. Aetheel has its own WebChat but doesn't leverage OpenCode's web interface.
- **`opencode acp`** — Agent Client Protocol server. Aetheel doesn't use ACP.
- **`OPENCODE_AUTO_SHARE`** — Auto-share sessions.
- **`OPENCODE_DISABLE_AUTOCOMPACT`** — Control context compaction. Aetheel doesn't expose this, which could matter for long conversations.
- **`OPENCODE_EXPERIMENTAL_PLAN_MODE`** — Plan mode for structured task execution. Aetheel doesn't use this.
- **`OPENCODE_EXPERIMENTAL_BASH_DEFAULT_TIMEOUT_MS`** — Control bash command timeouts. Aetheel doesn't pass this through.
- **`OPENCODE_ENABLE_EXA`** — Exa web search tools. Aetheel doesn't expose this toggle.
- **`opencode upgrade`** — Self-update. Aetheel has `aetheel update`, which does `git pull` but doesn't update the OpenCode binary itself.

The most impactful gaps are `--agent` (for agent teams), `--file` (for media from chat), `--attach` (for faster CLI mode), `--fork` (for branching conversations), and `opencode stats` (for usage visibility).
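
To make the `--attach` and `--file` gaps concrete, here is a hedged sketch of how Aetheel's CLI mode might assemble the command line. `build_run_command` is a hypothetical helper, not an existing Aetheel function; the flags themselves come from the OpenCode CLI reference.

```python
# Hypothetical helper showing how Aetheel's CLI mode could adopt the
# --attach and --file flags. build_run_command is an illustrative name,
# not an existing Aetheel function; the flags match the OpenCode CLI docs.
import shlex

def build_run_command(prompt, model=None, session=None, attach_url=None, files=()):
    cmd = ["opencode", "run"]
    if model:
        cmd += ["--model", model]        # provider/model form
    if session:
        cmd += ["--session", session]
    if attach_url:
        cmd += ["--attach", attach_url]  # reuse a warm `opencode serve`, no MCP cold boot
    for path in files:
        cmd += ["--file", path]          # forward chat attachments to the runtime
    cmd.append(prompt)
    return cmd

cmd = build_run_command("describe this",
                        attach_url="http://localhost:4096",
                        files=["image.png"])
print(shlex.join(cmd))
```

The returned list would be passed straight to `subprocess.run`, keeping the existing spawn path intact while gaining the warm-server speedup.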

---
# OpenCode CLI

OpenCode CLI options and commands.

By default, the OpenCode CLI starts the TUI when run without any arguments:

```
opencode
```

It also accepts commands, as documented on this page, which lets you interact with OpenCode programmatically:

```
opencode run "Explain how closures work in JavaScript"
```

## tui

Start the OpenCode terminal user interface.

```
opencode [project]
```

**Flags**

| Flag | Short | Description |
|------|-------|-------------|
| `--continue` | `-c` | Continue the last session |
| `--session` | `-s` | Session ID to continue |
| `--fork` | | Fork the session when continuing (use with `--continue` or `--session`) |
| `--prompt` | | Prompt to use |
| `--model` | `-m` | Model to use in the form `provider/model` |
| `--agent` | | Agent to use |
| `--port` | | Port to listen on |
| `--hostname` | | Hostname to listen on |
The OpenCode CLI also has the following commands.

## agent

Manage agents for OpenCode.

```
opencode agent [command]
```

### create

Create a new agent with custom configuration.

```
opencode agent create
```

This command will guide you through creating a new agent with a custom system prompt and tool configuration.

### list

List all available agents.

```
opencode agent list
```

## attach

Attach a terminal to an already running OpenCode backend server started via the `serve` or `web` commands.

```
opencode attach [url]
```

This allows using the TUI with a remote OpenCode backend. For example:

```
# Start the backend server for web/mobile access
opencode web --port 4096 --hostname 0.0.0.0

# In another terminal, attach the TUI to the running backend
opencode attach http://10.20.30.40:4096
```

**Flags**

| Flag | Short | Description |
|------|-------|-------------|
| `--dir` | | Working directory to start the TUI in |
| `--session` | `-s` | Session ID to continue |

## auth

Manage credentials and login for providers.

```
opencode auth [command]
```

### login

OpenCode is powered by the provider list at Models.dev, so you can use `opencode auth login` to configure API keys for any provider you'd like to use. Credentials are stored in `~/.local/share/opencode/auth.json`.

```
opencode auth login
```

When OpenCode starts up, it loads providers from the credentials file, plus any keys defined in your environment or in a `.env` file in your project.

### list

Lists all the authenticated providers as stored in the credentials file.

```
opencode auth list
```

Or the short version:

```
opencode auth ls
```

### logout

Logs you out of a provider by clearing it from the credentials file.

```
opencode auth logout
```
## github

Manage the GitHub agent for repository automation.

```
opencode github [command]
```

### install

Install the GitHub agent in your repository.

```
opencode github install
```

This sets up the necessary GitHub Actions workflow and guides you through the configuration process.

### run

Run the GitHub agent. This is typically used in GitHub Actions.

```
opencode github run
```

**Flags**

| Flag | Description |
|------|-------------|
| `--event` | GitHub mock event to run the agent for |
| `--token` | GitHub personal access token |

## mcp

Manage Model Context Protocol servers.

```
opencode mcp [command]
```

### add

Add an MCP server to your configuration.

```
opencode mcp add
```

This command will guide you through adding either a local or remote MCP server.

### list

List all configured MCP servers and their connection status.

```
opencode mcp list
```

Or use the short version:

```
opencode mcp ls
```

### auth

Authenticate with an OAuth-enabled MCP server.

```
opencode mcp auth [name]
```

If you don't provide a server name, you'll be prompted to select from available OAuth-capable servers.

You can also list OAuth-capable servers and their authentication status:

```
opencode mcp auth list
```

Or use the short version:

```
opencode mcp auth ls
```

### logout

Remove OAuth credentials for an MCP server.

```
opencode mcp logout [name]
```

### debug

Debug OAuth connection issues for an MCP server.

```
opencode mcp debug <name>
```

## models

List all available models from configured providers.

```
opencode models [provider]
```

This command displays all models available across your configured providers in the format `provider/model`. It is useful for figuring out the exact model name to use in your config.

You can optionally pass a provider ID to filter models by that provider:

```
opencode models anthropic
```

**Flags**

| Flag | Description |
|------|-------------|
| `--refresh` | Refresh the models cache from models.dev |
| `--verbose` | Use more verbose model output (includes metadata like costs) |

Use the `--refresh` flag to update the cached model list. This is useful when new models have been added to a provider and you want to see them in OpenCode.

```
opencode models --refresh
```
## run

Run OpenCode in non-interactive mode by passing a prompt directly.

```
opencode run [message..]
```

This is useful for scripting, automation, or when you want a quick answer without launching the full TUI. For example:

```
opencode run Explain the use of context in Go
```

You can also attach to a running `opencode serve` instance to avoid MCP server cold-boot times on every run:

```
# Start a headless server in one terminal
opencode serve

# In another terminal, run commands that attach to it
opencode run --attach http://localhost:4096 "Explain async/await in JavaScript"
```

**Flags**

| Flag | Short | Description |
|------|-------|-------------|
| `--command` | | The command to run; use the message for args |
| `--continue` | `-c` | Continue the last session |
| `--session` | `-s` | Session ID to continue |
| `--fork` | | Fork the session when continuing (use with `--continue` or `--session`) |
| `--share` | | Share the session |
| `--model` | `-m` | Model to use in the form `provider/model` |
| `--agent` | | Agent to use |
| `--file` | `-f` | File(s) to attach to the message |
| `--format` | | Format: `default` (formatted) or `json` (raw JSON events) |
| `--title` | | Title for the session (uses a truncated prompt if no value provided) |
| `--attach` | | Attach to a running OpenCode server (e.g., `http://localhost:4096`) |
| `--port` | | Port for the local server (defaults to a random port) |

## serve

Start a headless OpenCode server for API access. Check out the server docs for the full HTTP interface.

```
opencode serve
```

This starts an HTTP server that provides API access to OpenCode functionality without the TUI interface. Set `OPENCODE_SERVER_PASSWORD` to enable HTTP basic auth (username defaults to `opencode`).

**Flags**

| Flag | Description |
|------|-------------|
| `--port` | Port to listen on |
| `--hostname` | Hostname to listen on |
| `--mdns` | Enable mDNS discovery |
| `--cors` | Additional browser origin(s) to allow for CORS |
## session

Manage OpenCode sessions.

```
opencode session [command]
```

### list

List all OpenCode sessions.

```
opencode session list
```

**Flags**

| Flag | Short | Description |
|------|-------|-------------|
| `--max-count` | `-n` | Limit to the N most recent sessions |
| `--format` | | Output format: `table` or `json` (default: `table`) |

## stats

Show token usage and cost statistics for your OpenCode sessions.

```
opencode stats
```

**Flags**

| Flag | Description |
|------|-------------|
| `--days` | Show stats for the last N days (default: all time) |
| `--tools` | Number of tools to show (default: all) |
| `--models` | Show model usage breakdown (hidden by default); pass a number to show the top N |
| `--project` | Filter by project (default: all projects; empty string: current project) |

## export

Export session data as JSON.

```
opencode export [sessionID]
```

If you don't provide a session ID, you'll be prompted to select from available sessions.

## import

Import session data from a JSON file or an OpenCode share URL.

```
opencode import <file>
```

You can import from a local file or an OpenCode share URL:

```
opencode import session.json
opencode import https://opncd.ai/s/abc123
```

## web

Start a headless OpenCode server with a web interface.

```
opencode web
```

This starts an HTTP server and opens a web browser to access OpenCode through a web interface. Set `OPENCODE_SERVER_PASSWORD` to enable HTTP basic auth (username defaults to `opencode`).

**Flags**

| Flag | Description |
|------|-------------|
| `--port` | Port to listen on |
| `--hostname` | Hostname to listen on |
| `--mdns` | Enable mDNS discovery |
| `--cors` | Additional browser origin(s) to allow for CORS |
## acp

Start an ACP (Agent Client Protocol) server.

```
opencode acp
```

This command starts an ACP server that communicates via stdin/stdout using newline-delimited JSON (NDJSON).

**Flags**

| Flag | Description |
|------|-------------|
| `--cwd` | Working directory |
| `--port` | Port to listen on |
| `--hostname` | Hostname to listen on |

## uninstall

Uninstall OpenCode and remove all related files.

```
opencode uninstall
```

**Flags**

| Flag | Short | Description |
|------|-------|-------------|
| `--keep-config` | `-c` | Keep configuration files |
| `--keep-data` | `-d` | Keep session data and snapshots |
| `--dry-run` | | Show what would be removed without removing it |
| `--force` | `-f` | Skip confirmation prompts |

## upgrade

Updates OpenCode to the latest version or a specific version.

```
opencode upgrade [target]
```

To upgrade to the latest version:

```
opencode upgrade
```

To upgrade to a specific version:

```
opencode upgrade v0.1.48
```

**Flags**

| Flag | Short | Description |
|------|-------|-------------|
| `--method` | `-m` | The installation method that was used: `curl`, `npm`, `pnpm`, `bun`, or `brew` |

## Global Flags

The `opencode` CLI takes the following global flags.

| Flag | Short | Description |
|------|-------|-------------|
| `--help` | `-h` | Display help |
| `--version` | `-v` | Print version number |
| `--print-logs` | | Print logs to stderr |
| `--log-level` | | Log level (`DEBUG`, `INFO`, `WARN`, `ERROR`) |
## Environment Variables

OpenCode can be configured using environment variables.

| Variable | Type | Description |
|----------|------|-------------|
| `OPENCODE_AUTO_SHARE` | boolean | Automatically share sessions |
| `OPENCODE_GIT_BASH_PATH` | string | Path to the Git Bash executable on Windows |
| `OPENCODE_CONFIG` | string | Path to config file |
| `OPENCODE_CONFIG_DIR` | string | Path to config directory |
| `OPENCODE_CONFIG_CONTENT` | string | Inline JSON config content |
| `OPENCODE_DISABLE_AUTOUPDATE` | boolean | Disable automatic update checks |
| `OPENCODE_DISABLE_PRUNE` | boolean | Disable pruning of old data |
| `OPENCODE_DISABLE_TERMINAL_TITLE` | boolean | Disable automatic terminal title updates |
| `OPENCODE_PERMISSION` | string | Inline JSON permissions config |
| `OPENCODE_DISABLE_DEFAULT_PLUGINS` | boolean | Disable default plugins |
| `OPENCODE_DISABLE_LSP_DOWNLOAD` | boolean | Disable automatic LSP server downloads |
| `OPENCODE_ENABLE_EXPERIMENTAL_MODELS` | boolean | Enable experimental models |
| `OPENCODE_DISABLE_AUTOCOMPACT` | boolean | Disable automatic context compaction |
| `OPENCODE_DISABLE_CLAUDE_CODE` | boolean | Disable reading from `.claude` (prompt + skills) |
| `OPENCODE_DISABLE_CLAUDE_CODE_PROMPT` | boolean | Disable reading `~/.claude/CLAUDE.md` |
| `OPENCODE_DISABLE_CLAUDE_CODE_SKILLS` | boolean | Disable loading `.claude/skills` |
| `OPENCODE_DISABLE_MODELS_FETCH` | boolean | Disable fetching models from remote sources |
| `OPENCODE_FAKE_VCS` | string | Fake VCS provider for testing purposes |
| `OPENCODE_DISABLE_FILETIME_CHECK` | boolean | Disable file time checking for optimization |
| `OPENCODE_CLIENT` | string | Client identifier (defaults to `cli`) |
| `OPENCODE_ENABLE_EXA` | boolean | Enable Exa web search tools |
| `OPENCODE_SERVER_PASSWORD` | string | Enable basic auth for `serve`/`web` |
| `OPENCODE_SERVER_USERNAME` | string | Override basic auth username (default `opencode`) |
| `OPENCODE_MODELS_URL` | string | Custom URL for fetching models configuration |

### Experimental

These environment variables enable experimental features that may change or be removed.

| Variable | Type | Description |
|----------|------|-------------|
| `OPENCODE_EXPERIMENTAL` | boolean | Enable all experimental features |
| `OPENCODE_EXPERIMENTAL_ICON_DISCOVERY` | boolean | Enable icon discovery |
| `OPENCODE_EXPERIMENTAL_DISABLE_COPY_ON_SELECT` | boolean | Disable copy-on-select in the TUI |
| `OPENCODE_EXPERIMENTAL_BASH_DEFAULT_TIMEOUT_MS` | number | Default timeout for bash commands in ms |
| `OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX` | number | Max output tokens for LLM responses |
| `OPENCODE_EXPERIMENTAL_FILEWATCHER` | boolean | Enable the file watcher for the entire directory |
| `OPENCODE_EXPERIMENTAL_OXFMT` | boolean | Enable the oxfmt formatter |
| `OPENCODE_EXPERIMENTAL_LSP_TOOL` | boolean | Enable the experimental LSP tool |
| `OPENCODE_EXPERIMENTAL_DISABLE_FILEWATCHER` | boolean | Disable the file watcher |
| `OPENCODE_EXPERIMENTAL_EXA` | boolean | Enable experimental Exa features |
| `OPENCODE_EXPERIMENTAL_LSP_TY` | boolean | Enable experimental LSP type checking |
| `OPENCODE_EXPERIMENTAL_MARKDOWN` | boolean | Enable experimental markdown features |
| `OPENCODE_EXPERIMENTAL_PLAN_MODE` | boolean | Enable plan mode |

---
# 🦐 PicoClaw — Architecture & How It Works
> **Ultra-Efficient AI Assistant in Go** — $10 hardware, 10MB RAM, 1s boot time.
## Overview
PicoClaw is an extremely lightweight rewrite of NanoBot in Go, designed to run on the cheapest possible hardware — including $10 RISC-V SBCs with <10MB RAM. The entire project was AI-bootstrapped (95% agent-generated) through a self-bootstrapping migration from Python to Go.
| Attribute | Value |
|-----------|-------|
| **Language** | Go 1.21+ |
| **RAM Usage** | <10MB |
| **Startup Time** | <1s (even at 0.6GHz) |
| **Hardware Cost** | As low as $10 |
| **Architectures** | x86_64, ARM64, RISC-V |
| **Binary** | Single self-contained binary |
| **Config** | `~/.picoclaw/config.json` |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph Channels["📱 Chat Channels"]
TG["Telegram"]
DC["Discord"]
QQ["QQ"]
DT["DingTalk"]
LINE["LINE"]
end
subgraph Core["🧠 Core Agent (Single Binary)"]
MAIN["Main Entry\n(cmd/)"]
AGENT["Agent Loop\n(pkg/agent/)"]
CONF["Config\n(pkg/config/)"]
AUTH["Auth\n(pkg/auth/)"]
PROV["Providers\n(pkg/providers/)"]
TOOLS["Tools\n(pkg/tools/)"]
end
subgraph ToolSet["🔧 Built-in Tools"]
SHELL["Shell Exec"]
FILE["File R/W"]
WEB["Web Search\n(Brave / DuckDuckGo)"]
CRON_T["Cron / Reminders"]
SPAWN["Spawn Subagent"]
MSG["Message Tool"]
end
subgraph Workspace["💾 Workspace"]
AGENTS_MD["AGENTS.md"]
SOUL_MD["SOUL.md"]
TOOLS_MD["TOOLS.md"]
USER_MD["USER.md"]
IDENTITY["IDENTITY.md"]
HB["HEARTBEAT.md"]
MEM["MEMORY.md"]
SESSIONS["sessions/"]
SKILLS["skills/"]
end
subgraph Providers["☁️ LLM Providers"]
GEMINI["Gemini"]
ZHIPU["Zhipu"]
OR["OpenRouter"]
OA["OpenAI"]
AN["Anthropic"]
DS["DeepSeek"]
GROQ["Groq\n(+ voice)"]
end
Channels --> Core
AGENT --> ToolSet
AGENT --> Workspace
AGENT --> Providers
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant Channel as Chat Channel
participant GW as Gateway
participant Agent as Agent Loop
participant LLM as LLM Provider
participant Tools as Tools
User->>Channel: Send message
Channel->>GW: Forward message
GW->>Agent: Route to agent
Agent->>Agent: Load context (AGENTS.md, SOUL.md, USER.md)
Agent->>LLM: Send prompt + tool defs
LLM-->>Agent: Response
alt Tool Call
Agent->>Tools: Execute tool
Tools-->>Agent: Result
Agent->>LLM: Continue
LLM-->>Agent: Next response
end
Agent->>Agent: Update memory/session
Agent-->>GW: Return response
GW-->>Channel: Send reply
Channel-->>User: Display
```
---
## Heartbeat System Flow
```mermaid
sequenceDiagram
participant Timer as Heartbeat Timer
participant Agent as Agent
participant HB as HEARTBEAT.md
participant Subagent as Spawn Subagent
participant User
Timer->>Agent: Trigger (every 30 min)
Agent->>HB: Read periodic tasks
alt Quick Task
Agent->>Agent: Execute directly
Agent-->>Timer: HEARTBEAT_OK
end
alt Long Task
Agent->>Subagent: Spawn async subagent
Agent-->>Timer: Continue to next task
Subagent->>Subagent: Work independently
Subagent->>User: Send result via message tool
end
```
---
## Key Components
### 1. Agent Loop (`pkg/agent/`)
Go-native implementation of the LLM ↔ tool execution loop:
- Builds context from workspace identity files
- Sends to LLM provider with tool definitions
- Iterates on tool calls up to `max_tool_iterations` (default: 20)
- Session history managed in `workspace/sessions/`
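The bounded LLM ↔ tool loop above can be sketched like so. This is a Python sketch of the Go logic; the fake provider and tool registry are stand-ins, not PicoClaw's real interfaces — only the `max_tool_iterations` bound (default 20) comes from the docs.

```python
# Python sketch of PicoClaw's LLM <-> tool loop with the documented
# max_tool_iterations bound (default 20). The provider and tool registry
# here are stand-ins, not PicoClaw's real Go interfaces.
def run_agent(provider, tools, prompt, max_tool_iterations=20):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_tool_iterations):
        reply = provider(history)                      # LLM call with tool defs
        if reply.get("tool") is None:                  # plain answer -> done
            return reply["content"]
        result = tools[reply["tool"]](reply["args"])   # execute the tool call
        history.append({"role": "tool", "content": result})
    return "stopped: hit max_tool_iterations"          # safety bound

# Toy provider: asks for one file read, then answers.
def fake_provider(history):
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "read_file", "args": "SOUL.md"}
    return {"tool": None, "content": "done"}

print(run_agent(fake_provider, {"read_file": lambda p: f"<{p} contents>"}, "hi"))  # done
```

The hard iteration cap is what keeps a confused model from looping on tool calls forever.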
### 2. Provider System (`pkg/providers/`)
- Gemini and Zhipu are fully tested
- OpenRouter, Anthropic, OpenAI, DeepSeek marked "to be tested"
- Groq for voice transcription (Whisper)
- Each provider implements a common interface
### 3. Tool System (`pkg/tools/`)
Built-in tools:
- **read_file** / **write_file** / **list_dir** / **edit_file** / **append_file** — File operations
- **exec** — Shell command execution (with safety guards)
- **web_search** — Brave Search or DuckDuckGo fallback
- **cron** — Scheduled reminders and recurring tasks
- **spawn** — Create async subagents
- **message** — Subagent-to-user communication
### 4. Security Sandbox
```mermaid
graph TD
RW["restrict_to_workspace = true"]
RW --> RF["read_file: workspace only"]
RW --> WF["write_file: workspace only"]
RW --> LD["list_dir: workspace only"]
RW --> EF["edit_file: workspace only"]
RW --> AF["append_file: workspace only"]
RW --> EX["exec: workspace paths only"]
EX --> BL["ALWAYS Blocked:"]
BL --> RM["rm -rf"]
BL --> FMT["format, mkfs"]
BL --> DD["dd if="]
BL --> SHUT["shutdown, reboot"]
BL --> FORK["fork bomb"]
```
- Workspace sandbox enabled by default
- All tools restricted to workspace directory
- Dangerous commands always blocked (even with sandbox off)
- Consistent across main agent, subagents, and heartbeat tasks
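
The two sandbox layers shown above — a workspace path check plus an always-on blocklist — can be illustrated with a short sketch. PicoClaw implements this in Go; the function names and the exact blocklist entries here are hypothetical.

```python
# Illustrative sketch of the two sandbox layers: a workspace path check
# (only when restrict_to_workspace is on) and a dangerous-command blocklist
# that applies even with the sandbox off. Names are hypothetical.
from pathlib import Path

ALWAYS_BLOCKED = ("rm -rf", "mkfs", "dd if=", "shutdown", "reboot", ":(){")

def allowed_path(path, workspace, restrict_to_workspace=True):
    if not restrict_to_workspace:
        return True
    resolved = Path(workspace, path).resolve()          # collapses ../ escapes
    return resolved.is_relative_to(Path(workspace).resolve())

def allowed_command(cmd):
    # Blocked even when the workspace sandbox is turned off.
    return not any(bad in cmd for bad in ALWAYS_BLOCKED)

print(allowed_path("notes/todo.md", "/home/pico/.picoclaw/workspace"))     # True
print(allowed_path("../../etc/passwd", "/home/pico/.picoclaw/workspace"))  # False
print(allowed_command("rm -rf /"))                                         # False
```

Resolving the path before comparing is the important step; a naive string prefix check would miss `../` traversal.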
### 5. Heartbeat System
- Reads `HEARTBEAT.md` every 30 minutes
- Quick tasks executed directly
- Long tasks spawned as async subagents
- Subagents communicate independently via message tool
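
The dispatch rule — quick tasks inline, long tasks off to async subagents — can be sketched as below. Task parsing and the spawn mechanism are simplified stand-ins for PicoClaw's Go implementation.

```python
# Sketch of the heartbeat dispatch rule: quick tasks run inline, long
# tasks spawn async subagents so the 30-minute loop is never blocked.
# Task parsing and the spawn mechanism are simplified stand-ins.
import threading

def heartbeat_tick(tasks, run_quick, spawn_subagent):
    for task in tasks:                       # parsed from HEARTBEAT.md
        if task["long"]:
            spawn_subagent(task)             # async; reports back via the message tool
        else:
            run_quick(task)                  # execute directly, then continue
    return "HEARTBEAT_OK"

done = []
status = heartbeat_tick(
    [{"name": "check disk", "long": False},
     {"name": "weekly report", "long": True}],
    run_quick=lambda t: done.append(t["name"]),
    spawn_subagent=lambda t: threading.Thread(target=done.append, args=(t["name"],)).start(),
)
print(status)   # HEARTBEAT_OK
```

The key property is that `heartbeat_tick` returns immediately even when a long task was queued, so the next tick is never delayed.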
### 6. Channel System
- **Telegram** — Easy setup (token only)
- **Discord** — Bot token + intents
- **QQ** — AppID + AppSecret
- **DingTalk** — Client credentials
- **LINE** — Credentials + webhook URL (HTTPS required)
### 7. Workspace Layout
```
~/.picoclaw/workspace/
├── sessions/ # Conversation history
├── memory/ # Long-term memory (MEMORY.md)
├── state/ # Persistent state
├── cron/ # Scheduled jobs database
├── skills/ # Custom skills
├── AGENTS.md # Agent behavior guide
├── HEARTBEAT.md # Periodic task prompts
├── IDENTITY.md # Agent identity
├── SOUL.md # Agent soul
├── TOOLS.md # Tool descriptions
└── USER.md # User preferences
```
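
"Builds context from workspace identity files" amounts to concatenating the markdown files above into the system prompt on each wake-up. A minimal sketch, assuming the file set from the layout; `build_context` is an illustrative name, not PicoClaw's API.

```python
# Sketch of context building: concatenate the workspace identity markdown
# files into one system-prompt string on each wake-up. The file list comes
# from the layout above; build_context is an illustrative name.
import tempfile
from pathlib import Path

IDENTITY_FILES = ["AGENTS.md", "SOUL.md", "IDENTITY.md", "TOOLS.md", "USER.md"]

def build_context(workspace):
    parts = []
    for name in IDENTITY_FILES:
        f = Path(workspace) / name
        if f.exists():                        # missing files are simply skipped
            parts.append(f"## {name}\n{f.read_text()}")
    return "\n\n".join(parts)

# Example with a throwaway workspace:
ws = tempfile.mkdtemp()
(Path(ws) / "SOUL.md").write_text("Be helpful.")
print(build_context(ws))
```

This is also why the agent "remembers" without any real-time learning: memory is whatever these files say at read time.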
---
## Comparison Table (from README)
| | OpenClaw | NanoBot | **PicoClaw** |
|---------------------|------------|-------------|-----------------------|
| **Language** | TypeScript | Python | **Go** |
| **RAM** | >1GB | >100MB | **<10MB** |
| **Startup (0.8GHz)**| >500s | >30s | **<1s** |
| **Cost** | Mac $599 | SBC ~$50 | **Any Linux, ~$10** |
---
## Deployment Targets
PicoClaw can run on almost any Linux device:
- **$9.90** LicheeRV-Nano — Minimal home assistant
- **$30-50** NanoKVM — Automated server maintenance
- **$50-100** MaixCAM — Smart monitoring
---
## Key Design Decisions
1. **Go for minimal footprint** — Single binary, no runtime deps, tiny memory
2. **AI-bootstrapped migration** — 95% of Go code generated by the AI agent itself
3. **Web search with fallback** — Brave Search primary, DuckDuckGo fallback (free)
4. **Heartbeat for proactive tasks** — Agent checks `HEARTBEAT.md` periodically
5. **Subagent pattern** — Long tasks run async, don't block heartbeat
6. **Default sandbox** — `restrict_to_workspace: true` out of the box
7. **Cross-architecture** — Single binary compiles for x86, ARM64, RISC-V