feat: openclaw-style secrets (env.vars + \) and per-task model routing
- Replace python-dotenv with config.json env.vars block + \ substitution - Add models section for per-task model routing (heartbeat, subagent, default) - Heartbeat/subagent tasks can use different models/providers than main chat - Remove python-dotenv from dependencies - Update all docs to reflect new config approach - Reorganize docs into project/ and research/ subdirectories
This commit is contained in:
237
docs/research/Openclaw deep dive.md
Normal file
237
docs/research/Openclaw deep dive.md
Normal file
@@ -0,0 +1,237 @@
|
||||
|
||||
# OpenClaw Architecture Deep Dive
|
||||
|
||||
## What is OpenClaw?
|
||||
|
||||
OpenClaw is an open source AI assistant created by Peter Steinberger (founder of PSP PDF kit) that gained 100,000 GitHub stars in 3 days - one of the fastest growing repositories in GitHub history.
|
||||
|
||||
**Technical Definition:** An agent runtime with a gateway in front of it.
|
||||
|
||||
Despite viral stories of agents calling owners at 3am, texting people's wives autonomously, and browsing Twitter overnight, OpenClaw isn't sentient. It's elegant event-driven engineering.
|
||||
|
||||
## Core Architecture
|
||||
|
||||
### The Gateway
|
||||
- Long-running process on your machine
|
||||
- Constantly accepts connections from messaging apps (WhatsApp, Telegram, Discord, iMessage, Slack)
|
||||
- Routes messages to AI agents
|
||||
- **Doesn't think, reason, or decide** - only accepts inputs and routes them
|
||||
|
||||
### The Agent Runtime
|
||||
- Processes events from the queue
|
||||
- Executes actions using available tools
|
||||
- Has deep system access: shell commands, file operations, browser control
|
||||
|
||||
### State Persistence
|
||||
- Memory stored as local markdown files
|
||||
- Includes preferences, conversation history, context from previous sessions
|
||||
- Agent "remembers" by reading these files on each wake-up
|
||||
- Not real-time learning - just file reading
|
||||
|
||||
### The Event Loop
|
||||
All events enter a queue → Queue gets processed → Agents execute → State persists → Loop continues
|
||||
|
||||
## The Five Input Types
|
||||
|
||||
### 1. Messages (Human Input)
|
||||
**How it works:**
|
||||
- You send text via WhatsApp, iMessage, or Slack
|
||||
- Gateway receives and routes to agent
|
||||
- Agent responds
|
||||
|
||||
**Key details:**
|
||||
- Sessions are per-channel (WhatsApp and Slack are separate contexts)
|
||||
- Multiple requests queue up and process in order
|
||||
- No jumbled responses - finishes one thought before moving to next
|
||||
|
||||
### 2. Heartbeats (Timer Events)
|
||||
**How it works:**
|
||||
- Timer fires at regular intervals (default: every 30 minutes)
|
||||
- Gateway schedules an agent turn with a preconfigured prompt
|
||||
- Agent responds to instructions like "Check inbox for urgent items" or "Review calendar"
|
||||
|
||||
**Key details:**
|
||||
- Configurable interval, prompt, and active hours
|
||||
- If nothing urgent: agent returns `heartbeat_okay` token (suppressed from user)
|
||||
- If something urgent: you get a ping
|
||||
- **This is the secret sauce** - makes OpenClaw feel proactive
|
||||
|
||||
**Example prompts:**
|
||||
- "Check my inbox for anything urgent"
|
||||
- "Review my calendar"
|
||||
- "Look for overdue tasks"
|
||||
|
||||
### 3. Cron Jobs (Scheduled Events)
|
||||
**How it works:**
|
||||
- More control than heartbeats
|
||||
- Specify exact timing and custom instructions
|
||||
- When time hits, event fires and prompt sent to agent
|
||||
|
||||
**Examples:**
|
||||
- 9am daily: "Check email and flag anything urgent"
|
||||
- Every Monday 3pm: "Review calendar for the week and remind me of conflicts"
|
||||
- Midnight: "Browse my Twitter feed and save interesting posts"
|
||||
- 8am: "Text wife good morning"
|
||||
- 10pm: "Text wife good night"
|
||||
|
||||
**Real example:** The viral story of agent texting someone's wife was just cron jobs firing at scheduled times. Agent wasn't deciding - it was responding to scheduled prompts.
|
||||
|
||||
### 4. Hooks (Internal State Changes)
|
||||
**How it works:**
|
||||
- System itself triggers these events
|
||||
- Event-driven development pattern
|
||||
|
||||
**Types:**
|
||||
- Gateway startup → fires hook
|
||||
- Agent begins task → fires hook
|
||||
- Stop command issued → fires hook
|
||||
|
||||
**Purpose:**
|
||||
- Save memory on reset
|
||||
- Run setup instructions on startup
|
||||
- Modify context before agent runs
|
||||
- Self-management
|
||||
|
||||
### 5. Webhooks (External System Events)
|
||||
**How it works:**
|
||||
- External systems notify OpenClaw of events
|
||||
- Agent responds to entire digital life
|
||||
|
||||
**Examples:**
|
||||
- Email hits inbox → webhook fires → agent processes
|
||||
- Slack reaction → webhook fires → agent responds
|
||||
- Jira ticket created → webhook fires → agent researches
|
||||
- GitHub event → webhook fires → agent acts
|
||||
- Calendar event approaches → webhook fires → agent reminds
|
||||
|
||||
**Supported integrations:** Slack, Discord, GitHub, and basically anything with webhook support
|
||||
|
||||
### Bonus: Agent-to-Agent Messaging
|
||||
**How it works:**
|
||||
- Multi-agent setups with isolated workspaces
|
||||
- Agents pass messages between each other
|
||||
- Each agent has different profile/specialization
|
||||
|
||||
**Example:**
|
||||
- Research Agent finishes gathering info
|
||||
- Queues up work for Writing Agent
|
||||
- Writing Agent processes and produces output
|
||||
|
||||
**Reality:** Looks like collaboration, but it's just messages entering queues
|
||||
|
||||
## Why It Feels Alive
|
||||
|
||||
The combination creates an illusion of autonomy:
|
||||
|
||||
**Time** (heartbeats, crons) → **Events** → **Queue** → **Agent Execution** → **State Persistence** → **Loop**
|
||||
|
||||
### The 3am Phone Call Example
|
||||
|
||||
**What it looked like:**
|
||||
- Agent autonomously decided to get phone number
|
||||
- Agent decided to call owner
|
||||
- Agent waited until 3am to execute
|
||||
|
||||
**What actually happened:**
|
||||
1. Some event fired (cron or heartbeat) - exact configuration unknown
|
||||
2. Event entered queue
|
||||
3. Agent processed with available tools and instructions
|
||||
4. Agent acquired Twilio phone number
|
||||
5. Agent made the call
|
||||
6. Owner didn't ask in the moment, but behavior was enabled in setup
|
||||
|
||||
**Key insight:** Nothing was thinking overnight. Nothing was deciding. Time produced event → Event kicked off agent → Agent followed instructions.
|
||||
|
||||
## The Complete Event Flow
|
||||
|
||||
**Event Sources:**
|
||||
- Time creates events (heartbeats, crons)
|
||||
- Humans create events (messages)
|
||||
- External systems create events (webhooks)
|
||||
- Internal state creates events (hooks)
|
||||
- Agents create events for other agents
|
||||
|
||||
**Processing:**
|
||||
All events → Enter queue → Queue processed → Agents execute → State persists → Loop continues
|
||||
|
||||
**Memory:**
|
||||
- Stored in local markdown files
|
||||
- Agent reads on wake-up
|
||||
- Remembers previous conversations
|
||||
- Not learning - just reading files you could open in text editor
|
||||
|
||||
## Security Concerns
|
||||
|
||||
### The Analysis
|
||||
Cisco's security team analyzed OpenClaw ecosystem:
|
||||
- 31,000 available skills examined
|
||||
- 26% contain at least one vulnerability
|
||||
- Called it "a security nightmare"
|
||||
|
||||
### Why It's Risky
|
||||
OpenClaw has deep system access:
|
||||
- Run shell commands
|
||||
- Read and write files
|
||||
- Execute scripts
|
||||
- Control browser
|
||||
|
||||
### Specific Risks
|
||||
1. **Prompt injection** through emails or documents
|
||||
2. **Malicious skills** in marketplace
|
||||
3. **Credential exposure**
|
||||
4. **Command misinterpretation** that deletes unintended files
|
||||
|
||||
### OpenClaw's Own Warning
|
||||
Documentation states: "There's no perfectly secure setup"
|
||||
|
||||
### Mitigation Strategies
|
||||
- Run on secondary machine
|
||||
- Use isolated accounts
|
||||
- Limit enabled skills
|
||||
- Monitor logs actively
|
||||
- Use Railway's one-click deployment (runs in isolated container)
|
||||
|
||||
## Key Architectural Takeaways
|
||||
|
||||
### The Four Components
|
||||
1. **Time** that produces events
|
||||
2. **Events** that trigger agents
|
||||
3. **State** that persists across interactions
|
||||
4. **Loop** that keeps processing
|
||||
|
||||
### Building Your Own
|
||||
You don't need OpenClaw specifically. You need:
|
||||
- Event scheduling mechanism
|
||||
- Queue system
|
||||
- LLM for processing
|
||||
- State persistence layer
|
||||
|
||||
### The Pattern
|
||||
This architecture will appear everywhere. Every AI agent framework that "feels alive" uses some version of:
|
||||
- Heartbeats
|
||||
- Cron jobs
|
||||
- Webhooks
|
||||
- Event loops
|
||||
- Persistent state
|
||||
|
||||
### Understanding vs Hype
|
||||
Understanding this architecture means you can:
|
||||
- Evaluate agent tools intelligently
|
||||
- Build your own implementations
|
||||
- Avoid getting caught up in viral hype
|
||||
- Recognize the pattern in new frameworks
|
||||
|
||||
## The Bottom Line
|
||||
|
||||
OpenClaw isn't magic. It's not sentient. It doesn't think or reason.
|
||||
|
||||
**It's inputs, queues, and a loop.**
|
||||
|
||||
The "alive" feeling comes from well-designed event-driven architecture that makes a reactive system appear proactive. Time becomes an input. External systems become inputs. Internal state becomes inputs. All processed through the same queue with persistent memory.
|
||||
|
||||
Elegant engineering, not artificial consciousness.
|
||||
|
||||
## Further Resources
|
||||
- OpenClaw documentation
|
||||
- Clairvo's original thread (inspiration for this breakdown)
|
||||
- Cisco security research on OpenClaw ecosystem
|
||||
140
docs/research/aetheel-vs-nanoclaw.md
Normal file
140
docs/research/aetheel-vs-nanoclaw.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# Aetheel vs Nanoclaw: Feature Comparison & OpenCode Assessment
|
||||
|
||||
Aetheel is a solid reimplementation of the core nanoclaw concept in Python, but there are meaningful gaps. Here's what maps, what's missing, and where the opencode integration could be improved.
|
||||
|
||||
---
|
||||
|
||||
## What Aetheel Has (Maps Well to Nanoclaw)
|
||||
|
||||
| Feature | Nanoclaw | Aetheel | Status |
|
||||
|---|---|---|---|
|
||||
| Multi-channel adapters | WhatsApp (baileys) | Slack + Telegram | ✅ Good — cleaner abstraction via `BaseAdapter` |
|
||||
| Session isolation | Per-group sessions | Per-thread sessions via `SessionStore` | ✅ Good |
|
||||
| Dual runtime support | Claude Code SDK only | OpenCode (CLI+SDK) + Claude Code CLI | ✅ Good — more flexible |
|
||||
| Scheduled tasks | Cron + interval + once via MCP tool | Cron + one-shot via APScheduler | ✅ Good |
|
||||
| Subagent spawning | SDK `Task`/`TeamCreate` tools | Background threads via `SubagentManager` | ✅ Basic |
|
||||
| Memory system | CLAUDE.md files per group | SOUL.md + USER.md + MEMORY.md + hybrid search | ✅ Better — vector + BM25 search |
|
||||
| Skills system | `.claude/skills/` with SKILL.md | `skills/<name>/SKILL.md` with trigger matching | ✅ Good |
|
||||
| Action tags | MCP tools (send_message, schedule_task) | Regex-parsed `[ACTION:remind\|...]` tags | ✅ Different approach, works |
|
||||
|
||||
---
|
||||
|
||||
## What's Missing from Aetheel
|
||||
|
||||
### 1. Container Isolation
|
||||
|
||||
Nanoclaw's biggest architectural feature. Every agent runs in an isolated Apple Container (or Docker) with controlled volume mounts, secret injection via stdin, and per-group IPC namespaces. Aetheel runs everything in the same process. This means:
|
||||
|
||||
- No sandboxing of agent tool use (bash, file writes)
|
||||
- No mount-based security boundaries between groups
|
||||
- Secrets are in the process environment, not isolated
|
||||
|
||||
### 2. MCP Server Integration
|
||||
|
||||
Nanoclaw runs a custom MCP server (`ipc-mcp-stdio.ts`) inside the container that gives the agent tools like `send_message`, `schedule_task`, `register_group`. Aetheel uses regex-parsed action tags instead, which is fragile — the AI has to format tags perfectly, and there's no validation or structured tool calling.
|
||||
|
||||
### 3. Multi-Group Support
|
||||
|
||||
Nanoclaw has per-group folders, per-group memory (CLAUDE.md), per-group IPC, and a global memory layer. Aetheel has a single workspace with shared memory files. No group isolation.
|
||||
|
||||
### 4. Persistent Conversation Sessions on Disk
|
||||
|
||||
Nanoclaw stores sessions as JSONL files in `data/sessions/{group}/.claude/` and can resume at a specific assistant message UUID. Aetheel's `SessionStore` is in-memory only — sessions are lost on restart.
|
||||
|
||||
### 5. IPC Message Streaming
|
||||
|
||||
Nanoclaw's agent runner uses a `MessageStream` (AsyncIterable) to pipe follow-up messages into an active agent query. The host can send new messages to a running agent via IPC files. Aetheel's runtime is request-response only — one message in, one response out.
|
||||
|
||||
### 6. Transcript Archiving
|
||||
|
||||
Nanoclaw archives full conversation transcripts to markdown before context compaction via a `PreCompact` hook. Aetheel logs sessions to daily files but doesn't handle compaction.
|
||||
|
||||
### 7. Group Registration
|
||||
|
||||
Nanoclaw lets the main agent register new groups dynamically via an MCP tool. Aetheel has no equivalent.
|
||||
|
||||
### 8. Idle Timeout / Session Lifecycle
|
||||
|
||||
Nanoclaw has a 30-minute idle timeout that closes the container stdin, ending the session gracefully. Aetheel has session TTL cleanup but no active lifecycle management.
|
||||
|
||||
---
|
||||
|
||||
## OpenCode Integration Assessment
|
||||
|
||||
The opencode runtime implementation in `agent/opencode_runtime.py` is well-structured. Here's what's correct and what needs attention.
|
||||
|
||||
### What's Done Well
|
||||
|
||||
- Dual mode (CLI + SDK) with graceful fallback from SDK to CLI
|
||||
- Binary auto-discovery across common install paths
|
||||
- JSONL event parsing for `opencode run --format json` output
|
||||
- Session ID extraction from event stream
|
||||
- System prompt injection via XML tags (correct workaround since `opencode run` doesn't have `--system-prompt`)
|
||||
- Config from environment variables
|
||||
|
||||
### Issues / Improvements Needed
|
||||
|
||||
#### 1. SDK Client API Mismatch
|
||||
|
||||
The code calls `self._sdk_client.session.chat(session_id, **chat_kwargs)` but the opencode Python SDK uses `client.session.prompt()` not `.chat()`. The correct call is:
|
||||
|
||||
```python
|
||||
response = self._sdk_client.session.prompt(
|
||||
path={"id": session_id},
|
||||
body={"parts": parts, "model": model_config}
|
||||
)
|
||||
```
|
||||
|
||||
#### 2. SDK Client Initialization
|
||||
|
||||
The code uses `from opencode_ai import Opencode` but the actual SDK package is `@opencode-ai/sdk` (JS/TS) or `opencode-sdk-python` (Python). The Python SDK uses `createOpencodeClient` pattern. Verify the actual Python SDK import path — it may be `from opencode import Client` or similar depending on the package version.
|
||||
|
||||
#### 3. No `--continue` Flag Validation
|
||||
|
||||
The CLI mode passes `--continue` and `--session` for session continuity, but `opencode run` may not support `--continue` the same way as the TUI. The `opencode run` command is designed for single-shot execution. For session continuity in CLI mode, you'd need to use the SDK mode with `opencode serve`.
|
||||
|
||||
#### 4. Missing `--system` Flag
|
||||
|
||||
The code injects system prompts as XML in the message body. This works but is a workaround. The SDK mode's `client.session.prompt()` supports a `system` parameter in the body, which would be cleaner.
|
||||
|
||||
#### 5. No Structured Output Support
|
||||
|
||||
Opencode's SDK supports `format: { type: "json_schema", schema: {...} }` for structured responses. This could replace the fragile `[ACTION:...]` regex parsing with proper tool calls.
|
||||
|
||||
#### 6. No Plugin/Hook Integration
|
||||
|
||||
Opencode has a plugin system (`tool.execute.before`, `tool.execute.after`, `experimental.session.compacting`) that could replace the action tag parsing. You could create an opencode plugin that exposes `send_message` and `schedule_task` as custom tools, similar to nanoclaw's MCP approach.
|
||||
|
||||
#### 7. Session Persistence
|
||||
|
||||
`SessionStore` is in-memory. Opencode's server persists sessions natively, so in SDK mode you could rely on the server's session storage and just map `conversation_id → opencode_session_id` in a SQLite table.
|
||||
|
||||
---
|
||||
|
||||
## Architectural Gap Summary
|
||||
|
||||
The biggest architectural gap isn't about opencode specifically — it's that Aetheel runs the agent in-process without isolation, while nanoclaw's container model is what makes it safe to give the agent bash access and file write tools.
|
||||
|
||||
To close that gap, options include:
|
||||
|
||||
- **Containerize the opencode runtime** — run `opencode serve` inside a Docker container with controlled mounts
|
||||
- **Use opencode's permission system** — configure all dangerous tools to `"ask"` or `"deny"` per agent
|
||||
- **Add an MCP server** — replace action tag regex parsing with proper MCP tools for `send_message`, `schedule_task`, etc.
|
||||
- **Persist sessions to SQLite** — survive restarts and enable resume-at-message functionality
|
||||
|
||||
---
|
||||
|
||||
## Nanoclaw Features → Opencode Equivalents
|
||||
|
||||
| Nanoclaw (Claude Code SDK) | Opencode Equivalent | Gap Level |
|
||||
|---|---|---|
|
||||
| `query()` async iterable | HTTP server + SDK `client.session.prompt()` | 🔴 Architecture change needed |
|
||||
| `resume` + `resumeSessionAt` | `POST /session/:id/message` | 🟡 No resume-at-UUID equivalent |
|
||||
| Streaming message types (system/init, assistant, result) | SSE events via `GET /event` | 🟡 Different event schema |
|
||||
| `PreCompact` hook | `experimental.session.compacting` plugin | 🟢 Similar concept, different API |
|
||||
| `PreToolUse` hook (bash sanitization) | `tool.execute.before` plugin | 🟢 Similar concept, different API |
|
||||
| `bypassPermissions` | Per-tool permission config set to `"allow"` | 🟢 Direct mapping |
|
||||
| `isSingleUserTurn: false` via AsyncIterable | `prompt_async` endpoint | 🟡 Needs verification |
|
||||
| CLAUDE.md auto-loading via `settingSources` | AGENTS.md convention | 🟢 Rename files |
|
||||
| Secrets via `env` param on `query()` | `shell.env` plugin hook | 🟡 Different isolation model |
|
||||
| MCP servers in `query()` config | `opencode.json` mcp config or `POST /mcp` | 🟢 Direct mapping |
|
||||
243
docs/research/comparison.md
Normal file
243
docs/research/comparison.md
Normal file
@@ -0,0 +1,243 @@
|
||||
# ⚔️ Aetheel vs. Inspiration Repos — Comparison & Missing Features
|
||||
|
||||
> A detailed comparison of Aetheel with Nanobot, NanoClaw, OpenClaw, and PicoClaw — highlighting what's different, what's missing, and what can be added.
|
||||
|
||||
---
|
||||
|
||||
## Feature Comparison Matrix
|
||||
|
||||
| Feature | Aetheel | Nanobot | NanoClaw | OpenClaw | PicoClaw |
|
||||
|---------|---------|---------|----------|----------|----------|
|
||||
| **Language** | Python | Python | TypeScript | TypeScript | Go |
|
||||
| **Channels** | Slack only | 9 channels | WhatsApp only | 15+ channels | 5 channels |
|
||||
| **LLM Runtime** | OpenCode / Claude Code (subprocess) | LiteLLM (multi-provider) | Claude Agent SDK | Pi Agent (custom RPC) | Go-native agent |
|
||||
| **Memory** | Hybrid (vector + BM25) | Simple file-based | Per-group CLAUDE.md | Workspace files | MEMORY.md + sessions |
|
||||
| **Config** | `config.json` with `env.vars` + `${VAR}` | `config.json` | Code changes (no config) | JSON5 config | `config.json` |
|
||||
| **Skills** | ❌ None | ✅ Bundled + custom | ✅ Code skills (transform) | ✅ Bundled + managed + workspace | ✅ Custom skills |
|
||||
| **Scheduled Tasks** | ⚠️ Action tags (remind only) | ✅ Full cron system | ✅ Task scheduler | ✅ Cron + webhooks + Gmail | ✅ Cron + heartbeat |
|
||||
| **Security** | ❌ No sandbox | ⚠️ Workspace restriction | ✅ Container isolation | ✅ Docker sandbox + pairing | ✅ Workspace sandbox |
|
||||
| **MCP Support** | ❌ No | ✅ Yes | ❌ No | ❌ No | ❌ No |
|
||||
| **Web Search** | ❌ No | ✅ Brave Search | ✅ Via Claude tools | ✅ Browser control | ✅ Brave + DuckDuckGo |
|
||||
| **Voice** | ❌ No | ✅ Via Groq Whisper | ❌ No | ✅ Voice Wake + Talk Mode | ✅ Via Groq Whisper |
|
||||
| **Browser Control** | ❌ No | ❌ No | ❌ No | ✅ Full CDP control | ❌ No |
|
||||
| **Companion Apps** | ❌ No | ❌ No | ❌ No | ✅ macOS + iOS + Android | ❌ No |
|
||||
| **Session Management** | ✅ Thread-based (Slack) | ✅ Session-based | ✅ Per-group isolated | ✅ Full sessions + agent-to-agent | ✅ Session-based |
|
||||
| **Docker Support** | ❌ No | ✅ Yes | ❌ (uses Apple Container) | ✅ Full compose setup | ✅ Yes |
|
||||
| **Install Script** | ✅ Yes | ✅ pip/uv install | ✅ Claude guides setup | ✅ npm + wizard | ✅ Binary / make |
|
||||
| **Identity Files** | ✅ SOUL.md, USER.md, MEMORY.md | ✅ AGENTS.md, SOUL.md, USER.md, etc. | ✅ CLAUDE.md per group | ✅ AGENTS.md, SOUL.md, USER.md, TOOLS.md | ✅ Full set (AGENTS, SOUL, IDENTITY, USER, TOOLS) |
|
||||
| **Subagents** | ❌ No | ✅ Spawn subagent | ✅ Agent Swarms | ✅ sessions_send / sessions_spawn | ✅ Spawn subagent |
|
||||
| **Heartbeat/Proactive** | ❌ No | ✅ Heartbeat | ❌ No | ✅ Cron + wakeups | ✅ HEARTBEAT.md |
|
||||
| **Multi-provider** | ⚠️ Via OpenCode/Claude | ✅ 12+ providers | ❌ Claude only | ✅ Multi-model + failover | ✅ 7+ providers |
|
||||
| **WebChat** | ❌ No | ❌ No | ❌ No | ✅ Built-in WebChat | ❌ No |
|
||||
|
||||
---
|
||||
|
||||
## What Aetheel Does Well
|
||||
|
||||
### ✅ Strengths
|
||||
|
||||
1. **Advanced Memory System** — Aetheel has the most sophisticated memory system with **hybrid search (0.7 vector + 0.3 BM25)**, local embeddings via `fastembed`, and SQLite FTS5. None of the other repos have this level of memory sophistication.
|
||||
|
||||
2. **Local-First Embeddings** — Zero API calls for memory search. Uses ONNX-based local model (BAAI/bge-small-en-v1.5).
|
||||
|
||||
3. **Dual Runtime Support** — Clean abstraction allowing switching between OpenCode and Claude Code with the same `AgentResponse` interface.
|
||||
|
||||
4. **Thread Isolation in Slack** — Each Slack thread gets its own AI session, providing natural conversation isolation.
|
||||
|
||||
5. **Action Tags** — Inline `[ACTION:remind|minutes|message]` tags are elegant for in-response scheduling.
|
||||
|
||||
6. **File Watching** — Memory auto-reindexes when `.md` files are edited.
|
||||
|
||||
---
|
||||
|
||||
## What Aetheel Is Missing
|
||||
|
||||
### 🔴 Critical Gaps (High Priority)
|
||||
|
||||
#### 1. Multi-Channel Support
|
||||
**Current:** Slack only
|
||||
**All others:** Multiple channels (3-15+)
|
||||
|
||||
Aetheel is locked to Slack. Adding at least **Telegram** and **Discord** would significantly increase usability. All four inspiration repos treat multi-channel as essential.
|
||||
|
||||
> **Recommendation:** Follow Nanobot's pattern — each channel is a module in `channels/` with a common interface. Start with Telegram (easiest — just a token).
|
||||
|
||||
#### 2. Skills System
|
||||
**Current:** None
|
||||
**Others:** All have skills/plugins
|
||||
|
||||
Aetheel has no way to extend agent capabilities beyond its hardcoded memory and runtime setup. A skills system would allow:
|
||||
- Bundled skills (GitHub, weather, web search)
|
||||
- User-created skills in workspace
|
||||
- Community-contributed skills
|
||||
|
||||
> **Recommendation:** Create a `skills/` directory in the workspace. Skills are markdown files (`SKILL.md`) that get injected into the agent's context.
|
||||
|
||||
#### 3. Scheduled Tasks (Cron)
|
||||
**Current:** Only `[ACTION:remind]` (one-time, simple)
|
||||
**Others:** Full cron systems with persistent storage
|
||||
|
||||
The action tag system is clever but limited. A proper cron system would support:
|
||||
- Recurring cron expressions (`0 9 * * *`)
|
||||
- Interval-based scheduling
|
||||
- Persistent job storage
|
||||
- CLI management
|
||||
|
||||
> **Recommendation:** Add a `cron/` module with SQLite-backed job storage and an APScheduler-based execution engine.
|
||||
|
||||
#### 4. Security Sandbox
|
||||
**Current:** No sandboxing
|
||||
**Others:** Container isolation (NanoClaw), workspace restriction (PicoClaw), Docker sandbox (OpenClaw)
|
||||
|
||||
The AI runtime has unrestricted system access. At minimum, workspace-level restrictions should be added.
|
||||
|
||||
> **Recommendation:** Follow PicoClaw's approach — restrict tool access to workspace directory by default. Block dangerous shell commands.
|
||||
|
||||
---
|
||||
|
||||
### 🟡 Important Gaps (Medium Priority)
|
||||
|
||||
#### 5. Config File System (JSON with env.vars — ✅ Done)
|
||||
**Current:** `config.json` with `env.vars` block and `${VAR}` substitution for secrets
|
||||
**Others:** JSON/JSON5 config files
|
||||
|
||||
Aetheel now uses a single config.json with an `env.vars` block for secrets and `${VAR}` references, matching openclaw's approach.
|
||||
|
||||
> **Status:** ✅ Implemented — no separate `.env` file needed.
|
||||
|
||||
#### 6. Web Search Tool
|
||||
**Current:** No web search
|
||||
**Others:** Brave Search, DuckDuckGo, or full browser control
|
||||
|
||||
The agent can't search the web. This is a significant limitation for a personal assistant.
|
||||
|
||||
> **Recommendation:** Add Brave Search API integration (free tier: 2000 queries/month) with DuckDuckGo as fallback.
|
||||
|
||||
#### 7. Subagent / Spawn Capability
|
||||
**Current:** No subagents
|
||||
**Others:** All have spawn/subagent systems
|
||||
|
||||
For long-running tasks, the main agent should be able to spawn background sub-tasks that work independently and report back.
|
||||
|
||||
> **Recommendation:** Add a `spawn` tool that creates a background thread/process running a separate agent session.
|
||||
|
||||
#### 8. Heartbeat / Proactive System
|
||||
**Current:** No proactive capabilities
|
||||
**Others:** Nanobot and PicoClaw have heartbeat systems
|
||||
|
||||
The agent only responds to messages. A heartbeat system would allow periodic check-ins, proactive notifications, and scheduled intelligence.
|
||||
|
||||
> **Recommendation:** Add `HEARTBEAT.md` file + periodic timer that triggers agent with heartbeat tasks.
|
||||
|
||||
#### 9. CLI Interface
|
||||
**Current:** Only `python main.py` with flags
|
||||
**Others:** Full CLI with subcommands (`nanobot agent`, `picoclaw cron`, etc.)
|
||||
|
||||
> **Recommendation:** Add a CLI using `click` or `argparse` with subcommands: `aetheel chat`, `aetheel status`, `aetheel cron`, etc.
|
||||
|
||||
#### 10. Tool System
|
||||
**Current:** No explicit tool system (AI handles everything via runtime)
|
||||
**Others:** Shell exec, file R/W, web search, spawn, message, etc.
|
||||
|
||||
Aetheel delegates all tool use to the AI runtime (OpenCode/Claude Code). While this works, having explicit tools gives more control and allows sandboxing.
|
||||
|
||||
> **Recommendation:** Define a tool interface and implement core tools (file ops, shell, web search) that run through the aetheel process with sandboxing.
|
||||
|
||||
---
|
||||
|
||||
### 🟢 Nice-to-Have (Lower Priority)
|
||||
|
||||
#### 11. MCP Server Support
|
||||
Only Nanobot supports MCP. Would allow connecting external tool servers.
|
||||
|
||||
#### 12. Multi-Provider Support
|
||||
Currently relies on OpenCode/Claude Code for provider handling. Direct multi-provider support (like Nanobot's 12+ providers via LiteLLM) would add flexibility.
|
||||
|
||||
#### 13. Docker / Container Support
|
||||
No Docker compose or containerized deployment option.
|
||||
|
||||
#### 14. Agent-to-Agent Communication
|
||||
OpenClaw's `sessions_send` allows agents to message each other. Useful for multi-agent workflows.
|
||||
|
||||
#### 15. Gateway Architecture
|
||||
Moving from a direct Slack adapter to a gateway pattern would make adding channels much easier.
|
||||
|
||||
#### 16. Onboarding Wizard
|
||||
OpenClaw's `onboard --install-daemon` provides a guided setup. Aetheel's install script is good but could be more interactive.
|
||||
|
||||
#### 17. Voice Support
|
||||
Voice Wake / Talk Mode (OpenClaw) or Whisper transcription (Nanobot, PicoClaw).
|
||||
|
||||
#### 18. WebChat Interface
|
||||
A browser-based chat UI connected to the gateway.
|
||||
|
||||
#### 19. TOOLS.md File
|
||||
A `TOOLS.md` file describing available tools to the agent, used by PicoClaw and OpenClaw.
|
||||
|
||||
#### 20. Self-Modification
|
||||
From `additions.txt`: "edit its own files and config as well as add skills" — the agent should be able to modify its own configuration and add new skills.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Comparison
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph Aetheel["⚔️ Aetheel (Current)"]
|
||||
A_SLACK["Slack\n(only channel)"]
|
||||
A_MAIN["main.py"]
|
||||
A_MEM["Memory\n(hybrid search)"]
|
||||
A_RT["OpenCode / Claude\n(subprocess)"]
|
||||
end
|
||||
|
||||
subgraph Target["🎯 Target Architecture"]
|
||||
T_CHAN["Multi-Channel\nGateway"]
|
||||
T_CORE["Core Agent\n+ Tool System"]
|
||||
T_MEM["Memory\n(hybrid search)"]
|
||||
T_SK["Skills"]
|
||||
T_CRON["Cron"]
|
||||
T_PROV["Multi-Provider"]
|
||||
T_SEC["Security\nSandbox"]
|
||||
end
|
||||
|
||||
A_SLACK --> A_MAIN
|
||||
A_MAIN --> A_MEM
|
||||
A_MAIN --> A_RT
|
||||
|
||||
T_CHAN --> T_CORE
|
||||
T_CORE --> T_MEM
|
||||
T_CORE --> T_SK
|
||||
T_CORE --> T_CRON
|
||||
T_CORE --> T_PROV
|
||||
T_CORE --> T_SEC
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Prioritized Roadmap Suggestion
|
||||
|
||||
Based on the analysis, here's a suggested implementation order:
|
||||
|
||||
### Phase 1: Foundation (Essentials)
|
||||
1. **Config system** — ✅ Done: `config.json` with `env.vars` + `${VAR}` substitution
|
||||
2. **Skills system** — `skills/` directory with `SKILL.md` loading
|
||||
3. **Tool system** — Core tools (shell, file, web search) with sandbox
|
||||
4. **Security sandbox** — Workspace-restricted tool execution
|
||||
|
||||
### Phase 2: Channels & Scheduling
|
||||
5. **Channel abstraction** — Extract adapter interface from Slack adapter
|
||||
6. **Telegram channel** — First new channel
|
||||
7. **Cron system** — Full scheduled task management
|
||||
8. **CLI** — Proper CLI with subcommands
|
||||
|
||||
### Phase 3: Advanced Features
|
||||
9. **Heartbeat** — Proactive agent capabilities
|
||||
10. **Subagents** — Spawn background tasks
|
||||
11. **Discord channel** — Second new channel
|
||||
12. **Web search** — Brave Search + DuckDuckGo
|
||||
|
||||
### Phase 4: Polish
|
||||
13. **Self-modification** — Agent can edit config and add skills
|
||||
14. **Docker support** — Dockerfile + compose
|
||||
15. **MCP support** — External tool servers
|
||||
16. **WebChat** — Browser-based chat UI
|
||||
59
docs/research/discord opencalw.md
Normal file
59
docs/research/discord opencalw.md
Normal file
@@ -0,0 +1,59 @@
|
||||
OpenClaw's Discord integration is a massive, enterprise-grade system compared to Aetheel's. Here are the key differences:
|
||||
|
||||
What Aetheel has and does fine:
|
||||
|
||||
Basic gateway connection via discord.py
|
||||
DM + @mention handling
|
||||
Message chunking (2000 char limit)
|
||||
Listen channels (respond without @mention in specific channels)
|
||||
Background thread support
|
||||
Token from env var
|
||||
What OpenClaw has that Aetheel is missing:
|
||||
|
||||
Multi-account support — OpenClaw can run multiple Discord bot accounts simultaneously, each with its own token, config, and identity. Aetheel supports exactly one bot token.
|
||||
|
||||
DM access policies — OpenClaw has pairing, allowlist, open, and disabled DM policies. Pairing mode requires users to get a code approved before they can DM the bot. Aetheel lets anyone DM the bot with zero access control.
|
||||
|
||||
Guild access policies — OpenClaw has open, allowlist, and disabled guild policies with per-guild and per-channel allowlists. You can restrict which servers, which channels within a server, and which users/roles can trigger the bot. Aetheel has no guild-level access control at all.
|
||||
|
||||
Role-based routing — OpenClaw can route Discord users to different AI agents based on their Discord roles. Aetheel has no concept of this.
|
||||
|
||||
[-] Interactive components (v2) — OpenClaw supports Discord buttons, select menus, modal forms, and media galleries. The AI can send rich interactive messages. Aetheel sends plain text only.
|
||||
|
||||
[-] Native slash commands — OpenClaw registers and handles Discord slash commands natively. Aetheel has no slash command support.
|
||||
|
||||
[-] Reply threading — OpenClaw supports replyToMode (off, first, all) and explicit [[reply_to:<id>]] tags so the bot can reply to specific messages. Aetheel doesn't use Discord's reply feature at all.
|
||||
|
||||
[-] History context — OpenClaw injects configurable message history (historyLimit, default 20) from the Discord channel into the AI context. Aetheel doesn't read channel history.
|
||||
|
||||
[-] Reaction handling — OpenClaw can receive and send reactions, with configurable notification modes (off, own, all, allowlist). Aetheel ignores reactions entirely.
|
||||
|
||||
[-] Ack reactions — OpenClaw sends an acknowledgement emoji (e.g. 👀) while processing a message, so users know the bot is working. Aetheel gives no processing feedback.
|
||||
|
||||
[-] Typing indicators — OpenClaw shows typing indicators while the agent processes. Aetheel doesn't.
|
||||
|
||||
Media/file handling — OpenClaw can send and receive files, images, and voice messages (with ffmpeg conversion). Aetheel ignores attachments.
|
||||
|
||||
Voice messages — OpenClaw can send voice messages with auto-generated waveforms. Aetheel has no voice support.
|
||||
|
||||
[-] Exec approvals — OpenClaw can post button-based approval prompts in Discord for dangerous operations (like shell commands). Aetheel has no human-in-the-loop approval flow.
|
||||
|
||||
Polls — OpenClaw can create Discord polls. Aetheel can't.
|
||||
|
||||
Moderation tools — OpenClaw exposes timeout, kick, ban, role management as AI-accessible actions with configurable gates. Aetheel has none.
|
||||
|
||||
Channel management — OpenClaw can create, edit, delete, and move channels. Aetheel can't.
|
||||
|
||||
PluralKit support — OpenClaw resolves proxied messages from PluralKit systems. Niche but shows the depth.
|
||||
|
||||
Presence/status — OpenClaw can set the bot's online status, activity, and streaming status. Aetheel's bot just shows as "online" with no custom status.
|
||||
|
||||
Gateway proxy — OpenClaw supports routing Discord traffic through an HTTP proxy. Aetheel doesn't.
|
||||
|
||||
Retry/resilience — OpenClaw has configurable retry policies for Discord API calls. Aetheel has no retry logic.
|
||||
|
||||
Config writes from chat — OpenClaw lets users modify bot config via Discord commands. Aetheel's /config set works but isn't Discord-specific.
|
||||
|
||||
Session isolation model — OpenClaw has sophisticated session keys: DMs share a main session by default, guild channels get isolated sessions (agent:<agentId>:discord:channel:<channelId>), slash commands get their own sessions. Aetheel uses channel_id as the conversation ID for everything, which is simpler but less flexible.
|
||||
|
||||
Bottom line: Aetheel's Discord adapter is a functional but minimal "receive messages, send text back" integration. OpenClaw's is a full Discord platform with interactive UI, access control, moderation, media, threading, multi-account, and agent routing. The biggest practical gaps for Aetheel are probably: access control (DM/guild policies), typing/ack indicators, reply threading, history context injection, and interactive components.
|
||||
207
docs/research/nanobot.md
Normal file
207
docs/research/nanobot.md
Normal file
@@ -0,0 +1,207 @@
|
||||
# 🐈 Nanobot — Architecture & How It Works
|
||||
|
||||
> **Ultra-Lightweight Personal AI Assistant** — ~4,000 lines of Python, 99% smaller than OpenClaw.
|
||||
|
||||
## Overview
|
||||
|
||||
Nanobot is a minimalist personal AI assistant written in Python that focuses on delivering core agent functionality with the smallest possible codebase. It uses LiteLLM for multi-provider LLM routing, supports 9+ chat channels, and includes memory, skills, scheduled tasks, and MCP tool integration.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | Python 3.11+ |
|
||||
| **Lines of Code** | ~4,000 (core agent) |
|
||||
| **Config** | `~/.nanobot/config.json` |
|
||||
| **Package** | `pip install nanobot-ai` |
|
||||
| **LLM Routing** | LiteLLM (multi-provider) |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph Channels["📱 Chat Channels"]
|
||||
TG["Telegram"]
|
||||
DC["Discord"]
|
||||
WA["WhatsApp"]
|
||||
FS["Feishu"]
|
||||
MC["Mochat"]
|
||||
DT["DingTalk"]
|
||||
SL["Slack"]
|
||||
EM["Email"]
|
||||
QQ["QQ"]
|
||||
end
|
||||
|
||||
subgraph Gateway["🌐 Gateway (nanobot gateway)"]
|
||||
CH["Channel Manager"]
|
||||
MQ["Message Queue"]
|
||||
end
|
||||
|
||||
subgraph Agent["🧠 Core Agent"]
|
||||
LOOP["Agent Loop\n(loop.py)"]
|
||||
CTX["Context Builder\n(context.py)"]
|
||||
MEM["Memory System\n(memory.py)"]
|
||||
SK["Skills Loader\n(skills.py)"]
|
||||
SA["Subagent\n(subagent.py)"]
|
||||
end
|
||||
|
||||
subgraph Tools["🔧 Built-in Tools"]
|
||||
SHELL["Shell Exec"]
|
||||
FILE["File R/W/Edit"]
|
||||
WEB["Web Search"]
|
||||
SPAWN["Spawn Subagent"]
|
||||
MCP["MCP Servers"]
|
||||
end
|
||||
|
||||
subgraph Providers["☁️ LLM Providers (LiteLLM)"]
|
||||
OR["OpenRouter"]
|
||||
AN["Anthropic"]
|
||||
OA["OpenAI"]
|
||||
DS["DeepSeek"]
|
||||
GR["Groq"]
|
||||
GE["Gemini"]
|
||||
VL["vLLM (local)"]
|
||||
end
|
||||
|
||||
Channels --> Gateway
|
||||
Gateway --> Agent
|
||||
CTX --> LOOP
|
||||
MEM --> CTX
|
||||
SK --> CTX
|
||||
LOOP --> Tools
|
||||
LOOP --> Providers
|
||||
SA --> LOOP
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant Channel as Chat Channel
|
||||
participant GW as Gateway
|
||||
participant Agent as Agent Loop
|
||||
participant LLM as LLM Provider
|
||||
participant Tools as Tools
|
||||
|
||||
User->>Channel: Send message
|
||||
Channel->>GW: Forward message
|
||||
GW->>Agent: Route to agent
|
||||
Agent->>Agent: Build context (memory, skills, identity)
|
||||
Agent->>LLM: Send prompt + tools
|
||||
LLM-->>Agent: Response (text or tool call)
|
||||
|
||||
alt Tool Call
|
||||
Agent->>Tools: Execute tool
|
||||
Tools-->>Agent: Tool result
|
||||
Agent->>LLM: Send tool result
|
||||
LLM-->>Agent: Final response
|
||||
end
|
||||
|
||||
Agent->>Agent: Update memory
|
||||
Agent-->>GW: Return response
|
||||
GW-->>Channel: Send reply
|
||||
Channel-->>User: Display response
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Agent Loop (`agent/loop.py`)
|
||||
The core loop that manages the LLM ↔ tool execution cycle:
|
||||
- Builds a prompt using context (memory, skills, identity files)
|
||||
- Sends to LLM via LiteLLM
|
||||
- If LLM returns a tool call → executes it → sends result back
|
||||
- Continues until LLM returns a text response (no more tool calls)
|
||||
|
||||
### 2. Context Builder (`agent/context.py`)
|
||||
Assembles the system prompt from:
|
||||
- **Identity files**: `AGENTS.md`, `SOUL.md`, `USER.md`, `TOOLS.md`, `IDENTITY.md`
|
||||
- **Memory**: Persistent `MEMORY.md` with recall
|
||||
- **Skills**: Loaded from `~/.nanobot/workspace/skills/`
|
||||
- **Conversation history**: Session-based context
|
||||
|
||||
### 3. Memory System (`agent/memory.py`)
|
||||
- Persistent memory stored in `MEMORY.md` in the workspace
|
||||
- Agent can read and write memories
|
||||
- Survives across sessions
|
||||
|
||||
### 4. Provider Registry (`providers/registry.py`)
|
||||
- Single-source-of-truth for all LLM providers
|
||||
- Adding a new provider = 2 steps (add `ProviderSpec` + config field)
|
||||
- Auto-prefixes model names for LiteLLM routing
|
||||
- Supports 12+ providers including local vLLM
|
||||
|
||||
### 5. Channel System (`channels/`)
|
||||
- 9 chat platforms supported (Telegram, Discord, WhatsApp, Feishu, Mochat, DingTalk, Slack, Email, QQ)
|
||||
- Each channel handles auth, message parsing, and response delivery
|
||||
- Allowlist-based security (`allowFrom`)
|
||||
- Started via `nanobot gateway`
|
||||
|
||||
### 6. Skills (`skills/`)
|
||||
- Bundled skills: GitHub, weather, tmux, etc.
|
||||
- Custom skills loaded from workspace
|
||||
- Skills are injected into the agent's context
|
||||
|
||||
### 7. Scheduled Tasks (Cron)
|
||||
- Add jobs via `nanobot cron add`
|
||||
- Supports cron expressions and interval-based scheduling
|
||||
- Jobs stored persistently
|
||||
|
||||
### 8. MCP Integration
|
||||
- Supports Model Context Protocol servers
|
||||
- Stdio and HTTP transport modes
|
||||
- Compatible with Claude Desktop / Cursor MCP configs
|
||||
- Tools auto-discovered and registered at startup
|
||||
|
||||
---
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
nanobot/
|
||||
├── agent/ # 🧠 Core agent logic
|
||||
│ ├── loop.py # Agent loop (LLM ↔ tool execution)
|
||||
│ ├── context.py # Prompt builder
|
||||
│ ├── memory.py # Persistent memory
|
||||
│ ├── skills.py # Skills loader
|
||||
│ ├── subagent.py # Background task execution
|
||||
│ └── tools/ # Built-in tools (incl. spawn)
|
||||
├── skills/ # 🎯 Bundled skills (github, weather, tmux...)
|
||||
├── channels/ # 📱 Chat channel integrations
|
||||
├── providers/ # ☁️ LLM provider registry
|
||||
├── config/ # ⚙️ Configuration schema
|
||||
├── cron/ # ⏰ Scheduled tasks
|
||||
├── heartbeat/ # 💓 Heartbeat system
|
||||
├── session/ # 📝 Session management
|
||||
├── bus/ # 📨 Internal event bus
|
||||
├── cli/ # 🖥️ CLI commands
|
||||
└── utils/ # 🔧 Utilities
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `nanobot onboard` | Initialize config & workspace |
|
||||
| `nanobot agent -m "..."` | Chat with the agent |
|
||||
| `nanobot agent` | Interactive chat mode |
|
||||
| `nanobot gateway` | Start all channels |
|
||||
| `nanobot status` | Show status |
|
||||
| `nanobot cron add/list/remove` | Manage scheduled tasks |
|
||||
| `nanobot channels login` | Link WhatsApp device |
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **LiteLLM for provider abstraction** — One interface for all LLM providers
|
||||
2. **JSON config over env vars** — Single `config.json` file for all settings
|
||||
3. **Skills-based extensibility** — Modular skill system for adding capabilities
|
||||
4. **Provider Registry pattern** — Adding providers is 2-step, zero if-elif chains
|
||||
5. **Agent social network** — Can join MoltBook, ClawdChat communities
|
||||
315
docs/research/nanoclaw-comparison.md
Normal file
315
docs/research/nanoclaw-comparison.md
Normal file
@@ -0,0 +1,315 @@
|
||||
# Aetheel vs NanoClaw — Feature Gap Analysis
|
||||
|
||||
Deep comparison of Aetheel (Python, multi-channel AI assistant) and NanoClaw (TypeScript, container-isolated personal AI assistant). Focus: what NanoClaw has that Aetheel is missing.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Differences
|
||||
|
||||
| Aspect | Aetheel | NanoClaw |
|
||||
|--------|---------|----------|
|
||||
| Language | Python | TypeScript |
|
||||
| Agent execution | In-process (shared memory) | Container-isolated (Apple Container / Docker) |
|
||||
| Identity model | Shared across all channels (SOUL.md, USER.md, MEMORY.md) | Per-group (each group has its own CLAUDE.md) |
|
||||
| Security model | Application-level checks | OS-level container isolation |
|
||||
| Config approach | Config-driven (`config.json` with `env.vars` + `${VAR}`) | Code-first (Claude modifies your fork) |
|
||||
| Philosophy | Feature-rich framework | Minimal, understandable in 8 minutes |
|
||||
|
||||
---
|
||||
|
||||
## Features Aetheel Is Missing
|
||||
|
||||
### 1. Container Isolation (Critical)
|
||||
|
||||
NanoClaw runs every agent invocation inside a Linux container (Apple Container on macOS, Docker on Linux). Each container:
|
||||
- Gets only explicitly mounted directories
|
||||
- Runs as non-root (uid 1000)
|
||||
- Is ephemeral (`--rm` flag, fresh per invocation)
|
||||
- Cannot access other groups' files or sessions
|
||||
- Cannot access host filesystem beyond mounts
|
||||
|
||||
Aetheel runs everything in-process with no sandboxing. The security audit already flagged path traversal, arbitrary code execution via hooks, and unvalidated action tags as critical issues.
|
||||
|
||||
**What to build:**
|
||||
- Docker-based agent execution (spawn a container per AI request)
|
||||
- Mount only the relevant group's workspace directory
|
||||
- Pass secrets via stdin, not mounted files
|
||||
- Add a `/convert-to-docker` skill or built-in Docker mode
|
||||
|
||||
---
|
||||
|
||||
### 2. Per-Group Isolation
|
||||
|
||||
NanoClaw gives each chat group its own:
|
||||
- Filesystem folder (`groups/{name}/`)
|
||||
- Memory file (`CLAUDE.md` per group)
|
||||
- Session history (isolated `.claude/` directory)
|
||||
- IPC namespace (prevents cross-group privilege escalation)
|
||||
- Container mounts (only own folder + read-only global)
|
||||
|
||||
Aetheel shares SOUL.md, USER.md, and MEMORY.md across all channels and conversations. A Slack channel, Discord server, and Telegram group all see the same memory and identity.
|
||||
|
||||
**What to build:**
|
||||
- Per-channel or per-group workspace directories
|
||||
- Isolated session storage per group
|
||||
- A `global/` shared memory that all groups can read but only the main channel can write
|
||||
- Group registration system (like NanoClaw's `registerGroup()`)
|
||||
|
||||
---
|
||||
|
||||
### 3. Working Agent Teams / Swarms
|
||||
|
||||
NanoClaw has working agent teams today via Claude Code's experimental `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`:
|
||||
- Lead agent creates teammates using Claude's native `TeamCreate` / `SendMessage` tools
|
||||
- Each teammate runs in its own container
|
||||
- On Telegram, each agent gets a dedicated bot identity (pool of pre-created bots renamed dynamically via `setMyName`)
|
||||
- The lead agent coordinates but doesn't relay every message — users see teammate messages directly
|
||||
- `<internal>` tags let agents communicate without spamming the user
|
||||
|
||||
Aetheel has the tools in the allowed list (`TeamCreate`, `TeamDelete`, `SendMessage`) but no actual orchestration, no per-agent identity, and no way for teammates to appear as separate entities in chat.
|
||||
|
||||
**What to build:**
|
||||
- Enable `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` when using Claude runtime
|
||||
- Bot pool for Telegram/Discord (multiple bot tokens, one per agent role)
|
||||
- IPC routing that respects `sender` field to route messages through the right bot
|
||||
- Per-agent CLAUDE.md / SOUL.md files
|
||||
- `<internal>` tag stripping in outbound messages
|
||||
|
||||
---
|
||||
|
||||
### 4. Mount Security / Allowlist
|
||||
|
||||
NanoClaw has a tamper-proof mount allowlist at `~/.config/nanoclaw/mount-allowlist.json` (outside the project root, never mounted into containers):
|
||||
- Defines which host directories can be mounted
|
||||
- Default blocked patterns: `.ssh`, `.gnupg`, `.aws`, `.env`, `private_key`, etc.
|
||||
- Symlink resolution before validation (prevents traversal)
|
||||
- `nonMainReadOnly` forces read-only for non-main groups
|
||||
- Per-root `allowReadWrite` control
|
||||
|
||||
Aetheel has no filesystem access control. The AI can read/write anywhere the process has permissions.
|
||||
|
||||
**What to build:**
|
||||
- External allowlist config (outside workspace, not modifiable by the AI)
|
||||
- Blocked path patterns for sensitive directories
|
||||
- Symlink resolution and path validation
|
||||
- Read-only enforcement for non-primary channels
|
||||
|
||||
---
|
||||
|
||||
### 5. IPC-Based Communication
|
||||
|
||||
NanoClaw uses file-based IPC for all agent-to-host communication:
|
||||
- Agents write JSON files to `data/ipc/{group}/messages/` and `data/ipc/{group}/tasks/`
|
||||
- Host polls IPC directories and processes files
|
||||
- Per-group IPC namespaces prevent cross-group message injection
|
||||
- Authorization checks: non-main groups can only send to their own chat, schedule tasks for themselves
|
||||
- Error files moved to `data/ipc/errors/` for debugging
|
||||
|
||||
Aetheel uses in-memory action tags parsed from AI response text (`[ACTION:remind|...]`, `[ACTION:cron|...]`). No authorization, no isolation, no audit trail.
|
||||
|
||||
**What to build:**
|
||||
- File-based or queue-based IPC for agent communication
|
||||
- Per-group namespaces with authorization
|
||||
- Audit trail for all IPC operations
|
||||
- Error handling with failed message preservation
|
||||
|
||||
---
|
||||
|
||||
### 6. Group Queue with Concurrency Control
|
||||
|
||||
NanoClaw has a `GroupQueue` class that manages container execution:
|
||||
- Max concurrent containers limit (`MAX_CONCURRENT_CONTAINERS`, default 5)
|
||||
- Per-group queuing (messages and tasks queue while container is active)
|
||||
- Follow-up messages sent to active containers via IPC input files
|
||||
- Idle timeout with `_close` sentinel to wind down containers
|
||||
- Exponential backoff retry (5s base, max 5 retries)
|
||||
- Graceful shutdown (detaches containers, doesn't kill them)
|
||||
- Task priority over messages in drain order
|
||||
|
||||
Aetheel has a simple concurrent limit of 3 subagents but no queuing, no retry logic, no follow-up message support, and no graceful shutdown.
|
||||
|
||||
**What to build:**
|
||||
- Proper execution queue with configurable concurrency
|
||||
- Per-channel message queuing when agent is busy
|
||||
- Follow-up message injection into active sessions
|
||||
- Exponential backoff retry on failures
|
||||
- Graceful shutdown that lets active agents finish
|
||||
|
||||
---
|
||||
|
||||
### 7. Task Context Modes
|
||||
|
||||
NanoClaw scheduled tasks support two context modes:
|
||||
- `group` — uses the group's existing session (shared conversation history)
|
||||
- `isolated` — fresh session per task run (no prior context)
|
||||
|
||||
Aetheel scheduled tasks always run in a fresh context with no option to share the group's conversation history.
|
||||
|
||||
**What to build:**
|
||||
- `context_mode` field on scheduled jobs (`group` vs `isolated`)
|
||||
- Session ID passthrough for `group` mode tasks
|
||||
|
||||
---
|
||||
|
||||
### 8. Task Run Logging
|
||||
|
||||
NanoClaw logs every task execution:
|
||||
- `task_run_logs` table with: task_id, run_at, duration_ms, status, result, error
|
||||
- `last_result` summary stored on the task itself
|
||||
- Tasks auto-complete after `once` schedule runs
|
||||
|
||||
Aetheel's scheduler persists jobs but doesn't log execution history or results.
|
||||
|
||||
**What to build:**
|
||||
- Task run log table (when it ran, how long, success/error, result summary)
|
||||
- Queryable task history (`task history <id>`)
|
||||
|
||||
---
|
||||
|
||||
### 9. Streaming Output with Idle Timeout
|
||||
|
||||
NanoClaw streams agent output in real-time:
|
||||
- Container output is parsed as it arrives (sentinel markers for robust parsing)
|
||||
- Results are forwarded to the user immediately via `sendMessage`
|
||||
- Idle timeout (default 30 min) closes the container if no output for too long
|
||||
- Prevents hanging containers from blocking the queue
|
||||
|
||||
Aetheel waits for the full AI response before sending anything back.
|
||||
|
||||
**What to build:**
|
||||
- Streaming response support (send partial results as they arrive)
|
||||
- Idle timeout for long-running agent sessions
|
||||
- Typing indicators while agent is processing
|
||||
|
||||
---
|
||||
|
||||
### 10. Skills as Code Transformations
|
||||
|
||||
NanoClaw's skills are fundamentally different from Aetheel's:
|
||||
- Skills are SKILL.md files that teach Claude Code how to modify the codebase
|
||||
- A deterministic skills engine applies code changes (three-way merge, file additions)
|
||||
- Skills have state tracking (`.nanoclaw/state.yaml`), backups, and rollback
|
||||
- Examples: `/add-telegram`, `/add-discord`, `/add-gmail`, `/add-voice-transcription`, `/convert-to-docker`, `/add-parallel`
|
||||
- Each skill is a complete guide: pre-flight checks, code changes, setup, verification, troubleshooting
|
||||
|
||||
Aetheel's skills are runtime context injections (markdown instructions added to the system prompt when trigger words match). They don't modify code.
|
||||
|
||||
**What to build:**
|
||||
- Skills engine that can apply code transformations
|
||||
- State tracking for applied skills
|
||||
- Rollback support
|
||||
- Template skills for common integrations
|
||||
|
||||
---
|
||||
|
||||
### 11. Voice Message Transcription
|
||||
|
||||
NanoClaw has a skill (`/add-voice-transcription`) that:
|
||||
- Detects WhatsApp voice notes (`audioMessage.ptt === true`)
|
||||
- Downloads audio via Baileys
|
||||
- Transcribes using OpenAI Whisper API
|
||||
- Stores transcribed content as `[Voice: <text>]` in the database
|
||||
- Configurable provider, fallback message, enable/disable
|
||||
|
||||
Aetheel has no voice message handling.
|
||||
|
||||
**What to build:**
|
||||
- Voice message detection per adapter (Telegram, Discord, Slack all support voice)
|
||||
- Whisper API integration for transcription
|
||||
- Transcribed content injection into the conversation
|
||||
|
||||
---
|
||||
|
||||
### 12. Gmail / Email Integration
|
||||
|
||||
NanoClaw has a skill (`/add-gmail`) with two modes:
|
||||
- Tool mode: agent can read/send emails when triggered from chat
|
||||
- Channel mode: emails trigger the agent, agent replies via email
|
||||
- GCP OAuth setup guide
|
||||
- Email polling with deduplication
|
||||
- Per-thread or per-sender context isolation
|
||||
|
||||
Aetheel has no email integration.
|
||||
|
||||
**What to build:**
|
||||
- Gmail MCP integration (or direct API)
|
||||
- Email as a channel adapter
|
||||
- OAuth credential management
|
||||
|
||||
---
|
||||
|
||||
### 13. WhatsApp Support
|
||||
|
||||
NanoClaw's primary channel is WhatsApp via the Baileys library:
|
||||
- QR code and pairing code authentication
|
||||
- Group metadata sync
|
||||
- Message history storage per registered group
|
||||
- Bot message filtering (prevents echo loops)
|
||||
|
||||
Aetheel supports Slack, Discord, Telegram, and WebChat but not WhatsApp.
|
||||
|
||||
**What to build:**
|
||||
- WhatsApp adapter using a library like Baileys or the WhatsApp Business API
|
||||
- QR code authentication flow
|
||||
- Group registration and metadata sync
|
||||
|
||||
---
|
||||
|
||||
### 14. Structured Message Routing
|
||||
|
||||
NanoClaw has a clean channel abstraction:
|
||||
- `Channel` interface: `connect()`, `sendMessage()`, `isConnected()`, `ownsJid()`, `disconnect()`, `setTyping?()`
|
||||
- `findChannel()` routes outbound messages to the right channel by JID prefix (`tg:`, `dc:`, WhatsApp JIDs)
|
||||
- `formatOutbound()` strips `<internal>` tags before sending
|
||||
- XML-escaped message formatting for agent input
|
||||
|
||||
Aetheel's adapters work but lack JID-based routing, `<internal>` tag support, and typing indicators across all adapters.
|
||||
|
||||
**What to build:**
|
||||
- JID-based message routing (prefix per channel)
|
||||
- `<internal>` tag stripping for agent-to-agent communication
|
||||
- Typing indicators for all adapters
|
||||
- Unified channel interface with `ownsJid()` pattern
|
||||
|
||||
---
|
||||
|
||||
## Priority Recommendations
|
||||
|
||||
### High Priority (Security + Core Gaps)
|
||||
1. Container isolation for agent execution
|
||||
2. Fix the 10 critical/high security issues from the security audit
|
||||
3. Per-group isolation (memory, sessions, filesystem)
|
||||
4. Mount security allowlist
|
||||
|
||||
### Medium Priority (Feature Parity)
|
||||
5. Working agent teams with per-agent identity
|
||||
6. Group queue with concurrency control and retry
|
||||
7. Task context modes and run logging
|
||||
8. Streaming output with idle timeout
|
||||
9. IPC-based communication with authorization
|
||||
|
||||
### Lower Priority (Nice to Have)
|
||||
10. Voice message transcription
|
||||
11. WhatsApp adapter
|
||||
12. Gmail/email integration
|
||||
13. Skills as code transformations
|
||||
14. Structured message routing with JID prefixes
|
||||
|
||||
---
|
||||
|
||||
## What Aetheel Has That NanoClaw Doesn't
|
||||
|
||||
For reference, these are Aetheel strengths to preserve:
|
||||
|
||||
- Dual runtime support (OpenCode + Claude Code) with live switching
|
||||
- Auto-failover on rate limits
|
||||
- Per-request cost tracking and usage stats
|
||||
- Local vector search (hybrid: 0.7 vector + 0.3 BM25) with fastembed
|
||||
- Built-in multi-channel (Slack, Discord, Telegram, WebChat, Webhooks)
|
||||
- WebChat browser UI
|
||||
- Heartbeat / proactive task system
|
||||
- Lifecycle hooks (gateway:startup, command:reload, agent:response, etc.)
|
||||
- Comprehensive CLI (`aetheel start/stop/restart/logs/doctor/config/cron/memory`)
|
||||
- Config-driven setup (no code changes needed for basic customization)
|
||||
- Self-modification (AI can edit its own config, skills, identity files)
|
||||
- Hot reload (`/reload` command)
|
||||
214
docs/research/nanoclaw.md
Normal file
214
docs/research/nanoclaw.md
Normal file
@@ -0,0 +1,214 @@
|
||||
# 🦀 NanoClaw — Architecture & How It Works
|
||||
|
||||
> **Minimal, Security-First Personal AI Assistant** — built on Claude Agent SDK with container isolation.
|
||||
|
||||
## Overview
|
||||
|
||||
NanoClaw is a minimalist personal AI assistant that prioritizes **security through container isolation** and **understandability through small codebase size**. It runs on Claude Agent SDK (Claude Code) and uses WhatsApp as its primary channel. Each group chat runs in its own isolated Linux container.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | TypeScript (Node.js 20+) |
|
||||
| **Codebase Size** | ~34.9k tokens (~17% of Claude context window) |
|
||||
| **Config** | No config files — code changes only |
|
||||
| **AI Runtime** | Claude Agent SDK (Claude Code) |
|
||||
| **Primary Channel** | WhatsApp (Baileys) |
|
||||
| **Isolation** | Apple Container (macOS) / Docker (Linux) |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph WhatsApp["📱 WhatsApp"]
|
||||
WA["WhatsApp Client\n(Baileys)"]
|
||||
end
|
||||
|
||||
subgraph Core["🧠 Core Process (Single Node.js)"]
|
||||
IDX["Orchestrator\n(index.ts)"]
|
||||
DB["SQLite DB\n(db.ts)"]
|
||||
GQ["Group Queue\n(group-queue.ts)"]
|
||||
TS["Task Scheduler\n(task-scheduler.ts)"]
|
||||
IPC["IPC Watcher\n(ipc.ts)"]
|
||||
RT["Router\n(router.ts)"]
|
||||
end
|
||||
|
||||
subgraph Containers["🐳 Isolated Containers"]
|
||||
C1["Container 1\nGroup A\n(CLAUDE.md)"]
|
||||
C2["Container 2\nGroup B\n(CLAUDE.md)"]
|
||||
C3["Container 3\nMain Channel\n(CLAUDE.md)"]
|
||||
end
|
||||
|
||||
subgraph Memory["💾 Per-Group Memory"]
|
||||
M1["groups/A/CLAUDE.md"]
|
||||
M2["groups/B/CLAUDE.md"]
|
||||
M3["groups/main/CLAUDE.md"]
|
||||
end
|
||||
|
||||
WA --> IDX
|
||||
IDX --> DB
|
||||
IDX --> GQ
|
||||
GQ --> Containers
|
||||
TS --> Containers
|
||||
Containers --> IPC
|
||||
IPC --> RT
|
||||
RT --> WA
|
||||
C1 --- M1
|
||||
C2 --- M2
|
||||
C3 --- M3
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant WA as WhatsApp (Baileys)
|
||||
participant IDX as Orchestrator
|
||||
participant DB as SQLite
|
||||
participant GQ as Group Queue
|
||||
participant Container as Container (Claude SDK)
|
||||
participant IPC as IPC Watcher
|
||||
|
||||
User->>WA: Send message with @Andy
|
||||
WA->>IDX: New message event
|
||||
IDX->>DB: Store message
|
||||
IDX->>GQ: Enqueue (per-group, concurrency limited)
|
||||
GQ->>Container: Spawn Claude agent container
|
||||
Note over Container: Mounts only group's filesystem
|
||||
Note over Container: Reads group-specific CLAUDE.md
|
||||
Container->>Container: Claude processes with tools
|
||||
Container->>IPC: Write response to filesystem
|
||||
IPC->>IDX: Detect new response file
|
||||
IDX->>WA: Send reply
|
||||
WA->>User: Display response
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Orchestrator (`src/index.ts`)
|
||||
The single entry point that manages:
|
||||
- WhatsApp connection state
|
||||
- Message polling loop
|
||||
- Agent invocation decisions
|
||||
- State management for groups and sessions
|
||||
|
||||
### 2. WhatsApp Channel (`src/channels/whatsapp.ts`)
|
||||
- Uses **Baileys** library for WhatsApp Web connection
|
||||
- Handles authentication via QR code scan
|
||||
- Manages send/receive of messages
|
||||
- Supports media messages
|
||||
|
||||
### 3. Container Runner (`src/container-runner.ts`)
|
||||
The security core of NanoClaw:
|
||||
- Spawns **streaming Claude Agent SDK** containers
|
||||
- Each group runs in its own Linux container
|
||||
- **Apple Container** on macOS, **Docker** on Linux
|
||||
- Only explicitly mounted directories are accessible
|
||||
- Bash commands run INSIDE the container, not on host
|
||||
|
||||
### 4. SQLite Database (`src/db.ts`)
|
||||
- Stores messages, groups, sessions, and state
|
||||
- Per-group message history
|
||||
- Session continuity tracking
|
||||
|
||||
### 5. Group Queue (`src/group-queue.ts`)
|
||||
- Per-group message queue
|
||||
- Global concurrency limit
|
||||
- Ensures one agent invocation per group at a time
|
||||
|
||||
### 6. IPC System (`src/ipc.ts`)
|
||||
- Filesystem-based inter-process communication
|
||||
- Container writes response to mounted directory
|
||||
- IPC watcher detects and processes response files
|
||||
- Handles task results from scheduled jobs
|
||||
|
||||
### 7. Task Scheduler (`src/task-scheduler.ts`)
|
||||
- Recurring jobs that run Claude in containers
|
||||
- Jobs can message the user back
|
||||
- Managed from the main channel (self-chat)
|
||||
|
||||
### 8. Router (`src/router.ts`)
|
||||
- Formats outbound messages
|
||||
- Routes responses to correct WhatsApp recipient
|
||||
|
||||
### 9. Per-Group Memory (`groups/*/CLAUDE.md`)
|
||||
- Each group has its own `CLAUDE.md` memory file
|
||||
- Mounted into the group's container
|
||||
- Complete filesystem isolation between groups
|
||||
|
||||
---
|
||||
|
||||
## Security Model
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph Host["🖥️ Host System"]
|
||||
NanoClaw["NanoClaw Process"]
|
||||
end
|
||||
|
||||
subgraph Container1["🐳 Container (Group A)"]
|
||||
Agent1["Claude Agent"]
|
||||
FS1["Mounted: groups/A/"]
|
||||
end
|
||||
|
||||
subgraph Container2["🐳 Container (Group B)"]
|
||||
Agent2["Claude Agent"]
|
||||
FS2["Mounted: groups/B/"]
|
||||
end
|
||||
|
||||
NanoClaw -->|"Spawns"| Container1
|
||||
NanoClaw -->|"Spawns"| Container2
|
||||
|
||||
style Container1 fill:#e8f5e9
|
||||
style Container2 fill:#e8f5e9
|
||||
```
|
||||
|
||||
- **OS-level isolation** vs. application-level permission checks
|
||||
- Agents can only see what's explicitly mounted
|
||||
- Bash commands run in container, not on host
|
||||
- No shared memory between groups
|
||||
|
||||
---
|
||||
|
||||
## Philosophy & Design Decisions
|
||||
|
||||
1. **Small enough to understand** — Read the entire codebase in ~8 minutes
|
||||
2. **Secure by isolation** — Linux containers, not permission checks
|
||||
3. **Built for one user** — Not a framework, working software for personal use
|
||||
4. **Customization = code changes** — No config sprawl, modify the code directly
|
||||
5. **AI-native** — Claude Code handles setup (`/setup`), debugging, customization
|
||||
6. **Skills over features** — Don't add features to codebase, add skills that transform forks
|
||||
7. **Best harness, best model** — Claude Agent SDK gives Claude Code superpowers
|
||||
|
||||
---
|
||||
|
||||
## Agent Swarms (Unique Feature)
|
||||
|
||||
NanoClaw is the **first personal AI assistant** to support Agent Swarms:
|
||||
- Spin up teams of specialized agents
|
||||
- Agents collaborate within your chat
|
||||
- Each agent runs in its own container
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
# Setup (Claude Code handles everything)
|
||||
git clone https://github.com/gavrielc/nanoclaw.git
|
||||
cd nanoclaw
|
||||
claude
|
||||
# Then run /setup
|
||||
|
||||
# Talk to your assistant
|
||||
@Andy send me a daily summary every morning at 9am
|
||||
@Andy review the git history and update the README
|
||||
```
|
||||
|
||||
Trigger word: `@Andy` (customizable via code changes)
|
||||
291
docs/research/openclaw.md
Normal file
291
docs/research/openclaw.md
Normal file
@@ -0,0 +1,291 @@
|
||||
# 🦞 OpenClaw — Architecture & How It Works
|
||||
|
||||
> **Full-Featured Personal AI Assistant** — Massive TypeScript codebase with 15+ channels, companion apps, and enterprise-grade features.
|
||||
|
||||
## Overview
|
||||
|
||||
OpenClaw is the most feature-complete personal AI assistant in this space. It's a TypeScript monorepo with a WebSocket-based Gateway as the control plane, supporting 15+ messaging channels, companion macOS/iOS/Android apps, browser control, live canvas, voice wake, and extensive automation.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | TypeScript (Node.js ≥22) |
|
||||
| **Codebase Size** | 430k+ lines, 50+ source modules |
|
||||
| **Config** | `~/.openclaw/openclaw.json` (JSON5) |
|
||||
| **AI Runtime** | Pi Agent (custom RPC), multi-model |
|
||||
| **Channels** | 15+ (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, Zalo, WebChat, etc.) |
|
||||
| **Package Mgr** | pnpm (monorepo) |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph Channels["📱 Messaging Channels (15+)"]
|
||||
WA["WhatsApp\n(Baileys)"]
|
||||
TG["Telegram\n(grammY)"]
|
||||
SL["Slack\n(Bolt)"]
|
||||
DC["Discord\n(discord.js)"]
|
||||
GC["Google Chat"]
|
||||
SIG["Signal\n(signal-cli)"]
|
||||
BB["BlueBubbles\n(iMessage)"]
|
||||
IM["iMessage\n(legacy)"]
|
||||
MST["MS Teams"]
|
||||
MTX["Matrix"]
|
||||
ZL["Zalo"]
|
||||
WC["WebChat"]
|
||||
end
|
||||
|
||||
subgraph Gateway["🌐 Gateway (Control Plane)"]
|
||||
WS["WebSocket Server\nws://127.0.0.1:18789"]
|
||||
SES["Session Manager"]
|
||||
RTE["Channel Router"]
|
||||
PRES["Presence System"]
|
||||
Q["Message Queue"]
|
||||
CFG["Config Manager"]
|
||||
AUTH["Auth / Pairing"]
|
||||
end
|
||||
|
||||
subgraph Agent["🧠 Pi Agent (RPC)"]
|
||||
AGENT["Agent Runtime"]
|
||||
TOOLS["Tool Registry"]
|
||||
STREAM["Block Streaming"]
|
||||
PROV["Provider Router\n(multi-model)"]
|
||||
end
|
||||
|
||||
subgraph Apps["📲 Companion Apps"]
|
||||
MAC["macOS Menu Bar"]
|
||||
IOS["iOS Node"]
|
||||
ANDR["Android Node"]
|
||||
end
|
||||
|
||||
subgraph ToolSet["🔧 Tools & Automation"]
|
||||
BROWSER["Browser Control\n(CDP/Chromium)"]
|
||||
CANVAS["Live Canvas\n(A2UI)"]
|
||||
CRON["Cron Jobs"]
|
||||
WEBHOOK["Webhooks"]
|
||||
GMAIL["Gmail Pub/Sub"]
|
||||
NODES["Nodes\n(camera, screen, location)"]
|
||||
SKILLS_T["Skills Platform"]
|
||||
SESS_T["Session Tools\n(agent-to-agent)"]
|
||||
end
|
||||
|
||||
subgraph Workspace["💾 Workspace"]
|
||||
AGENTS_MD["AGENTS.md"]
|
||||
SOUL_MD["SOUL.md"]
|
||||
USER_MD["USER.md"]
|
||||
TOOLS_MD["TOOLS.md"]
|
||||
SKILLS_W["Skills/"]
|
||||
end
|
||||
|
||||
Channels --> Gateway
|
||||
Apps --> Gateway
|
||||
Gateway --> Agent
|
||||
Agent --> ToolSet
|
||||
Agent --> Workspace
|
||||
Agent --> PROV
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant Channel as Channel (WA/TG/Slack/etc.)
|
||||
participant GW as Gateway (WS)
|
||||
participant Session as Session Manager
|
||||
participant Agent as Pi Agent (RPC)
|
||||
participant LLM as LLM Provider
|
||||
participant Tools as Tools
|
||||
|
||||
User->>Channel: Send message
|
||||
Channel->>GW: Forward via channel adapter
|
||||
GW->>Session: Route to session (main/group)
|
||||
GW->>GW: Check auth (pairing/allowlist)
|
||||
Session->>Agent: Invoke agent (RPC)
|
||||
Agent->>Agent: Build prompt (AGENTS.md, SOUL.md, tools)
|
||||
Agent->>LLM: Stream request (with tool definitions)
|
||||
|
||||
loop Tool Use Loop
|
||||
LLM-->>Agent: Tool call (block stream)
|
||||
Agent->>Tools: Execute tool
|
||||
Tools-->>Agent: Tool result
|
||||
Agent->>LLM: Continue with result
|
||||
end
|
||||
|
||||
LLM-->>Agent: Final response (block stream)
|
||||
Agent-->>Session: Return response
|
||||
Session->>GW: Add to outbound queue
|
||||
GW->>GW: Chunk if needed (per-channel limits)
|
||||
GW->>Channel: Send chunked replies
|
||||
Channel->>User: Display response
|
||||
|
||||
Note over GW: Typing indicators, presence updates
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Gateway (`src/gateway/`)
|
||||
The central control plane — everything connects through it:
|
||||
- **WebSocket server** on `ws://127.0.0.1:18789`
|
||||
- Session management (main, group, per-channel)
|
||||
- Multi-agent routing (different agents for different channels)
|
||||
- Presence tracking and typing indicators
|
||||
- Config management and hot-reload
|
||||
- Health checks, doctor diagnostics
|
||||
|
||||
### 2. Pi Agent (`src/agents/`)
|
||||
Custom RPC-based agent runtime:
|
||||
- Tool streaming and block streaming
|
||||
- Multi-model support with failover
|
||||
- Session pruning for long conversations
|
||||
- Usage tracking (tokens, cost)
|
||||
- Thinking level control (off → xhigh)
|
||||
|
||||
### 3. Channel System (`src/channels/` + per-channel dirs)
|
||||
15+ channel adapters, each with:
|
||||
- Auth handling (pairing codes, allowlists, OAuth)
|
||||
- Message format conversion
|
||||
- Media pipeline (images, audio, video)
|
||||
- Group routing with mention gating
|
||||
- Per-channel chunking (character limits differ)
|
||||
|
||||
### 4. Security System (`src/security/`)
|
||||
Multi-layered security:
|
||||
- **DM Pairing** — unknown senders get a pairing code, must be approved
|
||||
- **Allowlists** — per-channel user whitelists
|
||||
- **Docker Sandbox** — non-main sessions can run in per-session Docker containers
|
||||
- **Tool denylist** — block dangerous tools in sandbox mode
|
||||
- **Elevated bash** — per-session toggle for host-level access
|
||||
|
||||
### 5. Browser Control (`src/browser/`)
|
||||
- Dedicated OpenClaw-managed Chrome/Chromium instance
|
||||
- CDP (Chrome DevTools Protocol) control
|
||||
- Snapshots, actions, uploads, profiles
|
||||
- Full web automation capabilities
|
||||
|
||||
### 6. Canvas & A2UI (`src/canvas-host/`)
|
||||
- Agent-driven visual workspace
|
||||
- A2UI (Agent-to-UI) — push HTML/JS to canvas
|
||||
- Canvas eval, snapshot, reset
|
||||
- Available on macOS, iOS, Android
|
||||
|
||||
### 7. Voice System
|
||||
- **Voice Wake** — always-on speech detection
|
||||
- **Talk Mode** — continuous conversation overlay
|
||||
- ElevenLabs TTS integration
|
||||
- Available on macOS, iOS, Android
|
||||
|
||||
### 8. Companion Apps
|
||||
- **macOS app**: Menu bar, Voice Wake/PTT, WebChat, debug tools
|
||||
- **iOS node**: Canvas, Voice Wake, Talk Mode, camera, Bonjour pairing
|
||||
- **Android node**: Canvas, Talk Mode, camera, screen recording, SMS
|
||||
|
||||
### 9. Session Tools (Agent-to-Agent)
|
||||
- `sessions_list` — discover active sessions
|
||||
- `sessions_history` — fetch transcript logs
|
||||
- `sessions_send` — message another session with reply-back
|
||||
|
||||
### 10. Skills Platform (`src/plugins/`, `skills/`)
|
||||
- **Bundled skills** — pre-installed capabilities
|
||||
- **Managed skills** — installed from ClawHub registry
|
||||
- **Workspace skills** — user-created in workspace
|
||||
- Install gating and UI
|
||||
- ClawHub registry for community skills
|
||||
|
||||
### 11. Automation
|
||||
- **Cron jobs** — scheduled recurring tasks
|
||||
- **Webhooks** — external trigger surface
|
||||
- **Gmail Pub/Sub** — email-triggered actions
|
||||
|
||||
### 12. Ops & Deployment
|
||||
- Docker support with compose
|
||||
- Tailscale Serve/Funnel for remote access
|
||||
- SSH tunnels with token/password auth
|
||||
- `openclaw doctor` for diagnostics
|
||||
- Nix mode for declarative config
|
||||
|
||||
---
|
||||
|
||||
## Project Structure (Simplified)
|
||||
|
||||
```
|
||||
openclaw/
|
||||
├── src/
|
||||
│ ├── agents/ # Pi agent runtime
|
||||
│ ├── gateway/ # WebSocket gateway
|
||||
│ ├── channels/ # Channel adapter base
|
||||
│ ├── whatsapp/ # WhatsApp adapter
|
||||
│ ├── telegram/ # Telegram adapter
|
||||
│ ├── slack/ # Slack adapter
|
||||
│ ├── discord/ # Discord adapter
|
||||
│ ├── signal/ # Signal adapter
|
||||
│ ├── imessage/ # iMessage adapters
|
||||
│ ├── browser/ # Browser control (CDP)
|
||||
│ ├── canvas-host/ # Canvas & A2UI
|
||||
│ ├── sessions/ # Session management
|
||||
│ ├── routing/ # Message routing
|
||||
│ ├── security/ # Auth, pairing, sandbox
|
||||
│ ├── cron/ # Scheduled jobs
|
||||
│ ├── memory/ # Memory system
|
||||
│ ├── providers/ # LLM providers
|
||||
│ ├── plugins/ # Plugin/skill system
|
||||
│ ├── media/ # Media pipeline
|
||||
│ ├── tts/ # Text-to-speech
|
||||
│ ├── web/ # Control UI + WebChat
|
||||
│ ├── wizard/ # Onboarding wizard
|
||||
│ └── cli/ # CLI commands
|
||||
├── apps/ # Companion app sources
|
||||
├── packages/ # Shared packages
|
||||
├── extensions/ # Extension channels
|
||||
├── skills/ # Bundled skills
|
||||
├── ui/ # Web UI source
|
||||
└── Swabble/ # macOS/iOS Swift source
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `openclaw onboard` | Guided setup wizard |
|
||||
| `openclaw gateway` | Start the gateway |
|
||||
| `openclaw agent --message "..."` | Chat with agent |
|
||||
| `openclaw message send` | Send to any channel |
|
||||
| `openclaw doctor` | Diagnostics & migration |
|
||||
| `openclaw pairing approve` | Approve DM pairing |
|
||||
| `openclaw update` | Update to latest version |
|
||||
| `openclaw channels login` | Link WhatsApp |
|
||||
|
||||
---
|
||||
|
||||
## Chat Commands (In-Channel)
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/status` | Session status (model, tokens, cost) |
|
||||
| `/new` / `/reset` | Reset session |
|
||||
| `/compact` | Compact session context |
|
||||
| `/think <level>` | Set thinking level |
|
||||
| `/verbose on\|off` | Toggle verbose mode |
|
||||
| `/usage off\|tokens\|full` | Usage footer |
|
||||
| `/restart` | Restart gateway |
|
||||
| `/activation mention\|always` | Group activation mode |
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **Gateway as control plane** — Single WebSocket server everything connects to
|
||||
2. **Multi-agent routing** — Different agents for different channels/groups
|
||||
3. **Pairing-based security** — Unknown DMs get pairing codes by default
|
||||
4. **Docker sandboxing** — Non-main sessions can be isolated
|
||||
5. **Block streaming** — Responses streamed as structured blocks
|
||||
6. **Extension-based channels** — MS Teams, Matrix, Zalo are extensions
|
||||
7. **Companion apps** — Native macOS/iOS/Android for device-level features
|
||||
8. **ClawHub** — Community skill registry
|
||||
53
docs/research/opencode-cli-upgrades.md
Normal file
53
docs/research/opencode-cli-upgrades.md
Normal file
@@ -0,0 +1,53 @@
|
||||
Looking at the OpenCode CLI doc against Aetheel's opencode_runtime.py, here are the gaps:
|
||||
|
||||
What Aetheel uses today:
|
||||
|
||||
opencode run with --model, --continue, --session, --format
|
||||
SDK mode via opencode serve API (session create + chat)
|
||||
Session persistence in SQLite
|
||||
System prompt injection via XML tags (CLI) or system param (SDK)
|
||||
Rate limit detection from error text
|
||||
Live session tracking with idle timeout
|
||||
What Aetheel is missing from the OpenCode CLI:
|
||||
|
||||
[-] --agent flag — OpenCode supports custom agents (opencode agent create/list). Aetheel has no concept of selecting different OpenCode agents per request. This would be useful for the planned agent teams feature — you could have a "programmer" agent and a "researcher" agent defined in OpenCode.
|
||||
|
||||
[-] --file / -f flag — OpenCode can attach files to a prompt (opencode run -f image.png "describe this"). Aetheel doesn't pass file attachments from chat adapters through to the runtime. Discord/Telegram/Slack all support file uploads.
|
||||
|
||||
[-] --attach flag — You can run opencode run --attach http://localhost:4096 to connect to a running server, avoiding MCP cold boot on every request. Aetheel's SDK mode connects to the server, but CLI mode spawns a fresh process each time. Using --attach in CLI mode would give you the speed of SDK mode without needing the Python SDK.
|
||||
|
||||
[-] --fork flag — Fork a session when continuing, creating a branch. Aetheel always continues sessions linearly. Forking would be useful for "what if" scenarios or spawning subagent tasks from a shared context.
|
||||
|
||||
[-] --title flag — Name sessions for easier identification. Aetheel's sessions are tracked by conversation ID but have no human-readable title.
|
||||
|
||||
--share flag — Share sessions via URL. Aetheel has no session sharing.
|
||||
|
||||
opencode session list/export/import — Full session management. Aetheel can list sessions internally but doesn't expose export/import or the full session lifecycle.
|
||||
|
||||
[-] opencode stats — Token usage and cost statistics with --days, --tools, --models filters. Aetheel tracks basic usage stats in memory but doesn't query OpenCode's built-in stats.
|
||||
|
||||
[-] opencode models — List available models from configured providers. Aetheel has no way to discover available models — you have to know the model name.
|
||||
|
||||
opencode auth management — Login/logout/list for providers. Aetheel relies on env vars for auth and has no way to manage OpenCode's credential store.
|
||||
|
||||
opencode mcp auth/logout/debug — OAuth-based MCP server auth and debugging. Aetheel can add/remove MCP servers but can't handle OAuth flows or debug MCP connections.
|
||||
|
||||
opencode github agent — GitHub Actions integration for repo automation. Aetheel has no CI/CD agent support.
|
||||
|
||||
opencode web — Built-in web UI. Aetheel has its own WebChat but doesn't leverage OpenCode's web interface.
|
||||
|
||||
opencode acp — Agent Client Protocol server. Aetheel doesn't use ACP.
|
||||
|
||||
OPENCODE_AUTO_SHARE — Auto-share sessions.
|
||||
|
||||
OPENCODE_DISABLE_AUTOCOMPACT — Control context compaction. Aetheel doesn't expose this, which could matter for long conversations.
|
||||
|
||||
OPENCODE_EXPERIMENTAL_PLAN_MODE — Plan mode for structured task execution. Aetheel doesn't use this.
|
||||
|
||||
OPENCODE_EXPERIMENTAL_BASH_DEFAULT_TIMEOUT_MS — Control bash command timeouts. Aetheel doesn't pass this through.
|
||||
|
||||
OPENCODE_ENABLE_EXA — Exa web search tools. Aetheel doesn't expose this toggle.
|
||||
|
||||
opencode upgrade — Self-update. Aetheel has aetheel update which does git pull but doesn't update the OpenCode binary itself.
|
||||
|
||||
The most impactful gaps are --agent (for agent teams), --file (for media from chat), --attach (for faster CLI mode), --fork (for branching conversations), and opencode stats (for usage visibility).
|
||||
437
docs/research/opencode-cli.md
Normal file
437
docs/research/opencode-cli.md
Normal file
@@ -0,0 +1,437 @@
|
||||
|
||||
CLI
|
||||
|
||||
OpenCode CLI options and commands.
|
||||
|
||||
The OpenCode CLI by default starts the TUI when run without any arguments.
|
||||
Terminal window
|
||||
|
||||
opencode
|
||||
|
||||
But it also accepts commands as documented on this page. This allows you to interact with OpenCode programmatically.
|
||||
Terminal window
|
||||
|
||||
opencode run "Explain how closures work in JavaScript"
|
||||
|
||||
tui
|
||||
|
||||
Start the OpenCode terminal user interface.
|
||||
Terminal window
|
||||
|
||||
opencode [project]
|
||||
|
||||
Flags
|
||||
Flag Short Description
|
||||
--continue -c Continue the last session
|
||||
--session -s Session ID to continue
|
||||
--fork Fork the session when continuing (use with --continue or --session)
|
||||
--prompt Prompt to use
|
||||
--model -m Model to use in the form of provider/model
|
||||
--agent Agent to use
|
||||
--port Port to listen on
|
||||
--hostname Hostname to listen on
|
||||
Commands
|
||||
|
||||
The OpenCode CLI also has the following commands.
|
||||
agent
|
||||
|
||||
Manage agents for OpenCode.
|
||||
Terminal window
|
||||
|
||||
opencode agent [command]
|
||||
|
||||
attach
|
||||
|
||||
Attach a terminal to an already running OpenCode backend server started via serve or web commands.
|
||||
Terminal window
|
||||
|
||||
opencode attach [url]
|
||||
|
||||
This allows using the TUI with a remote OpenCode backend. For example:
|
||||
Terminal window
|
||||
|
||||
# Start the backend server for web/mobile access
|
||||
opencode web --port 4096 --hostname 0.0.0.0
|
||||
|
||||
# In another terminal, attach the TUI to the running backend
|
||||
opencode attach http://10.20.30.40:4096
|
||||
|
||||
Flags
|
||||
Flag Short Description
|
||||
--dir Working directory to start TUI in
|
||||
--session -s Session ID to continue
|
||||
create
|
||||
|
||||
Create a new agent with custom configuration.
|
||||
Terminal window
|
||||
|
||||
opencode agent create
|
||||
|
||||
This command will guide you through creating a new agent with a custom system prompt and tool configuration.
|
||||
list
|
||||
|
||||
List all available agents.
|
||||
Terminal window
|
||||
|
||||
opencode agent list
|
||||
|
||||
auth
|
||||
|
||||
Command to manage credentials and login for providers.
|
||||
Terminal window
|
||||
|
||||
opencode auth [command]
|
||||
|
||||
login
|
||||
|
||||
OpenCode is powered by the provider list at Models.dev, so you can use opencode auth login to configure API keys for any provider you’d like to use. This is stored in ~/.local/share/opencode/auth.json.
|
||||
Terminal window
|
||||
|
||||
opencode auth login
|
||||
|
||||
When OpenCode starts up it loads the providers from the credentials file. And if there are any keys defined in your environments or a .env file in your project.
|
||||
list
|
||||
|
||||
Lists all the authenticated providers as stored in the credentials file.
|
||||
Terminal window
|
||||
|
||||
opencode auth list
|
||||
|
||||
Or the short version.
|
||||
Terminal window
|
||||
|
||||
opencode auth ls
|
||||
|
||||
logout
|
||||
|
||||
Logs you out of a provider by clearing it from the credentials file.
|
||||
Terminal window
|
||||
|
||||
opencode auth logout
|
||||
|
||||
github
|
||||
|
||||
Manage the GitHub agent for repository automation.
|
||||
Terminal window
|
||||
|
||||
opencode github [command]
|
||||
|
||||
install
|
||||
|
||||
Install the GitHub agent in your repository.
|
||||
Terminal window
|
||||
|
||||
opencode github install
|
||||
|
||||
This sets up the necessary GitHub Actions workflow and guides you through the configuration process. Learn more.
|
||||
run
|
||||
|
||||
Run the GitHub agent. This is typically used in GitHub Actions.
|
||||
Terminal window
|
||||
|
||||
opencode github run
|
||||
|
||||
Flags
|
||||
Flag Description
|
||||
--event GitHub mock event to run the agent for
|
||||
--token GitHub personal access token
|
||||
mcp
|
||||
|
||||
Manage Model Context Protocol servers.
|
||||
Terminal window
|
||||
|
||||
opencode mcp [command]
|
||||
|
||||
add
|
||||
|
||||
Add an MCP server to your configuration.
|
||||
Terminal window
|
||||
|
||||
opencode mcp add
|
||||
|
||||
This command will guide you through adding either a local or remote MCP server.
|
||||
list
|
||||
|
||||
List all configured MCP servers and their connection status.
|
||||
Terminal window
|
||||
|
||||
opencode mcp list
|
||||
|
||||
Or use the short version.
|
||||
Terminal window
|
||||
|
||||
opencode mcp ls
|
||||
|
||||
auth
|
||||
|
||||
Authenticate with an OAuth-enabled MCP server.
|
||||
Terminal window
|
||||
|
||||
opencode mcp auth [name]
|
||||
|
||||
If you don’t provide a server name, you’ll be prompted to select from available OAuth-capable servers.
|
||||
|
||||
You can also list OAuth-capable servers and their authentication status.
|
||||
Terminal window
|
||||
|
||||
opencode mcp auth list
|
||||
|
||||
Or use the short version.
|
||||
Terminal window
|
||||
|
||||
opencode mcp auth ls
|
||||
|
||||
logout
|
||||
|
||||
Remove OAuth credentials for an MCP server.
|
||||
Terminal window
|
||||
|
||||
opencode mcp logout [name]
|
||||
|
||||
debug
|
||||
|
||||
Debug OAuth connection issues for an MCP server.
|
||||
Terminal window
|
||||
|
||||
opencode mcp debug <name>
|
||||
|
||||
models
|
||||
|
||||
List all available models from configured providers.
|
||||
Terminal window
|
||||
|
||||
opencode models [provider]
|
||||
|
||||
This command displays all models available across your configured providers in the format provider/model.
|
||||
|
||||
This is useful for figuring out the exact model name to use in your config.
|
||||
|
||||
You can optionally pass a provider ID to filter models by that provider.
|
||||
Terminal window
|
||||
|
||||
opencode models anthropic
|
||||
|
||||
Flags
|
||||
Flag Description
|
||||
--refresh Refresh the models cache from models.dev
|
||||
--verbose Use more verbose model output (includes metadata like costs)
|
||||
|
||||
Use the --refresh flag to update the cached model list. This is useful when new models have been added to a provider and you want to see them in OpenCode.
|
||||
Terminal window
|
||||
|
||||
opencode models --refresh
|
||||
|
||||
run
|
||||
|
||||
Run opencode in non-interactive mode by passing a prompt directly.
|
||||
Terminal window
|
||||
|
||||
opencode run [message..]
|
||||
|
||||
This is useful for scripting, automation, or when you want a quick answer without launching the full TUI. For example.
|
||||
Terminal window
|
||||
|
||||
opencode run Explain the use of context in Go
|
||||
|
||||
You can also attach to a running opencode serve instance to avoid MCP server cold boot times on every run:
|
||||
Terminal window
|
||||
|
||||
# Start a headless server in one terminal
|
||||
opencode serve
|
||||
|
||||
# In another terminal, run commands that attach to it
|
||||
opencode run --attach http://localhost:4096 "Explain async/await in JavaScript"
|
||||
|
||||
Flags
|
||||
Flag Short Description
|
||||
--command The command to run, use message for args
|
||||
--continue -c Continue the last session
|
||||
--session -s Session ID to continue
|
||||
--fork Fork the session when continuing (use with --continue or --session)
|
||||
--share Share the session
|
||||
--model -m Model to use in the form of provider/model
|
||||
--agent Agent to use
|
||||
--file -f File(s) to attach to message
|
||||
--format Format: default (formatted) or json (raw JSON events)
|
||||
--title Title for the session (uses truncated prompt if no value provided)
|
||||
--attach Attach to a running opencode server (e.g., http://localhost:4096)
|
||||
--port Port for the local server (defaults to random port)
|
||||
serve
|
||||
|
||||
Start a headless OpenCode server for API access. Check out the server docs for the full HTTP interface.
|
||||
Terminal window
|
||||
|
||||
opencode serve
|
||||
|
||||
This starts an HTTP server that provides API access to opencode functionality without the TUI interface. Set OPENCODE_SERVER_PASSWORD to enable HTTP basic auth (username defaults to opencode).
|
||||
Flags
|
||||
Flag Description
|
||||
--port Port to listen on
|
||||
--hostname Hostname to listen on
|
||||
--mdns Enable mDNS discovery
|
||||
--cors Additional browser origin(s) to allow CORS
|
||||
session
|
||||
|
||||
Manage OpenCode sessions.
|
||||
Terminal window
|
||||
|
||||
opencode session [command]
|
||||
|
||||
list
|
||||
|
||||
List all OpenCode sessions.
|
||||
Terminal window
|
||||
|
||||
opencode session list
|
||||
|
||||
Flags
|
||||
Flag Short Description
|
||||
--max-count -n Limit to N most recent sessions
|
||||
--format Output format: table or json (table)
|
||||
stats
|
||||
|
||||
Show token usage and cost statistics for your OpenCode sessions.
|
||||
Terminal window
|
||||
|
||||
opencode stats
|
||||
|
||||
Flags
|
||||
Flag Description
|
||||
--days Show stats for the last N days (all time)
|
||||
--tools Number of tools to show (all)
|
||||
--models Show model usage breakdown (hidden by default). Pass a number to show top N
|
||||
--project Filter by project (all projects, empty string: current project)
|
||||
export
|
||||
|
||||
Export session data as JSON.
|
||||
Terminal window
|
||||
|
||||
opencode export [sessionID]
|
||||
|
||||
If you don’t provide a session ID, you’ll be prompted to select from available sessions.
|
||||
import
|
||||
|
||||
Import session data from a JSON file or OpenCode share URL.
|
||||
Terminal window
|
||||
|
||||
opencode import <file>
|
||||
|
||||
You can import from a local file or an OpenCode share URL.
|
||||
Terminal window
|
||||
|
||||
opencode import session.json
|
||||
opencode import https://opncd.ai/s/abc123
|
||||
|
||||
web
|
||||
|
||||
Start a headless OpenCode server with a web interface.
|
||||
Terminal window
|
||||
|
||||
opencode web
|
||||
|
||||
This starts an HTTP server and opens a web browser to access OpenCode through a web interface. Set OPENCODE_SERVER_PASSWORD to enable HTTP basic auth (username defaults to opencode).
|
||||
Flags
|
||||
Flag Description
|
||||
--port Port to listen on
|
||||
--hostname Hostname to listen on
|
||||
--mdns Enable mDNS discovery
|
||||
--cors Additional browser origin(s) to allow CORS
|
||||
acp
|
||||
|
||||
Start an ACP (Agent Client Protocol) server.
|
||||
Terminal window
|
||||
|
||||
opencode acp
|
||||
|
||||
This command starts an ACP server that communicates via stdin/stdout using nd-JSON.
|
||||
Flags
|
||||
Flag Description
|
||||
--cwd Working directory
|
||||
--port Port to listen on
|
||||
--hostname Hostname to listen on
|
||||
uninstall
|
||||
|
||||
Uninstall OpenCode and remove all related files.
|
||||
Terminal window
|
||||
|
||||
opencode uninstall
|
||||
|
||||
Flags
|
||||
Flag Short Description
|
||||
--keep-config -c Keep configuration files
|
||||
--keep-data -d Keep session data and snapshots
|
||||
--dry-run Show what would be removed without removing
|
||||
--force -f Skip confirmation prompts
|
||||
upgrade
|
||||
|
||||
Updates opencode to the latest version or a specific version.
|
||||
Terminal window
|
||||
|
||||
opencode upgrade [target]
|
||||
|
||||
To upgrade to the latest version.
|
||||
Terminal window
|
||||
|
||||
opencode upgrade
|
||||
|
||||
To upgrade to a specific version.
|
||||
Terminal window
|
||||
|
||||
opencode upgrade v0.1.48
|
||||
|
||||
Flags
|
||||
Flag Short Description
|
||||
--method -m The installation method that was used; curl, npm, pnpm, bun, brew
|
||||
Global Flags
|
||||
|
||||
The opencode CLI takes the following global flags.
|
||||
Flag Short Description
|
||||
--help -h Display help
|
||||
--version -v Print version number
|
||||
--print-logs Print logs to stderr
|
||||
--log-level Log level (DEBUG, INFO, WARN, ERROR)
|
||||
Environment variables
|
||||
|
||||
OpenCode can be configured using environment variables.
|
||||
Variable Type Description
|
||||
OPENCODE_AUTO_SHARE boolean Automatically share sessions
|
||||
OPENCODE_GIT_BASH_PATH string Path to Git Bash executable on Windows
|
||||
OPENCODE_CONFIG string Path to config file
|
||||
OPENCODE_CONFIG_DIR string Path to config directory
|
||||
OPENCODE_CONFIG_CONTENT string Inline json config content
|
||||
OPENCODE_DISABLE_AUTOUPDATE boolean Disable automatic update checks
|
||||
OPENCODE_DISABLE_PRUNE boolean Disable pruning of old data
|
||||
OPENCODE_DISABLE_TERMINAL_TITLE boolean Disable automatic terminal title updates
|
||||
OPENCODE_PERMISSION string Inlined json permissions config
|
||||
OPENCODE_DISABLE_DEFAULT_PLUGINS boolean Disable default plugins
|
||||
OPENCODE_DISABLE_LSP_DOWNLOAD boolean Disable automatic LSP server downloads
|
||||
OPENCODE_ENABLE_EXPERIMENTAL_MODELS boolean Enable experimental models
|
||||
OPENCODE_DISABLE_AUTOCOMPACT boolean Disable automatic context compaction
|
||||
OPENCODE_DISABLE_CLAUDE_CODE boolean Disable reading from .claude (prompt + skills)
|
||||
OPENCODE_DISABLE_CLAUDE_CODE_PROMPT boolean Disable reading ~/.claude/CLAUDE.md
|
||||
OPENCODE_DISABLE_CLAUDE_CODE_SKILLS boolean Disable loading .claude/skills
|
||||
OPENCODE_DISABLE_MODELS_FETCH boolean Disable fetching models from remote sources
|
||||
OPENCODE_FAKE_VCS string Fake VCS provider for testing purposes
|
||||
OPENCODE_DISABLE_FILETIME_CHECK boolean Disable file time checking for optimization
|
||||
OPENCODE_CLIENT string Client identifier (defaults to cli)
|
||||
OPENCODE_ENABLE_EXA boolean Enable Exa web search tools
|
||||
OPENCODE_SERVER_PASSWORD string Enable basic auth for serve/web
|
||||
OPENCODE_SERVER_USERNAME string Override basic auth username (default opencode)
|
||||
OPENCODE_MODELS_URL string Custom URL for fetching models configuration
|
||||
Experimental
|
||||
|
||||
These environment variables enable experimental features that may change or be removed.
|
||||
Variable Type Description
|
||||
OPENCODE_EXPERIMENTAL boolean Enable all experimental features
|
||||
OPENCODE_EXPERIMENTAL_ICON_DISCOVERY boolean Enable icon discovery
|
||||
OPENCODE_EXPERIMENTAL_DISABLE_COPY_ON_SELECT boolean Disable copy on select in TUI
|
||||
OPENCODE_EXPERIMENTAL_BASH_DEFAULT_TIMEOUT_MS number Default timeout for bash commands in ms
|
||||
OPENCODE_EXPERIMENTAL_OUTPUT_TOKEN_MAX number Max output tokens for LLM responses
|
||||
OPENCODE_EXPERIMENTAL_FILEWATCHER boolean Enable file watcher for entire dir
|
||||
OPENCODE_EXPERIMENTAL_OXFMT boolean Enable oxfmt formatter
|
||||
OPENCODE_EXPERIMENTAL_LSP_TOOL boolean Enable experimental LSP tool
|
||||
OPENCODE_EXPERIMENTAL_DISABLE_FILEWATCHER boolean Disable file watcher
|
||||
OPENCODE_EXPERIMENTAL_EXA boolean Enable experimental Exa features
|
||||
OPENCODE_EXPERIMENTAL_LSP_TY boolean Enable experimental LSP type checking
|
||||
OPENCODE_EXPERIMENTAL_MARKDOWN boolean Enable experimental markdown features
|
||||
OPENCODE_EXPERIMENTAL_PLAN_MODE boolean Enable plan mode
|
||||
251
docs/research/picoclaw.md
Normal file
251
docs/research/picoclaw.md
Normal file
@@ -0,0 +1,251 @@
|
||||
# 🦐 PicoClaw — Architecture & How It Works
|
||||
|
||||
> **Ultra-Efficient AI Assistant in Go** — $10 hardware, 10MB RAM, 1s boot time.
|
||||
|
||||
## Overview
|
||||
|
||||
PicoClaw is an extreme-lightweight rewrite of Nanobot in Go, designed to run on the cheapest possible hardware — including $10 RISC-V SBCs with <10MB RAM. The entire project was AI-bootstrapped (95% agent-generated) through a self-bootstrapping migration from Python to Go.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | Go 1.21+ |
|
||||
| **RAM Usage** | <10MB |
|
||||
| **Startup Time** | <1s (even at 0.6GHz) |
|
||||
| **Hardware Cost** | As low as $10 |
|
||||
| **Architectures** | x86_64, ARM64, RISC-V |
|
||||
| **Binary** | Single self-contained binary |
|
||||
| **Config** | `~/.picoclaw/config.json` |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph Channels["📱 Chat Channels"]
|
||||
TG["Telegram"]
|
||||
DC["Discord"]
|
||||
QQ["QQ"]
|
||||
DT["DingTalk"]
|
||||
LINE["LINE"]
|
||||
end
|
||||
|
||||
subgraph Core["🧠 Core Agent (Single Binary)"]
|
||||
MAIN["Main Entry\n(cmd/)"]
|
||||
AGENT["Agent Loop\n(pkg/agent/)"]
|
||||
CONF["Config\n(pkg/config/)"]
|
||||
AUTH["Auth\n(pkg/auth/)"]
|
||||
PROV["Providers\n(pkg/providers/)"]
|
||||
TOOLS["Tools\n(pkg/tools/)"]
|
||||
end
|
||||
|
||||
subgraph ToolSet["🔧 Built-in Tools"]
|
||||
SHELL["Shell Exec"]
|
||||
FILE["File R/W"]
|
||||
WEB["Web Search\n(Brave / DuckDuckGo)"]
|
||||
CRON_T["Cron / Reminders"]
|
||||
SPAWN["Spawn Subagent"]
|
||||
MSG["Message Tool"]
|
||||
end
|
||||
|
||||
subgraph Workspace["💾 Workspace"]
|
||||
AGENTS_MD["AGENTS.md"]
|
||||
SOUL_MD["SOUL.md"]
|
||||
TOOLS_MD["TOOLS.md"]
|
||||
USER_MD["USER.md"]
|
||||
IDENTITY["IDENTITY.md"]
|
||||
HB["HEARTBEAT.md"]
|
||||
MEM["MEMORY.md"]
|
||||
SESSIONS["sessions/"]
|
||||
SKILLS["skills/"]
|
||||
end
|
||||
|
||||
subgraph Providers["☁️ LLM Providers"]
|
||||
GEMINI["Gemini"]
|
||||
ZHIPU["Zhipu"]
|
||||
OR["OpenRouter"]
|
||||
OA["OpenAI"]
|
||||
AN["Anthropic"]
|
||||
DS["DeepSeek"]
|
||||
GROQ["Groq\n(+ voice)"]
|
||||
end
|
||||
|
||||
Channels --> Core
|
||||
AGENT --> ToolSet
|
||||
AGENT --> Workspace
|
||||
AGENT --> Providers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant Channel as Chat Channel
|
||||
participant GW as Gateway
|
||||
participant Agent as Agent Loop
|
||||
participant LLM as LLM Provider
|
||||
participant Tools as Tools
|
||||
|
||||
User->>Channel: Send message
|
||||
Channel->>GW: Forward message
|
||||
GW->>Agent: Route to agent
|
||||
Agent->>Agent: Load context (AGENTS.md, SOUL.md, USER.md)
|
||||
Agent->>LLM: Send prompt + tool defs
|
||||
LLM-->>Agent: Response
|
||||
|
||||
alt Tool Call
|
||||
Agent->>Tools: Execute tool
|
||||
Tools-->>Agent: Result
|
||||
Agent->>LLM: Continue
|
||||
LLM-->>Agent: Next response
|
||||
end
|
||||
|
||||
Agent->>Agent: Update memory/session
|
||||
Agent-->>GW: Return response
|
||||
GW-->>Channel: Send reply
|
||||
Channel-->>User: Display
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Heartbeat System Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Timer as Heartbeat Timer
|
||||
participant Agent as Agent
|
||||
participant HB as HEARTBEAT.md
|
||||
participant Subagent as Spawn Subagent
|
||||
participant User
|
||||
|
||||
Timer->>Agent: Trigger (every 30 min)
|
||||
Agent->>HB: Read periodic tasks
|
||||
|
||||
alt Quick Task
|
||||
Agent->>Agent: Execute directly
|
||||
Agent-->>Timer: HEARTBEAT_OK
|
||||
end
|
||||
|
||||
alt Long Task
|
||||
Agent->>Subagent: Spawn async subagent
|
||||
Agent-->>Timer: Continue to next task
|
||||
Subagent->>Subagent: Work independently
|
||||
Subagent->>User: Send result via message tool
|
||||
end
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Agent Loop (`pkg/agent/`)
|
||||
Go-native implementation of the LLM ↔ tool execution loop:
|
||||
- Builds context from workspace identity files
|
||||
- Sends to LLM provider with tool definitions
|
||||
- Iterates on tool calls up to `max_tool_iterations` (default: 20)
|
||||
- Session history managed in `workspace/sessions/`
|
||||
|
||||
### 2. Provider System (`pkg/providers/`)
|
||||
- Gemini and Zhipu are fully tested
|
||||
- OpenRouter, Anthropic, OpenAI, DeepSeek marked "to be tested"
|
||||
- Groq for voice transcription (Whisper)
|
||||
- Each provider implements a common interface
|
||||
|
||||
### 3. Tool System (`pkg/tools/`)
|
||||
Built-in tools:
|
||||
- **read_file** / **write_file** / **list_dir** / **edit_file** / **append_file** — File operations
|
||||
- **exec** — Shell command execution (with safety guards)
|
||||
- **web_search** — Brave Search or DuckDuckGo fallback
|
||||
- **cron** — Scheduled reminders and recurring tasks
|
||||
- **spawn** — Create async subagents
|
||||
- **message** — Subagent-to-user communication
|
||||
|
||||
### 4. Security Sandbox
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
RW["restrict_to_workspace = true"]
|
||||
|
||||
RW --> RF["read_file: workspace only"]
|
||||
RW --> WF["write_file: workspace only"]
|
||||
RW --> LD["list_dir: workspace only"]
|
||||
RW --> EF["edit_file: workspace only"]
|
||||
RW --> AF["append_file: workspace only"]
|
||||
RW --> EX["exec: workspace paths only"]
|
||||
|
||||
EX --> BL["ALWAYS Blocked:"]
|
||||
BL --> RM["rm -rf"]
|
||||
BL --> FMT["format, mkfs"]
|
||||
BL --> DD["dd if="]
|
||||
BL --> SHUT["shutdown, reboot"]
|
||||
BL --> FORK["fork bomb"]
|
||||
```
|
||||
|
||||
- Workspace sandbox enabled by default
|
||||
- All tools restricted to workspace directory
|
||||
- Dangerous commands always blocked (even with sandbox off)
|
||||
- Consistent across main agent, subagents, and heartbeat tasks
|
||||
|
||||
### 5. Heartbeat System
|
||||
- Reads `HEARTBEAT.md` every 30 minutes
|
||||
- Quick tasks executed directly
|
||||
- Long tasks spawned as async subagents
|
||||
- Subagents communicate independently via message tool
|
||||
|
||||
### 6. Channel System
|
||||
- **Telegram** — Easy setup (token only)
|
||||
- **Discord** — Bot token + intents
|
||||
- **QQ** — AppID + AppSecret
|
||||
- **DingTalk** — Client credentials
|
||||
- **LINE** — Credentials + webhook URL (HTTPS required)
|
||||
|
||||
### 7. Workspace Layout
|
||||
```
|
||||
~/.picoclaw/workspace/
|
||||
├── sessions/ # Conversation history
|
||||
├── memory/ # Long-term memory (MEMORY.md)
|
||||
├── state/ # Persistent state
|
||||
├── cron/ # Scheduled jobs database
|
||||
├── skills/ # Custom skills
|
||||
├── AGENTS.md # Agent behavior guide
|
||||
├── HEARTBEAT.md # Periodic task prompts
|
||||
├── IDENTITY.md # Agent identity
|
||||
├── SOUL.md # Agent soul
|
||||
├── TOOLS.md # Tool descriptions
|
||||
└── USER.md # User preferences
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Comparison Table (from README)
|
||||
|
||||
| | OpenClaw | NanoBot | **PicoClaw** |
|
||||
|---------------------|------------|-------------|-----------------------|
|
||||
| **Language** | TypeScript | Python | **Go** |
|
||||
| **RAM** | >1GB | >100MB | **<10MB** |
|
||||
| **Startup (0.8GHz)**| >500s | >30s | **<1s** |
|
||||
| **Cost** | Mac $599 | SBC ~$50 | **Any Linux, ~$10** |
|
||||
|
||||
---
|
||||
|
||||
## Deployment Targets
|
||||
|
||||
PicoClaw can run on almost any Linux device:
|
||||
- **$9.9** LicheeRV-Nano — Minimal home assistant
|
||||
- **$30-50** NanoKVM — Automated server maintenance
|
||||
- **$50-100** MaixCAM — Smart monitoring
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **Go for minimal footprint** — Single binary, no runtime deps, tiny memory
|
||||
2. **AI-bootstrapped migration** — 95% of Go code generated by the AI agent itself
|
||||
3. **Web search with fallback** — Brave Search primary, DuckDuckGo fallback (free)
|
||||
4. **Heartbeat for proactive tasks** — Agent checks `HEARTBEAT.md` periodically
|
||||
5. **Subagent pattern** — Long tasks run async, don't block heartbeat
|
||||
6. **Default sandbox** — `restrict_to_workspace: true` by default
|
||||
7. **Cross-architecture** — Single binary compiles for x86, ARM64, RISC-V
|
||||
Reference in New Issue
Block a user