latest updates

This commit is contained in:
Tanmay Karande
2026-02-15 15:02:58 -05:00
parent 438bb80416
commit 41b2f9a593
24 changed files with 3883 additions and 388 deletions

3
docs/additions.txt Normal file
View File

@@ -0,0 +1,3 @@
config instead of env
edit its own files and config as well as add skills
install script starts server and adds the aetheel command

243
docs/comparison.md Normal file
View File

@@ -0,0 +1,243 @@
# ⚔️ Aetheel vs. Inspiration Repos — Comparison & Missing Features
> A detailed comparison of Aetheel with Nanobot, NanoClaw, OpenClaw, and PicoClaw — highlighting what's different, what's missing, and what can be added.
---
## Feature Comparison Matrix
| Feature | Aetheel | Nanobot | NanoClaw | OpenClaw | PicoClaw |
|---------|---------|---------|----------|----------|----------|
| **Language** | Python | Python | TypeScript | TypeScript | Go |
| **Channels** | Slack only | 9 channels | WhatsApp only | 15+ channels | 5 channels |
| **LLM Runtime** | OpenCode / Claude Code (subprocess) | LiteLLM (multi-provider) | Claude Agent SDK | Pi Agent (custom RPC) | Go-native agent |
| **Memory** | Hybrid (vector + BM25) | Simple file-based | Per-group CLAUDE.md | Workspace files | MEMORY.md + sessions |
| **Config** | `.env` file | `config.json` | Code changes (no config) | JSON5 config | `config.json` |
| **Skills** | ❌ None | ✅ Bundled + custom | ✅ Code skills (transform) | ✅ Bundled + managed + workspace | ✅ Custom skills |
| **Scheduled Tasks** | ⚠️ Action tags (remind only) | ✅ Full cron system | ✅ Task scheduler | ✅ Cron + webhooks + Gmail | ✅ Cron + heartbeat |
| **Security** | ❌ No sandbox | ⚠️ Workspace restriction | ✅ Container isolation | ✅ Docker sandbox + pairing | ✅ Workspace sandbox |
| **MCP Support** | ❌ No | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Web Search** | ❌ No | ✅ Brave Search | ✅ Via Claude tools | ✅ Browser control | ✅ Brave + DuckDuckGo |
| **Voice** | ❌ No | ✅ Via Groq Whisper | ❌ No | ✅ Voice Wake + Talk Mode | ✅ Via Groq Whisper |
| **Browser Control** | ❌ No | ❌ No | ❌ No | ✅ Full CDP control | ❌ No |
| **Companion Apps** | ❌ No | ❌ No | ❌ No | ✅ macOS + iOS + Android | ❌ No |
| **Session Management** | ✅ Thread-based (Slack) | ✅ Session-based | ✅ Per-group isolated | ✅ Full sessions + agent-to-agent | ✅ Session-based |
| **Docker Support** | ❌ No | ✅ Yes | ❌ (uses Apple Container) | ✅ Full compose setup | ✅ Yes |
| **Install Script** | ✅ Yes | ✅ pip/uv install | ✅ Claude guides setup | ✅ npm + wizard | ✅ Binary / make |
| **Identity Files** | ✅ SOUL.md, USER.md, MEMORY.md | ✅ AGENTS.md, SOUL.md, USER.md, etc. | ✅ CLAUDE.md per group | ✅ AGENTS.md, SOUL.md, USER.md, TOOLS.md | ✅ Full set (AGENTS, SOUL, IDENTITY, USER, TOOLS) |
| **Subagents** | ❌ No | ✅ Spawn subagent | ✅ Agent Swarms | ✅ sessions_send / sessions_spawn | ✅ Spawn subagent |
| **Heartbeat/Proactive** | ❌ No | ✅ Heartbeat | ❌ No | ✅ Cron + wakeups | ✅ HEARTBEAT.md |
| **Multi-provider** | ⚠️ Via OpenCode/Claude | ✅ 12+ providers | ❌ Claude only | ✅ Multi-model + failover | ✅ 7+ providers |
| **WebChat** | ❌ No | ❌ No | ❌ No | ✅ Built-in WebChat | ❌ No |
---
## What Aetheel Does Well
### ✅ Strengths
1. **Advanced Memory System** — Aetheel has the most sophisticated memory system with **hybrid search (0.7 vector + 0.3 BM25)**, local embeddings via `fastembed`, and SQLite FTS5. None of the other repos have this level of memory sophistication.
2. **Local-First Embeddings** — Zero API calls for memory search. Uses ONNX-based local model (BAAI/bge-small-en-v1.5).
3. **Dual Runtime Support** — Clean abstraction allowing switching between OpenCode and Claude Code with the same `AgentResponse` interface.
4. **Thread Isolation in Slack** — Each Slack thread gets its own AI session, providing natural conversation isolation.
5. **Action Tags** — Inline `[ACTION:remind|minutes|message]` tags are elegant for in-response scheduling.
6. **File Watching** — Memory auto-reindexes when `.md` files are edited.
---
## What Aetheel Is Missing
### 🔴 Critical Gaps (High Priority)
#### 1. Multi-Channel Support
**Current:** Slack only
**All others:** Multiple channels (3-15+)
Aetheel is locked to Slack. Adding at least **Telegram** and **Discord** would significantly increase usability. All four inspiration repos treat multi-channel as essential.
> **Recommendation:** Follow Nanobot's pattern — each channel is a module in `channels/` with a common interface. Start with Telegram (easiest — just a token).
#### 2. Skills System
**Current:** None
**Others:** All have skills/plugins
Aetheel has no way to extend agent capabilities beyond its hardcoded memory and runtime setup. A skills system would allow:
- Bundled skills (GitHub, weather, web search)
- User-created skills in workspace
- Community-contributed skills
> **Recommendation:** Create a `skills/` directory in the workspace. Skills are markdown files (`SKILL.md`) that get injected into the agent's context.
#### 3. Scheduled Tasks (Cron)
**Current:** Only `[ACTION:remind]` (one-time, simple)
**Others:** Full cron systems with persistent storage
The action tag system is clever but limited. A proper cron system would support:
- Recurring cron expressions (`0 9 * * *`)
- Interval-based scheduling
- Persistent job storage
- CLI management
> **Recommendation:** Add a `cron/` module with SQLite-backed job storage and an APScheduler-based execution engine.
#### 4. Security Sandbox
**Current:** No sandboxing
**Others:** Container isolation (NanoClaw), workspace restriction (PicoClaw), Docker sandbox (OpenClaw)
The AI runtime has unrestricted system access. At minimum, workspace-level restrictions should be added.
> **Recommendation:** Follow PicoClaw's approach — restrict tool access to workspace directory by default. Block dangerous shell commands.
---
### 🟡 Important Gaps (Medium Priority)
#### 5. Config File System (JSON instead of .env)
**Current:** `.env` file with environment variables
**Others:** JSON/JSON5 config files
A structured config file is more flexible and easier to manage than flat env vars. It can hold nested structures for channels, providers, tools, etc.
> **Recommendation:** Switch to `~/.aetheel/config.json` with a schema validator. Keep `.env` for secrets only.
#### 6. Web Search Tool
**Current:** No web search
**Others:** Brave Search, DuckDuckGo, or full browser control
The agent can't search the web. This is a significant limitation for a personal assistant.
> **Recommendation:** Add Brave Search API integration (free tier: 2000 queries/month) with DuckDuckGo as fallback.
#### 7. Subagent / Spawn Capability
**Current:** No subagents
**Others:** All have spawn/subagent systems
For long-running tasks, the main agent should be able to spawn background sub-tasks that work independently and report back.
> **Recommendation:** Add a `spawn` tool that creates a background thread/process running a separate agent session.
#### 8. Heartbeat / Proactive System
**Current:** No proactive capabilities
**Others:** Nanobot and PicoClaw have heartbeat systems
The agent only responds to messages. A heartbeat system would allow periodic check-ins, proactive notifications, and scheduled intelligence.
> **Recommendation:** Add `HEARTBEAT.md` file + periodic timer that triggers agent with heartbeat tasks.
#### 9. CLI Interface
**Current:** Only `python main.py` with flags
**Others:** Full CLI with subcommands (`nanobot agent`, `picoclaw cron`, etc.)
> **Recommendation:** Add a CLI using `click` or `argparse` with subcommands: `aetheel chat`, `aetheel status`, `aetheel cron`, etc.
#### 10. Tool System
**Current:** No explicit tool system (AI handles everything via runtime)
**Others:** Shell exec, file R/W, web search, spawn, message, etc.
Aetheel delegates all tool use to the AI runtime (OpenCode/Claude Code). While this works, having explicit tools gives more control and allows sandboxing.
> **Recommendation:** Define a tool interface and implement core tools (file ops, shell, web search) that run through the aetheel process with sandboxing.
---
### 🟢 Nice-to-Have (Lower Priority)
#### 11. MCP Server Support
Only Nanobot supports MCP. Would allow connecting external tool servers.
#### 12. Multi-Provider Support
Currently relies on OpenCode/Claude Code for provider handling. Direct multi-provider support (like Nanobot's 12+ providers via LiteLLM) would add flexibility.
#### 13. Docker / Container Support
No Docker compose or containerized deployment option.
#### 14. Agent-to-Agent Communication
OpenClaw's `sessions_send` allows agents to message each other. Useful for multi-agent workflows.
#### 15. Gateway Architecture
Moving from a direct Slack adapter to a gateway pattern would make adding channels much easier.
#### 16. Onboarding Wizard
OpenClaw's `onboard --install-daemon` provides a guided setup. Aetheel's install script is good but could be more interactive.
#### 17. Voice Support
Voice Wake / Talk Mode (OpenClaw) or Whisper transcription (Nanobot, PicoClaw).
#### 18. WebChat Interface
A browser-based chat UI connected to the gateway.
#### 19. TOOLS.md File
A `TOOLS.md` file describing available tools to the agent, used by PicoClaw and OpenClaw.
#### 20. Self-Modification
From `additions.txt`: "edit its own files and config as well as add skills" — the agent should be able to modify its own configuration and add new skills.
---
## Architecture Comparison
```mermaid
graph LR
subgraph Aetheel["⚔️ Aetheel (Current)"]
A_SLACK["Slack\n(only channel)"]
A_MAIN["main.py"]
A_MEM["Memory\n(hybrid search)"]
A_RT["OpenCode / Claude\n(subprocess)"]
end
subgraph Target["🎯 Target Architecture"]
T_CHAN["Multi-Channel\nGateway"]
T_CORE["Core Agent\n+ Tool System"]
T_MEM["Memory\n(hybrid search)"]
T_SK["Skills"]
T_CRON["Cron"]
T_PROV["Multi-Provider"]
T_SEC["Security\nSandbox"]
end
A_SLACK --> A_MAIN
A_MAIN --> A_MEM
A_MAIN --> A_RT
T_CHAN --> T_CORE
T_CORE --> T_MEM
T_CORE --> T_SK
T_CORE --> T_CRON
T_CORE --> T_PROV
T_CORE --> T_SEC
```
---
## Prioritized Roadmap Suggestion
Based on the analysis, here's a suggested implementation order:
### Phase 1: Foundation (Essentials)
1. **Config system** — Switch from `.env` to JSON config
2. **Skills system**`skills/` directory with `SKILL.md` loading
3. **Tool system** — Core tools (shell, file, web search) with sandbox
4. **Security sandbox** — Workspace-restricted tool execution
### Phase 2: Channels & Scheduling
5. **Channel abstraction** — Extract adapter interface from Slack adapter
6. **Telegram channel** — First new channel
7. **Cron system** — Full scheduled task management
8. **CLI** — Proper CLI with subcommands
### Phase 3: Advanced Features
9. **Heartbeat** — Proactive agent capabilities
10. **Subagents** — Spawn background tasks
11. **Discord channel** — Second new channel
12. **Web search** — Brave Search + DuckDuckGo
### Phase 4: Polish
13. **Self-modification** — Agent can edit config and add skills
14. **Docker support** — Dockerfile + compose
15. **MCP support** — External tool servers
16. **WebChat** — Browser-based chat UI

207
docs/nanobot.md Normal file
View File

@@ -0,0 +1,207 @@
# 🐈 Nanobot — Architecture & How It Works
> **Ultra-Lightweight Personal AI Assistant** — ~4,000 lines of Python, 99% smaller than OpenClaw.
## Overview
Nanobot is a minimalist personal AI assistant written in Python that focuses on delivering core agent functionality with the smallest possible codebase. It uses LiteLLM for multi-provider LLM routing, supports 9+ chat channels, and includes memory, skills, scheduled tasks, and MCP tool integration.
| Attribute | Value |
|-----------|-------|
| **Language** | Python 3.11+ |
| **Lines of Code** | ~4,000 (core agent) |
| **Config** | `~/.nanobot/config.json` |
| **Package** | `pip install nanobot-ai` |
| **LLM Routing** | LiteLLM (multi-provider) |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph Channels["📱 Chat Channels"]
TG["Telegram"]
DC["Discord"]
WA["WhatsApp"]
FS["Feishu"]
MC["Mochat"]
DT["DingTalk"]
SL["Slack"]
EM["Email"]
QQ["QQ"]
end
subgraph Gateway["🌐 Gateway (nanobot gateway)"]
CH["Channel Manager"]
MQ["Message Queue"]
end
subgraph Agent["🧠 Core Agent"]
LOOP["Agent Loop\n(loop.py)"]
CTX["Context Builder\n(context.py)"]
MEM["Memory System\n(memory.py)"]
SK["Skills Loader\n(skills.py)"]
SA["Subagent\n(subagent.py)"]
end
subgraph Tools["🔧 Built-in Tools"]
SHELL["Shell Exec"]
FILE["File R/W/Edit"]
WEB["Web Search"]
SPAWN["Spawn Subagent"]
MCP["MCP Servers"]
end
subgraph Providers["☁️ LLM Providers (LiteLLM)"]
OR["OpenRouter"]
AN["Anthropic"]
OA["OpenAI"]
DS["DeepSeek"]
GR["Groq"]
GE["Gemini"]
VL["vLLM (local)"]
end
Channels --> Gateway
Gateway --> Agent
CTX --> LOOP
MEM --> CTX
SK --> CTX
LOOP --> Tools
LOOP --> Providers
SA --> LOOP
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant Channel as Chat Channel
participant GW as Gateway
participant Agent as Agent Loop
participant LLM as LLM Provider
participant Tools as Tools
User->>Channel: Send message
Channel->>GW: Forward message
GW->>Agent: Route to agent
Agent->>Agent: Build context (memory, skills, identity)
Agent->>LLM: Send prompt + tools
LLM-->>Agent: Response (text or tool call)
alt Tool Call
Agent->>Tools: Execute tool
Tools-->>Agent: Tool result
Agent->>LLM: Send tool result
LLM-->>Agent: Final response
end
Agent->>Agent: Update memory
Agent-->>GW: Return response
GW-->>Channel: Send reply
Channel-->>User: Display response
```
---
## Key Components
### 1. Agent Loop (`agent/loop.py`)
The core loop that manages the LLM ↔ tool execution cycle:
- Builds a prompt using context (memory, skills, identity files)
- Sends to LLM via LiteLLM
- If LLM returns a tool call → executes it → sends result back
- Continues until LLM returns a text response (no more tool calls)
### 2. Context Builder (`agent/context.py`)
Assembles the system prompt from:
- **Identity files**: `AGENTS.md`, `SOUL.md`, `USER.md`, `TOOLS.md`, `IDENTITY.md`
- **Memory**: Persistent `MEMORY.md` with recall
- **Skills**: Loaded from `~/.nanobot/workspace/skills/`
- **Conversation history**: Session-based context
### 3. Memory System (`agent/memory.py`)
- Persistent memory stored in `MEMORY.md` in the workspace
- Agent can read and write memories
- Survives across sessions
### 4. Provider Registry (`providers/registry.py`)
- Single-source-of-truth for all LLM providers
- Adding a new provider = 2 steps (add `ProviderSpec` + config field)
- Auto-prefixes model names for LiteLLM routing
- Supports 12+ providers including local vLLM
### 5. Channel System (`channels/`)
- 9 chat platforms supported (Telegram, Discord, WhatsApp, Feishu, Mochat, DingTalk, Slack, Email, QQ)
- Each channel handles auth, message parsing, and response delivery
- Allowlist-based security (`allowFrom`)
- Started via `nanobot gateway`
### 6. Skills (`skills/`)
- Bundled skills: GitHub, weather, tmux, etc.
- Custom skills loaded from workspace
- Skills are injected into the agent's context
### 7. Scheduled Tasks (Cron)
- Add jobs via `nanobot cron add`
- Supports cron expressions and interval-based scheduling
- Jobs stored persistently
### 8. MCP Integration
- Supports Model Context Protocol servers
- Stdio and HTTP transport modes
- Compatible with Claude Desktop / Cursor MCP configs
- Tools auto-discovered and registered at startup
---
## Project Structure
```
nanobot/
├── agent/ # 🧠 Core agent logic
│ ├── loop.py # Agent loop (LLM ↔ tool execution)
│ ├── context.py # Prompt builder
│ ├── memory.py # Persistent memory
│ ├── skills.py # Skills loader
│ ├── subagent.py # Background task execution
│ └── tools/ # Built-in tools (incl. spawn)
├── skills/ # 🎯 Bundled skills (github, weather, tmux...)
├── channels/ # 📱 Chat channel integrations
├── providers/ # ☁️ LLM provider registry
├── config/ # ⚙️ Configuration schema
├── cron/ # ⏰ Scheduled tasks
├── heartbeat/ # 💓 Heartbeat system
├── session/ # 📝 Session management
├── bus/ # 📨 Internal event bus
├── cli/ # 🖥️ CLI commands
└── utils/ # 🔧 Utilities
```
---
## CLI Commands
| Command | Description |
|---------|-------------|
| `nanobot onboard` | Initialize config & workspace |
| `nanobot agent -m "..."` | Chat with the agent |
| `nanobot agent` | Interactive chat mode |
| `nanobot gateway` | Start all channels |
| `nanobot status` | Show status |
| `nanobot cron add/list/remove` | Manage scheduled tasks |
| `nanobot channels login` | Link WhatsApp device |
---
## Key Design Decisions
1. **LiteLLM for provider abstraction** — One interface for all LLM providers
2. **JSON config over env vars** — Single `config.json` file for all settings
3. **Skills-based extensibility** — Modular skill system for adding capabilities
4. **Provider Registry pattern** — Adding providers is 2-step, zero if-elif chains
5. **Agent social network** — Can join MoltBook, ClawdChat communities

214
docs/nanoclaw.md Normal file
View File

@@ -0,0 +1,214 @@
# 🦀 NanoClaw — Architecture & How It Works
> **Minimal, Security-First Personal AI Assistant** — built on Claude Agent SDK with container isolation.
## Overview
NanoClaw is a minimalist personal AI assistant that prioritizes **security through container isolation** and **understandability through small codebase size**. It runs on Claude Agent SDK (Claude Code) and uses WhatsApp as its primary channel. Each group chat runs in its own isolated Linux container.
| Attribute | Value |
|-----------|-------|
| **Language** | TypeScript (Node.js 20+) |
| **Codebase Size** | ~34.9k tokens (~17% of Claude context window) |
| **Config** | No config files — code changes only |
| **AI Runtime** | Claude Agent SDK (Claude Code) |
| **Primary Channel** | WhatsApp (Baileys) |
| **Isolation** | Apple Container (macOS) / Docker (Linux) |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph WhatsApp["📱 WhatsApp"]
WA["WhatsApp Client\n(Baileys)"]
end
subgraph Core["🧠 Core Process (Single Node.js)"]
IDX["Orchestrator\n(index.ts)"]
DB["SQLite DB\n(db.ts)"]
GQ["Group Queue\n(group-queue.ts)"]
TS["Task Scheduler\n(task-scheduler.ts)"]
IPC["IPC Watcher\n(ipc.ts)"]
RT["Router\n(router.ts)"]
end
subgraph Containers["🐳 Isolated Containers"]
C1["Container 1\nGroup A\n(CLAUDE.md)"]
C2["Container 2\nGroup B\n(CLAUDE.md)"]
C3["Container 3\nMain Channel\n(CLAUDE.md)"]
end
subgraph Memory["💾 Per-Group Memory"]
M1["groups/A/CLAUDE.md"]
M2["groups/B/CLAUDE.md"]
M3["groups/main/CLAUDE.md"]
end
WA --> IDX
IDX --> DB
IDX --> GQ
GQ --> Containers
TS --> Containers
Containers --> IPC
IPC --> RT
RT --> WA
C1 --- M1
C2 --- M2
C3 --- M3
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant WA as WhatsApp (Baileys)
participant IDX as Orchestrator
participant DB as SQLite
participant GQ as Group Queue
participant Container as Container (Claude SDK)
participant IPC as IPC Watcher
User->>WA: Send message with @Andy
WA->>IDX: New message event
IDX->>DB: Store message
IDX->>GQ: Enqueue (per-group, concurrency limited)
GQ->>Container: Spawn Claude agent container
Note over Container: Mounts only group's filesystem
Note over Container: Reads group-specific CLAUDE.md
Container->>Container: Claude processes with tools
Container->>IPC: Write response to filesystem
IPC->>IDX: Detect new response file
IDX->>WA: Send reply
WA->>User: Display response
```
---
## Key Components
### 1. Orchestrator (`src/index.ts`)
The single entry point that manages:
- WhatsApp connection state
- Message polling loop
- Agent invocation decisions
- State management for groups and sessions
### 2. WhatsApp Channel (`src/channels/whatsapp.ts`)
- Uses **Baileys** library for WhatsApp Web connection
- Handles authentication via QR code scan
- Manages send/receive of messages
- Supports media messages
### 3. Container Runner (`src/container-runner.ts`)
The security core of NanoClaw:
- Spawns **streaming Claude Agent SDK** containers
- Each group runs in its own Linux container
- **Apple Container** on macOS, **Docker** on Linux
- Only explicitly mounted directories are accessible
- Bash commands run INSIDE the container, not on host
### 4. SQLite Database (`src/db.ts`)
- Stores messages, groups, sessions, and state
- Per-group message history
- Session continuity tracking
### 5. Group Queue (`src/group-queue.ts`)
- Per-group message queue
- Global concurrency limit
- Ensures one agent invocation per group at a time
### 6. IPC System (`src/ipc.ts`)
- Filesystem-based inter-process communication
- Container writes response to mounted directory
- IPC watcher detects and processes response files
- Handles task results from scheduled jobs
### 7. Task Scheduler (`src/task-scheduler.ts`)
- Recurring jobs that run Claude in containers
- Jobs can message the user back
- Managed from the main channel (self-chat)
### 8. Router (`src/router.ts`)
- Formats outbound messages
- Routes responses to correct WhatsApp recipient
### 9. Per-Group Memory (`groups/*/CLAUDE.md`)
- Each group has its own `CLAUDE.md` memory file
- Mounted into the group's container
- Complete filesystem isolation between groups
---
## Security Model
```mermaid
graph LR
subgraph Host["🖥️ Host System"]
NanoClaw["NanoClaw Process"]
end
subgraph Container1["🐳 Container (Group A)"]
Agent1["Claude Agent"]
FS1["Mounted: groups/A/"]
end
subgraph Container2["🐳 Container (Group B)"]
Agent2["Claude Agent"]
FS2["Mounted: groups/B/"]
end
NanoClaw -->|"Spawns"| Container1
NanoClaw -->|"Spawns"| Container2
style Container1 fill:#e8f5e9
style Container2 fill:#e8f5e9
```
- **OS-level isolation** vs. application-level permission checks
- Agents can only see what's explicitly mounted
- Bash commands run in container, not on host
- No shared memory between groups
---
## Philosophy & Design Decisions
1. **Small enough to understand** — Read the entire codebase in ~8 minutes
2. **Secure by isolation** — Linux containers, not permission checks
3. **Built for one user** — Not a framework, working software for personal use
4. **Customization = code changes** — No config sprawl, modify the code directly
5. **AI-native** — Claude Code handles setup (`/setup`), debugging, customization
6. **Skills over features** — Don't add features to codebase, add skills that transform forks
7. **Best harness, best model** — Claude Agent SDK gives Claude Code superpowers
---
## Agent Swarms (Unique Feature)
NanoClaw is the **first personal AI assistant** to support Agent Swarms:
- Spin up teams of specialized agents
- Agents collaborate within your chat
- Each agent runs in its own container
---
## Usage
```bash
# Setup (Claude Code handles everything)
git clone https://github.com/gavrielc/nanoclaw.git
cd nanoclaw
claude
# Then run /setup
# Talk to your assistant
@Andy send me a daily summary every morning at 9am
@Andy review the git history and update the README
```
Trigger word: `@Andy` (customizable via code changes)

291
docs/openclaw.md Normal file
View File

@@ -0,0 +1,291 @@
# 🦞 OpenClaw — Architecture & How It Works
> **Full-Featured Personal AI Assistant** — Massive TypeScript codebase with 15+ channels, companion apps, and enterprise-grade features.
## Overview
OpenClaw is the most feature-complete personal AI assistant in this space. It's a TypeScript monorepo with a WebSocket-based Gateway as the control plane, supporting 15+ messaging channels, companion macOS/iOS/Android apps, browser control, live canvas, voice wake, and extensive automation.
| Attribute | Value |
|-----------|-------|
| **Language** | TypeScript (Node.js ≥22) |
| **Codebase Size** | 430k+ lines, 50+ source modules |
| **Config** | `~/.openclaw/openclaw.json` (JSON5) |
| **AI Runtime** | Pi Agent (custom RPC), multi-model |
| **Channels** | 15+ (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, Zalo, WebChat, etc.) |
| **Package Mgr** | pnpm (monorepo) |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph Channels["📱 Messaging Channels (15+)"]
WA["WhatsApp\n(Baileys)"]
TG["Telegram\n(grammY)"]
SL["Slack\n(Bolt)"]
DC["Discord\n(discord.js)"]
GC["Google Chat"]
SIG["Signal\n(signal-cli)"]
BB["BlueBubbles\n(iMessage)"]
IM["iMessage\n(legacy)"]
MST["MS Teams"]
MTX["Matrix"]
ZL["Zalo"]
WC["WebChat"]
end
subgraph Gateway["🌐 Gateway (Control Plane)"]
WS["WebSocket Server\nws://127.0.0.1:18789"]
SES["Session Manager"]
RTE["Channel Router"]
PRES["Presence System"]
Q["Message Queue"]
CFG["Config Manager"]
AUTH["Auth / Pairing"]
end
subgraph Agent["🧠 Pi Agent (RPC)"]
AGENT["Agent Runtime"]
TOOLS["Tool Registry"]
STREAM["Block Streaming"]
PROV["Provider Router\n(multi-model)"]
end
subgraph Apps["📲 Companion Apps"]
MAC["macOS Menu Bar"]
IOS["iOS Node"]
ANDR["Android Node"]
end
subgraph ToolSet["🔧 Tools & Automation"]
BROWSER["Browser Control\n(CDP/Chromium)"]
CANVAS["Live Canvas\n(A2UI)"]
CRON["Cron Jobs"]
WEBHOOK["Webhooks"]
GMAIL["Gmail Pub/Sub"]
NODES["Nodes\n(camera, screen, location)"]
SKILLS_T["Skills Platform"]
SESS_T["Session Tools\n(agent-to-agent)"]
end
subgraph Workspace["💾 Workspace"]
AGENTS_MD["AGENTS.md"]
SOUL_MD["SOUL.md"]
USER_MD["USER.md"]
TOOLS_MD["TOOLS.md"]
SKILLS_W["Skills/"]
end
Channels --> Gateway
Apps --> Gateway
Gateway --> Agent
Agent --> ToolSet
Agent --> Workspace
Agent --> PROV
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant Channel as Channel (WA/TG/Slack/etc.)
participant GW as Gateway (WS)
participant Session as Session Manager
participant Agent as Pi Agent (RPC)
participant LLM as LLM Provider
participant Tools as Tools
User->>Channel: Send message
Channel->>GW: Forward via channel adapter
GW->>Session: Route to session (main/group)
GW->>GW: Check auth (pairing/allowlist)
Session->>Agent: Invoke agent (RPC)
Agent->>Agent: Build prompt (AGENTS.md, SOUL.md, tools)
Agent->>LLM: Stream request (with tool definitions)
loop Tool Use Loop
LLM-->>Agent: Tool call (block stream)
Agent->>Tools: Execute tool
Tools-->>Agent: Tool result
Agent->>LLM: Continue with result
end
LLM-->>Agent: Final response (block stream)
Agent-->>Session: Return response
Session->>GW: Add to outbound queue
GW->>GW: Chunk if needed (per-channel limits)
GW->>Channel: Send chunked replies
Channel->>User: Display response
Note over GW: Typing indicators, presence updates
```
---
## Key Components
### 1. Gateway (`src/gateway/`)
The central control plane — everything connects through it:
- **WebSocket server** on `ws://127.0.0.1:18789`
- Session management (main, group, per-channel)
- Multi-agent routing (different agents for different channels)
- Presence tracking and typing indicators
- Config management and hot-reload
- Health checks, doctor diagnostics
### 2. Pi Agent (`src/agents/`)
Custom RPC-based agent runtime:
- Tool streaming and block streaming
- Multi-model support with failover
- Session pruning for long conversations
- Usage tracking (tokens, cost)
- Thinking level control (off → xhigh)
### 3. Channel System (`src/channels/` + per-channel dirs)
15+ channel adapters, each with:
- Auth handling (pairing codes, allowlists, OAuth)
- Message format conversion
- Media pipeline (images, audio, video)
- Group routing with mention gating
- Per-channel chunking (character limits differ)
### 4. Security System (`src/security/`)
Multi-layered security:
- **DM Pairing** — unknown senders get a pairing code, must be approved
- **Allowlists** — per-channel user whitelists
- **Docker Sandbox** — non-main sessions can run in per-session Docker containers
- **Tool denylist** — block dangerous tools in sandbox mode
- **Elevated bash** — per-session toggle for host-level access
### 5. Browser Control (`src/browser/`)
- Dedicated OpenClaw-managed Chrome/Chromium instance
- CDP (Chrome DevTools Protocol) control
- Snapshots, actions, uploads, profiles
- Full web automation capabilities
### 6. Canvas & A2UI (`src/canvas-host/`)
- Agent-driven visual workspace
- A2UI (Agent-to-UI) — push HTML/JS to canvas
- Canvas eval, snapshot, reset
- Available on macOS, iOS, Android
### 7. Voice System
- **Voice Wake** — always-on speech detection
- **Talk Mode** — continuous conversation overlay
- ElevenLabs TTS integration
- Available on macOS, iOS, Android
### 8. Companion Apps
- **macOS app**: Menu bar, Voice Wake/PTT, WebChat, debug tools
- **iOS node**: Canvas, Voice Wake, Talk Mode, camera, Bonjour pairing
- **Android node**: Canvas, Talk Mode, camera, screen recording, SMS
### 9. Session Tools (Agent-to-Agent)
- `sessions_list` — discover active sessions
- `sessions_history` — fetch transcript logs
- `sessions_send` — message another session with reply-back
### 10. Skills Platform (`src/plugins/`, `skills/`)
- **Bundled skills** — pre-installed capabilities
- **Managed skills** — installed from ClawHub registry
- **Workspace skills** — user-created in workspace
- Install gating and UI
- ClawHub registry for community skills
### 11. Automation
- **Cron jobs** — scheduled recurring tasks
- **Webhooks** — external trigger surface
- **Gmail Pub/Sub** — email-triggered actions
### 12. Ops & Deployment
- Docker support with compose
- Tailscale Serve/Funnel for remote access
- SSH tunnels with token/password auth
- `openclaw doctor` for diagnostics
- Nix mode for declarative config
---
## Project Structure (Simplified)
```
openclaw/
├── src/
│ ├── agents/ # Pi agent runtime
│ ├── gateway/ # WebSocket gateway
│ ├── channels/ # Channel adapter base
│ ├── whatsapp/ # WhatsApp adapter
│ ├── telegram/ # Telegram adapter
│ ├── slack/ # Slack adapter
│ ├── discord/ # Discord adapter
│ ├── signal/ # Signal adapter
│ ├── imessage/ # iMessage adapters
│ ├── browser/ # Browser control (CDP)
│ ├── canvas-host/ # Canvas & A2UI
│ ├── sessions/ # Session management
│ ├── routing/ # Message routing
│ ├── security/ # Auth, pairing, sandbox
│ ├── cron/ # Scheduled jobs
│ ├── memory/ # Memory system
│ ├── providers/ # LLM providers
│ ├── plugins/ # Plugin/skill system
│ ├── media/ # Media pipeline
│ ├── tts/ # Text-to-speech
│ ├── web/ # Control UI + WebChat
│ ├── wizard/ # Onboarding wizard
│ └── cli/ # CLI commands
├── apps/ # Companion app sources
├── packages/ # Shared packages
├── extensions/ # Extension channels
├── skills/ # Bundled skills
├── ui/ # Web UI source
└── Swabble/ # macOS/iOS Swift source
```
---
## CLI Commands
| Command | Description |
|---------|-------------|
| `openclaw onboard` | Guided setup wizard |
| `openclaw gateway` | Start the gateway |
| `openclaw agent --message "..."` | Chat with agent |
| `openclaw message send` | Send to any channel |
| `openclaw doctor` | Diagnostics & migration |
| `openclaw pairing approve` | Approve DM pairing |
| `openclaw update` | Update to latest version |
| `openclaw channels login` | Link WhatsApp |
---
## Chat Commands (In-Channel)
| Command | Description |
|---------|-------------|
| `/status` | Session status (model, tokens, cost) |
| `/new` / `/reset` | Reset session |
| `/compact` | Compact session context |
| `/think <level>` | Set thinking level |
| `/verbose on\|off` | Toggle verbose mode |
| `/usage off\|tokens\|full` | Usage footer |
| `/restart` | Restart gateway |
| `/activation mention\|always` | Group activation mode |
---
## Key Design Decisions
1. **Gateway as control plane** — Single WebSocket server everything connects to
2. **Multi-agent routing** — Different agents for different channels/groups
3. **Pairing-based security** — Unknown DMs get pairing codes by default
4. **Docker sandboxing** — Non-main sessions can be isolated
5. **Block streaming** — Responses streamed as structured blocks
6. **Extension-based channels** — MS Teams, Matrix, Zalo are extensions
7. **Companion apps** — Native macOS/iOS/Android for device-level features
8. **ClawHub** — Community skill registry

251
docs/picoclaw.md Normal file
View File

@@ -0,0 +1,251 @@
# 🦐 PicoClaw — Architecture & How It Works
> **Ultra-Efficient AI Assistant in Go** — $10 hardware, 10MB RAM, 1s boot time.
## Overview
PicoClaw is an extreme-lightweight rewrite of Nanobot in Go, designed to run on the cheapest possible hardware — including $10 RISC-V SBCs with <10MB RAM. The entire project was AI-bootstrapped (95% agent-generated) through a self-bootstrapping migration from Python to Go.
| Attribute | Value |
|-----------|-------|
| **Language** | Go 1.21+ |
| **RAM Usage** | <10MB |
| **Startup Time** | <1s (even at 0.6GHz) |
| **Hardware Cost** | As low as $10 |
| **Architectures** | x86_64, ARM64, RISC-V |
| **Binary** | Single self-contained binary |
| **Config** | `~/.picoclaw/config.json` |
---
## Architecture Flowchart
```mermaid
graph TB
subgraph Channels["📱 Chat Channels"]
TG["Telegram"]
DC["Discord"]
QQ["QQ"]
DT["DingTalk"]
LINE["LINE"]
end
subgraph Core["🧠 Core Agent (Single Binary)"]
MAIN["Main Entry\n(cmd/)"]
AGENT["Agent Loop\n(pkg/agent/)"]
CONF["Config\n(pkg/config/)"]
AUTH["Auth\n(pkg/auth/)"]
PROV["Providers\n(pkg/providers/)"]
TOOLS["Tools\n(pkg/tools/)"]
end
subgraph ToolSet["🔧 Built-in Tools"]
SHELL["Shell Exec"]
FILE["File R/W"]
WEB["Web Search\n(Brave / DuckDuckGo)"]
CRON_T["Cron / Reminders"]
SPAWN["Spawn Subagent"]
MSG["Message Tool"]
end
subgraph Workspace["💾 Workspace"]
AGENTS_MD["AGENTS.md"]
SOUL_MD["SOUL.md"]
TOOLS_MD["TOOLS.md"]
USER_MD["USER.md"]
IDENTITY["IDENTITY.md"]
HB["HEARTBEAT.md"]
MEM["MEMORY.md"]
SESSIONS["sessions/"]
SKILLS["skills/"]
end
subgraph Providers["☁️ LLM Providers"]
GEMINI["Gemini"]
ZHIPU["Zhipu"]
OR["OpenRouter"]
OA["OpenAI"]
AN["Anthropic"]
DS["DeepSeek"]
GROQ["Groq\n(+ voice)"]
end
Channels --> Core
AGENT --> ToolSet
AGENT --> Workspace
AGENT --> Providers
```
---
## Message Flow
```mermaid
sequenceDiagram
participant User
participant Channel as Chat Channel
participant GW as Gateway
participant Agent as Agent Loop
participant LLM as LLM Provider
participant Tools as Tools
User->>Channel: Send message
Channel->>GW: Forward message
GW->>Agent: Route to agent
Agent->>Agent: Load context (AGENTS.md, SOUL.md, USER.md)
Agent->>LLM: Send prompt + tool defs
LLM-->>Agent: Response
alt Tool Call
Agent->>Tools: Execute tool
Tools-->>Agent: Result
Agent->>LLM: Continue
LLM-->>Agent: Next response
end
Agent->>Agent: Update memory/session
Agent-->>GW: Return response
GW-->>Channel: Send reply
Channel-->>User: Display
```
---
## Heartbeat System Flow
```mermaid
sequenceDiagram
participant Timer as Heartbeat Timer
participant Agent as Agent
participant HB as HEARTBEAT.md
participant Subagent as Spawn Subagent
participant User
Timer->>Agent: Trigger (every 30 min)
Agent->>HB: Read periodic tasks
alt Quick Task
Agent->>Agent: Execute directly
Agent-->>Timer: HEARTBEAT_OK
end
alt Long Task
Agent->>Subagent: Spawn async subagent
Agent-->>Timer: Continue to next task
Subagent->>Subagent: Work independently
Subagent->>User: Send result via message tool
end
```
---
## Key Components
### 1. Agent Loop (`pkg/agent/`)
Go-native implementation of the LLM ↔ tool execution loop:
- Builds context from workspace identity files
- Sends to LLM provider with tool definitions
- Iterates on tool calls up to `max_tool_iterations` (default: 20)
- Session history managed in `workspace/sessions/`
### 2. Provider System (`pkg/providers/`)
- Gemini and Zhipu are fully tested
- OpenRouter, Anthropic, OpenAI, DeepSeek marked "to be tested"
- Groq for voice transcription (Whisper)
- Each provider implements a common interface
### 3. Tool System (`pkg/tools/`)
Built-in tools:
- **read_file** / **write_file** / **list_dir** / **edit_file** / **append_file** — File operations
- **exec** — Shell command execution (with safety guards)
- **web_search** — Brave Search or DuckDuckGo fallback
- **cron** — Scheduled reminders and recurring tasks
- **spawn** — Create async subagents
- **message** — Subagent-to-user communication
### 4. Security Sandbox
```mermaid
graph TD
RW["restrict_to_workspace = true"]
RW --> RF["read_file: workspace only"]
RW --> WF["write_file: workspace only"]
RW --> LD["list_dir: workspace only"]
RW --> EF["edit_file: workspace only"]
RW --> AF["append_file: workspace only"]
RW --> EX["exec: workspace paths only"]
EX --> BL["ALWAYS Blocked:"]
BL --> RM["rm -rf"]
BL --> FMT["format, mkfs"]
BL --> DD["dd if="]
BL --> SHUT["shutdown, reboot"]
BL --> FORK["fork bomb"]
```
- Workspace sandbox enabled by default
- All tools restricted to workspace directory
- Dangerous commands always blocked (even with sandbox off)
- Consistent across main agent, subagents, and heartbeat tasks
### 5. Heartbeat System
- Reads `HEARTBEAT.md` every 30 minutes
- Quick tasks executed directly
- Long tasks spawned as async subagents
- Subagents communicate independently via message tool
### 6. Channel System
- **Telegram** — Easy setup (token only)
- **Discord** — Bot token + intents
- **QQ** — AppID + AppSecret
- **DingTalk** — Client credentials
- **LINE** — Credentials + webhook URL (HTTPS required)
### 7. Workspace Layout
```
~/.picoclaw/workspace/
├── sessions/ # Conversation history
├── memory/ # Long-term memory (MEMORY.md)
├── state/ # Persistent state
├── cron/ # Scheduled jobs database
├── skills/ # Custom skills
├── AGENTS.md # Agent behavior guide
├── HEARTBEAT.md # Periodic task prompts
├── IDENTITY.md # Agent identity
├── SOUL.md # Agent soul
├── TOOLS.md # Tool descriptions
└── USER.md # User preferences
```
---
## Comparison Table (from README)
| | OpenClaw | NanoBot | **PicoClaw** |
|---------------------|------------|-------------|-----------------------|
| **Language** | TypeScript | Python | **Go** |
| **RAM** | >1GB | >100MB | **<10MB** |
| **Startup (0.8GHz)**| >500s | >30s | **<1s** |
| **Cost** | Mac $599 | SBC ~$50 | **Any Linux, ~$10** |
---
## Deployment Targets
PicoClaw can run on almost any Linux device:
- **$9.9** LicheeRV-Nano — Minimal home assistant
- **$30-50** NanoKVM — Automated server maintenance
- **$50-100** MaixCAM — Smart monitoring
---
## Key Design Decisions
1. **Go for minimal footprint** — Single binary, no runtime deps, tiny memory
2. **AI-bootstrapped migration** — 95% of Go code generated by the AI agent itself
3. **Web search with fallback** — Brave Search primary, DuckDuckGo fallback (free)
4. **Heartbeat for proactive tasks** — Agent checks `HEARTBEAT.md` periodically
5. **Subagent pattern** — Long tasks run async, don't block heartbeat
6. **Default sandbox**`restrict_to_workspace: true` by default
7. **Cross-architecture** — Single binary compiles for x86, ARM64, RISC-V