latest updates
This commit is contained in:
3
docs/additions.txt
Normal file
3
docs/additions.txt
Normal file
@@ -0,0 +1,3 @@
|
||||
config instead of env
|
||||
edit its own files and config as well as add skills
|
||||
install script starts server and adds the aetheel command
|
||||
243
docs/comparison.md
Normal file
243
docs/comparison.md
Normal file
@@ -0,0 +1,243 @@
|
||||
# ⚔️ Aetheel vs. Inspiration Repos — Comparison & Missing Features
|
||||
|
||||
> A detailed comparison of Aetheel with Nanobot, NanoClaw, OpenClaw, and PicoClaw — highlighting what's different, what's missing, and what can be added.
|
||||
|
||||
---
|
||||
|
||||
## Feature Comparison Matrix
|
||||
|
||||
| Feature | Aetheel | Nanobot | NanoClaw | OpenClaw | PicoClaw |
|
||||
|---------|---------|---------|----------|----------|----------|
|
||||
| **Language** | Python | Python | TypeScript | TypeScript | Go |
|
||||
| **Channels** | Slack only | 9 channels | WhatsApp only | 15+ channels | 5 channels |
|
||||
| **LLM Runtime** | OpenCode / Claude Code (subprocess) | LiteLLM (multi-provider) | Claude Agent SDK | Pi Agent (custom RPC) | Go-native agent |
|
||||
| **Memory** | Hybrid (vector + BM25) | Simple file-based | Per-group CLAUDE.md | Workspace files | MEMORY.md + sessions |
|
||||
| **Config** | `.env` file | `config.json` | Code changes (no config) | JSON5 config | `config.json` |
|
||||
| **Skills** | ❌ None | ✅ Bundled + custom | ✅ Code skills (transform) | ✅ Bundled + managed + workspace | ✅ Custom skills |
|
||||
| **Scheduled Tasks** | ⚠️ Action tags (remind only) | ✅ Full cron system | ✅ Task scheduler | ✅ Cron + webhooks + Gmail | ✅ Cron + heartbeat |
|
||||
| **Security** | ❌ No sandbox | ⚠️ Workspace restriction | ✅ Container isolation | ✅ Docker sandbox + pairing | ✅ Workspace sandbox |
|
||||
| **MCP Support** | ❌ No | ✅ Yes | ❌ No | ❌ No | ❌ No |
|
||||
| **Web Search** | ❌ No | ✅ Brave Search | ✅ Via Claude tools | ✅ Browser control | ✅ Brave + DuckDuckGo |
|
||||
| **Voice** | ❌ No | ✅ Via Groq Whisper | ❌ No | ✅ Voice Wake + Talk Mode | ✅ Via Groq Whisper |
|
||||
| **Browser Control** | ❌ No | ❌ No | ❌ No | ✅ Full CDP control | ❌ No |
|
||||
| **Companion Apps** | ❌ No | ❌ No | ❌ No | ✅ macOS + iOS + Android | ❌ No |
|
||||
| **Session Management** | ✅ Thread-based (Slack) | ✅ Session-based | ✅ Per-group isolated | ✅ Full sessions + agent-to-agent | ✅ Session-based |
|
||||
| **Docker Support** | ❌ No | ✅ Yes | ❌ (uses Apple Container) | ✅ Full compose setup | ✅ Yes |
|
||||
| **Install Script** | ✅ Yes | ✅ pip/uv install | ✅ Claude guides setup | ✅ npm + wizard | ✅ Binary / make |
|
||||
| **Identity Files** | ✅ SOUL.md, USER.md, MEMORY.md | ✅ AGENTS.md, SOUL.md, USER.md, etc. | ✅ CLAUDE.md per group | ✅ AGENTS.md, SOUL.md, USER.md, TOOLS.md | ✅ Full set (AGENTS, SOUL, IDENTITY, USER, TOOLS) |
|
||||
| **Subagents** | ❌ No | ✅ Spawn subagent | ✅ Agent Swarms | ✅ sessions_send / sessions_spawn | ✅ Spawn subagent |
|
||||
| **Heartbeat/Proactive** | ❌ No | ✅ Heartbeat | ❌ No | ✅ Cron + wakeups | ✅ HEARTBEAT.md |
|
||||
| **Multi-provider** | ⚠️ Via OpenCode/Claude | ✅ 12+ providers | ❌ Claude only | ✅ Multi-model + failover | ✅ 7+ providers |
|
||||
| **WebChat** | ❌ No | ❌ No | ❌ No | ✅ Built-in WebChat | ❌ No |
|
||||
|
||||
---
|
||||
|
||||
## What Aetheel Does Well
|
||||
|
||||
### ✅ Strengths
|
||||
|
||||
1. **Advanced Memory System** — Aetheel has the most sophisticated memory system with **hybrid search (0.7 vector + 0.3 BM25)**, local embeddings via `fastembed`, and SQLite FTS5. None of the other repos have this level of memory sophistication.
|
||||
|
||||
2. **Local-First Embeddings** — Zero API calls for memory search. Uses ONNX-based local model (BAAI/bge-small-en-v1.5).
|
||||
|
||||
3. **Dual Runtime Support** — Clean abstraction allowing switching between OpenCode and Claude Code with the same `AgentResponse` interface.
|
||||
|
||||
4. **Thread Isolation in Slack** — Each Slack thread gets its own AI session, providing natural conversation isolation.
|
||||
|
||||
5. **Action Tags** — Inline `[ACTION:remind|minutes|message]` tags are elegant for in-response scheduling.
|
||||
|
||||
6. **File Watching** — Memory auto-reindexes when `.md` files are edited.
|
||||
|
||||
---
|
||||
|
||||
## What Aetheel Is Missing
|
||||
|
||||
### 🔴 Critical Gaps (High Priority)
|
||||
|
||||
#### 1. Multi-Channel Support
|
||||
**Current:** Slack only
|
||||
**All others:** Multiple channels (3-15+)
|
||||
|
||||
Aetheel is locked to Slack. Adding at least **Telegram** and **Discord** would significantly increase usability. All four inspiration repos treat multi-channel as essential.
|
||||
|
||||
> **Recommendation:** Follow Nanobot's pattern — each channel is a module in `channels/` with a common interface. Start with Telegram (easiest — just a token).
|
||||
|
||||
#### 2. Skills System
|
||||
**Current:** None
|
||||
**Others:** All have skills/plugins
|
||||
|
||||
Aetheel has no way to extend agent capabilities beyond its hardcoded memory and runtime setup. A skills system would allow:
|
||||
- Bundled skills (GitHub, weather, web search)
|
||||
- User-created skills in workspace
|
||||
- Community-contributed skills
|
||||
|
||||
> **Recommendation:** Create a `skills/` directory in the workspace. Skills are markdown files (`SKILL.md`) that get injected into the agent's context.
|
||||
|
||||
#### 3. Scheduled Tasks (Cron)
|
||||
**Current:** Only `[ACTION:remind]` (one-time, simple)
|
||||
**Others:** Full cron systems with persistent storage
|
||||
|
||||
The action tag system is clever but limited. A proper cron system would support:
|
||||
- Recurring cron expressions (`0 9 * * *`)
|
||||
- Interval-based scheduling
|
||||
- Persistent job storage
|
||||
- CLI management
|
||||
|
||||
> **Recommendation:** Add a `cron/` module with SQLite-backed job storage and an APScheduler-based execution engine.
|
||||
|
||||
#### 4. Security Sandbox
|
||||
**Current:** No sandboxing
|
||||
**Others:** Container isolation (NanoClaw), workspace restriction (PicoClaw), Docker sandbox (OpenClaw)
|
||||
|
||||
The AI runtime has unrestricted system access. At minimum, workspace-level restrictions should be added.
|
||||
|
||||
> **Recommendation:** Follow PicoClaw's approach — restrict tool access to workspace directory by default. Block dangerous shell commands.
|
||||
|
||||
---
|
||||
|
||||
### 🟡 Important Gaps (Medium Priority)
|
||||
|
||||
#### 5. Config File System (JSON instead of .env)
|
||||
**Current:** `.env` file with environment variables
|
||||
**Others:** JSON/JSON5 config files
|
||||
|
||||
A structured config file is more flexible and easier to manage than flat env vars. It can hold nested structures for channels, providers, tools, etc.
|
||||
|
||||
> **Recommendation:** Switch to `~/.aetheel/config.json` with a schema validator. Keep `.env` for secrets only.
|
||||
|
||||
#### 6. Web Search Tool
|
||||
**Current:** No web search
|
||||
**Others:** Brave Search, DuckDuckGo, or full browser control
|
||||
|
||||
The agent can't search the web. This is a significant limitation for a personal assistant.
|
||||
|
||||
> **Recommendation:** Add Brave Search API integration (free tier: 2000 queries/month) with DuckDuckGo as fallback.
|
||||
|
||||
#### 7. Subagent / Spawn Capability
|
||||
**Current:** No subagents
|
||||
**Others:** All have spawn/subagent systems
|
||||
|
||||
For long-running tasks, the main agent should be able to spawn background sub-tasks that work independently and report back.
|
||||
|
||||
> **Recommendation:** Add a `spawn` tool that creates a background thread/process running a separate agent session.
|
||||
|
||||
#### 8. Heartbeat / Proactive System
|
||||
**Current:** No proactive capabilities
|
||||
**Others:** Nanobot and PicoClaw have heartbeat systems
|
||||
|
||||
The agent only responds to messages. A heartbeat system would allow periodic check-ins, proactive notifications, and scheduled intelligence.
|
||||
|
||||
> **Recommendation:** Add `HEARTBEAT.md` file + periodic timer that triggers agent with heartbeat tasks.
|
||||
|
||||
#### 9. CLI Interface
|
||||
**Current:** Only `python main.py` with flags
|
||||
**Others:** Full CLI with subcommands (`nanobot agent`, `picoclaw cron`, etc.)
|
||||
|
||||
> **Recommendation:** Add a CLI using `click` or `argparse` with subcommands: `aetheel chat`, `aetheel status`, `aetheel cron`, etc.
|
||||
|
||||
#### 10. Tool System
|
||||
**Current:** No explicit tool system (AI handles everything via runtime)
|
||||
**Others:** Shell exec, file R/W, web search, spawn, message, etc.
|
||||
|
||||
Aetheel delegates all tool use to the AI runtime (OpenCode/Claude Code). While this works, having explicit tools gives more control and allows sandboxing.
|
||||
|
||||
> **Recommendation:** Define a tool interface and implement core tools (file ops, shell, web search) that run through the aetheel process with sandboxing.
|
||||
|
||||
---
|
||||
|
||||
### 🟢 Nice-to-Have (Lower Priority)
|
||||
|
||||
#### 11. MCP Server Support
|
||||
Only Nanobot supports MCP. Would allow connecting external tool servers.
|
||||
|
||||
#### 12. Multi-Provider Support
|
||||
Currently relies on OpenCode/Claude Code for provider handling. Direct multi-provider support (like Nanobot's 12+ providers via LiteLLM) would add flexibility.
|
||||
|
||||
#### 13. Docker / Container Support
|
||||
No Docker compose or containerized deployment option.
|
||||
|
||||
#### 14. Agent-to-Agent Communication
|
||||
OpenClaw's `sessions_send` allows agents to message each other. Useful for multi-agent workflows.
|
||||
|
||||
#### 15. Gateway Architecture
|
||||
Moving from a direct Slack adapter to a gateway pattern would make adding channels much easier.
|
||||
|
||||
#### 16. Onboarding Wizard
|
||||
OpenClaw's `onboard --install-daemon` provides a guided setup. Aetheel's install script is good but could be more interactive.
|
||||
|
||||
#### 17. Voice Support
|
||||
Voice Wake / Talk Mode (OpenClaw) or Whisper transcription (Nanobot, PicoClaw).
|
||||
|
||||
#### 18. WebChat Interface
|
||||
A browser-based chat UI connected to the gateway.
|
||||
|
||||
#### 19. TOOLS.md File
|
||||
A `TOOLS.md` file describing available tools to the agent, used by PicoClaw and OpenClaw.
|
||||
|
||||
#### 20. Self-Modification
|
||||
From `additions.txt`: "edit its own files and config as well as add skills" — the agent should be able to modify its own configuration and add new skills.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Comparison
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph Aetheel["⚔️ Aetheel (Current)"]
|
||||
A_SLACK["Slack\n(only channel)"]
|
||||
A_MAIN["main.py"]
|
||||
A_MEM["Memory\n(hybrid search)"]
|
||||
A_RT["OpenCode / Claude\n(subprocess)"]
|
||||
end
|
||||
|
||||
subgraph Target["🎯 Target Architecture"]
|
||||
T_CHAN["Multi-Channel\nGateway"]
|
||||
T_CORE["Core Agent\n+ Tool System"]
|
||||
T_MEM["Memory\n(hybrid search)"]
|
||||
T_SK["Skills"]
|
||||
T_CRON["Cron"]
|
||||
T_PROV["Multi-Provider"]
|
||||
T_SEC["Security\nSandbox"]
|
||||
end
|
||||
|
||||
A_SLACK --> A_MAIN
|
||||
A_MAIN --> A_MEM
|
||||
A_MAIN --> A_RT
|
||||
|
||||
T_CHAN --> T_CORE
|
||||
T_CORE --> T_MEM
|
||||
T_CORE --> T_SK
|
||||
T_CORE --> T_CRON
|
||||
T_CORE --> T_PROV
|
||||
T_CORE --> T_SEC
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Prioritized Roadmap Suggestion
|
||||
|
||||
Based on the analysis, here's a suggested implementation order:
|
||||
|
||||
### Phase 1: Foundation (Essentials)
|
||||
1. **Config system** — Switch from `.env` to JSON config
|
||||
2. **Skills system** — `skills/` directory with `SKILL.md` loading
|
||||
3. **Tool system** — Core tools (shell, file, web search) with sandbox
|
||||
4. **Security sandbox** — Workspace-restricted tool execution
|
||||
|
||||
### Phase 2: Channels & Scheduling
|
||||
5. **Channel abstraction** — Extract adapter interface from Slack adapter
|
||||
6. **Telegram channel** — First new channel
|
||||
7. **Cron system** — Full scheduled task management
|
||||
8. **CLI** — Proper CLI with subcommands
|
||||
|
||||
### Phase 3: Advanced Features
|
||||
9. **Heartbeat** — Proactive agent capabilities
|
||||
10. **Subagents** — Spawn background tasks
|
||||
11. **Discord channel** — Second new channel
|
||||
12. **Web search** — Brave Search + DuckDuckGo
|
||||
|
||||
### Phase 4: Polish
|
||||
13. **Self-modification** — Agent can edit config and add skills
|
||||
14. **Docker support** — Dockerfile + compose
|
||||
15. **MCP support** — External tool servers
|
||||
16. **WebChat** — Browser-based chat UI
|
||||
207
docs/nanobot.md
Normal file
207
docs/nanobot.md
Normal file
@@ -0,0 +1,207 @@
|
||||
# 🐈 Nanobot — Architecture & How It Works
|
||||
|
||||
> **Ultra-Lightweight Personal AI Assistant** — ~4,000 lines of Python, 99% smaller than OpenClaw.
|
||||
|
||||
## Overview
|
||||
|
||||
Nanobot is a minimalist personal AI assistant written in Python that focuses on delivering core agent functionality with the smallest possible codebase. It uses LiteLLM for multi-provider LLM routing, supports 9+ chat channels, and includes memory, skills, scheduled tasks, and MCP tool integration.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | Python 3.11+ |
|
||||
| **Lines of Code** | ~4,000 (core agent) |
|
||||
| **Config** | `~/.nanobot/config.json` |
|
||||
| **Package** | `pip install nanobot-ai` |
|
||||
| **LLM Routing** | LiteLLM (multi-provider) |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph Channels["📱 Chat Channels"]
|
||||
TG["Telegram"]
|
||||
DC["Discord"]
|
||||
WA["WhatsApp"]
|
||||
FS["Feishu"]
|
||||
MC["Mochat"]
|
||||
DT["DingTalk"]
|
||||
SL["Slack"]
|
||||
EM["Email"]
|
||||
QQ["QQ"]
|
||||
end
|
||||
|
||||
subgraph Gateway["🌐 Gateway (nanobot gateway)"]
|
||||
CH["Channel Manager"]
|
||||
MQ["Message Queue"]
|
||||
end
|
||||
|
||||
subgraph Agent["🧠 Core Agent"]
|
||||
LOOP["Agent Loop\n(loop.py)"]
|
||||
CTX["Context Builder\n(context.py)"]
|
||||
MEM["Memory System\n(memory.py)"]
|
||||
SK["Skills Loader\n(skills.py)"]
|
||||
SA["Subagent\n(subagent.py)"]
|
||||
end
|
||||
|
||||
subgraph Tools["🔧 Built-in Tools"]
|
||||
SHELL["Shell Exec"]
|
||||
FILE["File R/W/Edit"]
|
||||
WEB["Web Search"]
|
||||
SPAWN["Spawn Subagent"]
|
||||
MCP["MCP Servers"]
|
||||
end
|
||||
|
||||
subgraph Providers["☁️ LLM Providers (LiteLLM)"]
|
||||
OR["OpenRouter"]
|
||||
AN["Anthropic"]
|
||||
OA["OpenAI"]
|
||||
DS["DeepSeek"]
|
||||
GR["Groq"]
|
||||
GE["Gemini"]
|
||||
VL["vLLM (local)"]
|
||||
end
|
||||
|
||||
Channels --> Gateway
|
||||
Gateway --> Agent
|
||||
CTX --> LOOP
|
||||
MEM --> CTX
|
||||
SK --> CTX
|
||||
LOOP --> Tools
|
||||
LOOP --> Providers
|
||||
SA --> LOOP
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant Channel as Chat Channel
|
||||
participant GW as Gateway
|
||||
participant Agent as Agent Loop
|
||||
participant LLM as LLM Provider
|
||||
participant Tools as Tools
|
||||
|
||||
User->>Channel: Send message
|
||||
Channel->>GW: Forward message
|
||||
GW->>Agent: Route to agent
|
||||
Agent->>Agent: Build context (memory, skills, identity)
|
||||
Agent->>LLM: Send prompt + tools
|
||||
LLM-->>Agent: Response (text or tool call)
|
||||
|
||||
alt Tool Call
|
||||
Agent->>Tools: Execute tool
|
||||
Tools-->>Agent: Tool result
|
||||
Agent->>LLM: Send tool result
|
||||
LLM-->>Agent: Final response
|
||||
end
|
||||
|
||||
Agent->>Agent: Update memory
|
||||
Agent-->>GW: Return response
|
||||
GW-->>Channel: Send reply
|
||||
Channel-->>User: Display response
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Agent Loop (`agent/loop.py`)
|
||||
The core loop that manages the LLM ↔ tool execution cycle:
|
||||
- Builds a prompt using context (memory, skills, identity files)
|
||||
- Sends to LLM via LiteLLM
|
||||
- If LLM returns a tool call → executes it → sends result back
|
||||
- Continues until LLM returns a text response (no more tool calls)
|
||||
|
||||
### 2. Context Builder (`agent/context.py`)
|
||||
Assembles the system prompt from:
|
||||
- **Identity files**: `AGENTS.md`, `SOUL.md`, `USER.md`, `TOOLS.md`, `IDENTITY.md`
|
||||
- **Memory**: Persistent `MEMORY.md` with recall
|
||||
- **Skills**: Loaded from `~/.nanobot/workspace/skills/`
|
||||
- **Conversation history**: Session-based context
|
||||
|
||||
### 3. Memory System (`agent/memory.py`)
|
||||
- Persistent memory stored in `MEMORY.md` in the workspace
|
||||
- Agent can read and write memories
|
||||
- Survives across sessions
|
||||
|
||||
### 4. Provider Registry (`providers/registry.py`)
|
||||
- Single-source-of-truth for all LLM providers
|
||||
- Adding a new provider = 2 steps (add `ProviderSpec` + config field)
|
||||
- Auto-prefixes model names for LiteLLM routing
|
||||
- Supports 12+ providers including local vLLM
|
||||
|
||||
### 5. Channel System (`channels/`)
|
||||
- 9 chat platforms supported (Telegram, Discord, WhatsApp, Feishu, Mochat, DingTalk, Slack, Email, QQ)
|
||||
- Each channel handles auth, message parsing, and response delivery
|
||||
- Allowlist-based security (`allowFrom`)
|
||||
- Started via `nanobot gateway`
|
||||
|
||||
### 6. Skills (`skills/`)
|
||||
- Bundled skills: GitHub, weather, tmux, etc.
|
||||
- Custom skills loaded from workspace
|
||||
- Skills are injected into the agent's context
|
||||
|
||||
### 7. Scheduled Tasks (Cron)
|
||||
- Add jobs via `nanobot cron add`
|
||||
- Supports cron expressions and interval-based scheduling
|
||||
- Jobs stored persistently
|
||||
|
||||
### 8. MCP Integration
|
||||
- Supports Model Context Protocol servers
|
||||
- Stdio and HTTP transport modes
|
||||
- Compatible with Claude Desktop / Cursor MCP configs
|
||||
- Tools auto-discovered and registered at startup
|
||||
|
||||
---
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
nanobot/
|
||||
├── agent/ # 🧠 Core agent logic
|
||||
│ ├── loop.py # Agent loop (LLM ↔ tool execution)
|
||||
│ ├── context.py # Prompt builder
|
||||
│ ├── memory.py # Persistent memory
|
||||
│ ├── skills.py # Skills loader
|
||||
│ ├── subagent.py # Background task execution
|
||||
│ └── tools/ # Built-in tools (incl. spawn)
|
||||
├── skills/ # 🎯 Bundled skills (github, weather, tmux...)
|
||||
├── channels/ # 📱 Chat channel integrations
|
||||
├── providers/ # ☁️ LLM provider registry
|
||||
├── config/ # ⚙️ Configuration schema
|
||||
├── cron/ # ⏰ Scheduled tasks
|
||||
├── heartbeat/ # 💓 Heartbeat system
|
||||
├── session/ # 📝 Session management
|
||||
├── bus/ # 📨 Internal event bus
|
||||
├── cli/ # 🖥️ CLI commands
|
||||
└── utils/ # 🔧 Utilities
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `nanobot onboard` | Initialize config & workspace |
|
||||
| `nanobot agent -m "..."` | Chat with the agent |
|
||||
| `nanobot agent` | Interactive chat mode |
|
||||
| `nanobot gateway` | Start all channels |
|
||||
| `nanobot status` | Show status |
|
||||
| `nanobot cron add/list/remove` | Manage scheduled tasks |
|
||||
| `nanobot channels login` | Link WhatsApp device |
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **LiteLLM for provider abstraction** — One interface for all LLM providers
|
||||
2. **JSON config over env vars** — Single `config.json` file for all settings
|
||||
3. **Skills-based extensibility** — Modular skill system for adding capabilities
|
||||
4. **Provider Registry pattern** — Adding providers is 2-step, zero if-elif chains
|
||||
5. **Agent social network** — Can join MoltBook, ClawdChat communities
|
||||
214
docs/nanoclaw.md
Normal file
214
docs/nanoclaw.md
Normal file
@@ -0,0 +1,214 @@
|
||||
# 🦀 NanoClaw — Architecture & How It Works
|
||||
|
||||
> **Minimal, Security-First Personal AI Assistant** — built on Claude Agent SDK with container isolation.
|
||||
|
||||
## Overview
|
||||
|
||||
NanoClaw is a minimalist personal AI assistant that prioritizes **security through container isolation** and **understandability through small codebase size**. It runs on Claude Agent SDK (Claude Code) and uses WhatsApp as its primary channel. Each group chat runs in its own isolated Linux container.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | TypeScript (Node.js 20+) |
|
||||
| **Codebase Size** | ~34.9k tokens (~17% of Claude context window) |
|
||||
| **Config** | No config files — code changes only |
|
||||
| **AI Runtime** | Claude Agent SDK (Claude Code) |
|
||||
| **Primary Channel** | WhatsApp (Baileys) |
|
||||
| **Isolation** | Apple Container (macOS) / Docker (Linux) |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph WhatsApp["📱 WhatsApp"]
|
||||
WA["WhatsApp Client\n(Baileys)"]
|
||||
end
|
||||
|
||||
subgraph Core["🧠 Core Process (Single Node.js)"]
|
||||
IDX["Orchestrator\n(index.ts)"]
|
||||
DB["SQLite DB\n(db.ts)"]
|
||||
GQ["Group Queue\n(group-queue.ts)"]
|
||||
TS["Task Scheduler\n(task-scheduler.ts)"]
|
||||
IPC["IPC Watcher\n(ipc.ts)"]
|
||||
RT["Router\n(router.ts)"]
|
||||
end
|
||||
|
||||
subgraph Containers["🐳 Isolated Containers"]
|
||||
C1["Container 1\nGroup A\n(CLAUDE.md)"]
|
||||
C2["Container 2\nGroup B\n(CLAUDE.md)"]
|
||||
C3["Container 3\nMain Channel\n(CLAUDE.md)"]
|
||||
end
|
||||
|
||||
subgraph Memory["💾 Per-Group Memory"]
|
||||
M1["groups/A/CLAUDE.md"]
|
||||
M2["groups/B/CLAUDE.md"]
|
||||
M3["groups/main/CLAUDE.md"]
|
||||
end
|
||||
|
||||
WA --> IDX
|
||||
IDX --> DB
|
||||
IDX --> GQ
|
||||
GQ --> Containers
|
||||
TS --> Containers
|
||||
Containers --> IPC
|
||||
IPC --> RT
|
||||
RT --> WA
|
||||
C1 --- M1
|
||||
C2 --- M2
|
||||
C3 --- M3
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant WA as WhatsApp (Baileys)
|
||||
participant IDX as Orchestrator
|
||||
participant DB as SQLite
|
||||
participant GQ as Group Queue
|
||||
participant Container as Container (Claude SDK)
|
||||
participant IPC as IPC Watcher
|
||||
|
||||
User->>WA: Send message with @Andy
|
||||
WA->>IDX: New message event
|
||||
IDX->>DB: Store message
|
||||
IDX->>GQ: Enqueue (per-group, concurrency limited)
|
||||
GQ->>Container: Spawn Claude agent container
|
||||
Note over Container: Mounts only group's filesystem
|
||||
Note over Container: Reads group-specific CLAUDE.md
|
||||
Container->>Container: Claude processes with tools
|
||||
Container->>IPC: Write response to filesystem
|
||||
IPC->>IDX: Detect new response file
|
||||
IDX->>WA: Send reply
|
||||
WA->>User: Display response
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Orchestrator (`src/index.ts`)
|
||||
The single entry point that manages:
|
||||
- WhatsApp connection state
|
||||
- Message polling loop
|
||||
- Agent invocation decisions
|
||||
- State management for groups and sessions
|
||||
|
||||
### 2. WhatsApp Channel (`src/channels/whatsapp.ts`)
|
||||
- Uses **Baileys** library for WhatsApp Web connection
|
||||
- Handles authentication via QR code scan
|
||||
- Manages send/receive of messages
|
||||
- Supports media messages
|
||||
|
||||
### 3. Container Runner (`src/container-runner.ts`)
|
||||
The security core of NanoClaw:
|
||||
- Spawns **streaming Claude Agent SDK** containers
|
||||
- Each group runs in its own Linux container
|
||||
- **Apple Container** on macOS, **Docker** on Linux
|
||||
- Only explicitly mounted directories are accessible
|
||||
- Bash commands run INSIDE the container, not on host
|
||||
|
||||
### 4. SQLite Database (`src/db.ts`)
|
||||
- Stores messages, groups, sessions, and state
|
||||
- Per-group message history
|
||||
- Session continuity tracking
|
||||
|
||||
### 5. Group Queue (`src/group-queue.ts`)
|
||||
- Per-group message queue
|
||||
- Global concurrency limit
|
||||
- Ensures one agent invocation per group at a time
|
||||
|
||||
### 6. IPC System (`src/ipc.ts`)
|
||||
- Filesystem-based inter-process communication
|
||||
- Container writes response to mounted directory
|
||||
- IPC watcher detects and processes response files
|
||||
- Handles task results from scheduled jobs
|
||||
|
||||
### 7. Task Scheduler (`src/task-scheduler.ts`)
|
||||
- Recurring jobs that run Claude in containers
|
||||
- Jobs can message the user back
|
||||
- Managed from the main channel (self-chat)
|
||||
|
||||
### 8. Router (`src/router.ts`)
|
||||
- Formats outbound messages
|
||||
- Routes responses to correct WhatsApp recipient
|
||||
|
||||
### 9. Per-Group Memory (`groups/*/CLAUDE.md`)
|
||||
- Each group has its own `CLAUDE.md` memory file
|
||||
- Mounted into the group's container
|
||||
- Complete filesystem isolation between groups
|
||||
|
||||
---
|
||||
|
||||
## Security Model
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph Host["🖥️ Host System"]
|
||||
NanoClaw["NanoClaw Process"]
|
||||
end
|
||||
|
||||
subgraph Container1["🐳 Container (Group A)"]
|
||||
Agent1["Claude Agent"]
|
||||
FS1["Mounted: groups/A/"]
|
||||
end
|
||||
|
||||
subgraph Container2["🐳 Container (Group B)"]
|
||||
Agent2["Claude Agent"]
|
||||
FS2["Mounted: groups/B/"]
|
||||
end
|
||||
|
||||
NanoClaw -->|"Spawns"| Container1
|
||||
NanoClaw -->|"Spawns"| Container2
|
||||
|
||||
style Container1 fill:#e8f5e9
|
||||
style Container2 fill:#e8f5e9
|
||||
```
|
||||
|
||||
- **OS-level isolation** vs. application-level permission checks
|
||||
- Agents can only see what's explicitly mounted
|
||||
- Bash commands run in container, not on host
|
||||
- No shared memory between groups
|
||||
|
||||
---
|
||||
|
||||
## Philosophy & Design Decisions
|
||||
|
||||
1. **Small enough to understand** — Read the entire codebase in ~8 minutes
|
||||
2. **Secure by isolation** — Linux containers, not permission checks
|
||||
3. **Built for one user** — Not a framework, working software for personal use
|
||||
4. **Customization = code changes** — No config sprawl, modify the code directly
|
||||
5. **AI-native** — Claude Code handles setup (`/setup`), debugging, customization
|
||||
6. **Skills over features** — Don't add features to codebase, add skills that transform forks
|
||||
7. **Best harness, best model** — Claude Agent SDK gives Claude Code superpowers
|
||||
|
||||
---
|
||||
|
||||
## Agent Swarms (Unique Feature)
|
||||
|
||||
NanoClaw is the **first personal AI assistant** to support Agent Swarms:
|
||||
- Spin up teams of specialized agents
|
||||
- Agents collaborate within your chat
|
||||
- Each agent runs in its own container
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
# Setup (Claude Code handles everything)
|
||||
git clone https://github.com/gavrielc/nanoclaw.git
|
||||
cd nanoclaw
|
||||
claude
|
||||
# Then run /setup
|
||||
|
||||
# Talk to your assistant
|
||||
@Andy send me a daily summary every morning at 9am
|
||||
@Andy review the git history and update the README
|
||||
```
|
||||
|
||||
Trigger word: `@Andy` (customizable via code changes)
|
||||
291
docs/openclaw.md
Normal file
291
docs/openclaw.md
Normal file
@@ -0,0 +1,291 @@
|
||||
# 🦞 OpenClaw — Architecture & How It Works
|
||||
|
||||
> **Full-Featured Personal AI Assistant** — Massive TypeScript codebase with 15+ channels, companion apps, and enterprise-grade features.
|
||||
|
||||
## Overview
|
||||
|
||||
OpenClaw is the most feature-complete personal AI assistant in this space. It's a TypeScript monorepo with a WebSocket-based Gateway as the control plane, supporting 15+ messaging channels, companion macOS/iOS/Android apps, browser control, live canvas, voice wake, and extensive automation.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | TypeScript (Node.js ≥22) |
|
||||
| **Codebase Size** | 430k+ lines, 50+ source modules |
|
||||
| **Config** | `~/.openclaw/openclaw.json` (JSON5) |
|
||||
| **AI Runtime** | Pi Agent (custom RPC), multi-model |
|
||||
| **Channels** | 15+ (WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, Zalo, WebChat, etc.) |
|
||||
| **Package Mgr** | pnpm (monorepo) |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph Channels["📱 Messaging Channels (15+)"]
|
||||
WA["WhatsApp\n(Baileys)"]
|
||||
TG["Telegram\n(grammY)"]
|
||||
SL["Slack\n(Bolt)"]
|
||||
DC["Discord\n(discord.js)"]
|
||||
GC["Google Chat"]
|
||||
SIG["Signal\n(signal-cli)"]
|
||||
BB["BlueBubbles\n(iMessage)"]
|
||||
IM["iMessage\n(legacy)"]
|
||||
MST["MS Teams"]
|
||||
MTX["Matrix"]
|
||||
ZL["Zalo"]
|
||||
WC["WebChat"]
|
||||
end
|
||||
|
||||
subgraph Gateway["🌐 Gateway (Control Plane)"]
|
||||
WS["WebSocket Server\nws://127.0.0.1:18789"]
|
||||
SES["Session Manager"]
|
||||
RTE["Channel Router"]
|
||||
PRES["Presence System"]
|
||||
Q["Message Queue"]
|
||||
CFG["Config Manager"]
|
||||
AUTH["Auth / Pairing"]
|
||||
end
|
||||
|
||||
subgraph Agent["🧠 Pi Agent (RPC)"]
|
||||
AGENT["Agent Runtime"]
|
||||
TOOLS["Tool Registry"]
|
||||
STREAM["Block Streaming"]
|
||||
PROV["Provider Router\n(multi-model)"]
|
||||
end
|
||||
|
||||
subgraph Apps["📲 Companion Apps"]
|
||||
MAC["macOS Menu Bar"]
|
||||
IOS["iOS Node"]
|
||||
ANDR["Android Node"]
|
||||
end
|
||||
|
||||
subgraph ToolSet["🔧 Tools & Automation"]
|
||||
BROWSER["Browser Control\n(CDP/Chromium)"]
|
||||
CANVAS["Live Canvas\n(A2UI)"]
|
||||
CRON["Cron Jobs"]
|
||||
WEBHOOK["Webhooks"]
|
||||
GMAIL["Gmail Pub/Sub"]
|
||||
NODES["Nodes\n(camera, screen, location)"]
|
||||
SKILLS_T["Skills Platform"]
|
||||
SESS_T["Session Tools\n(agent-to-agent)"]
|
||||
end
|
||||
|
||||
subgraph Workspace["💾 Workspace"]
|
||||
AGENTS_MD["AGENTS.md"]
|
||||
SOUL_MD["SOUL.md"]
|
||||
USER_MD["USER.md"]
|
||||
TOOLS_MD["TOOLS.md"]
|
||||
SKILLS_W["Skills/"]
|
||||
end
|
||||
|
||||
Channels --> Gateway
|
||||
Apps --> Gateway
|
||||
Gateway --> Agent
|
||||
Agent --> ToolSet
|
||||
Agent --> Workspace
|
||||
Agent --> PROV
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant Channel as Channel (WA/TG/Slack/etc.)
|
||||
participant GW as Gateway (WS)
|
||||
participant Session as Session Manager
|
||||
participant Agent as Pi Agent (RPC)
|
||||
participant LLM as LLM Provider
|
||||
participant Tools as Tools
|
||||
|
||||
User->>Channel: Send message
|
||||
Channel->>GW: Forward via channel adapter
|
||||
GW->>Session: Route to session (main/group)
|
||||
GW->>GW: Check auth (pairing/allowlist)
|
||||
Session->>Agent: Invoke agent (RPC)
|
||||
Agent->>Agent: Build prompt (AGENTS.md, SOUL.md, tools)
|
||||
Agent->>LLM: Stream request (with tool definitions)
|
||||
|
||||
loop Tool Use Loop
|
||||
LLM-->>Agent: Tool call (block stream)
|
||||
Agent->>Tools: Execute tool
|
||||
Tools-->>Agent: Tool result
|
||||
Agent->>LLM: Continue with result
|
||||
end
|
||||
|
||||
LLM-->>Agent: Final response (block stream)
|
||||
Agent-->>Session: Return response
|
||||
Session->>GW: Add to outbound queue
|
||||
GW->>GW: Chunk if needed (per-channel limits)
|
||||
GW->>Channel: Send chunked replies
|
||||
Channel->>User: Display response
|
||||
|
||||
Note over GW: Typing indicators, presence updates
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Gateway (`src/gateway/`)
|
||||
The central control plane — everything connects through it:
|
||||
- **WebSocket server** on `ws://127.0.0.1:18789`
|
||||
- Session management (main, group, per-channel)
|
||||
- Multi-agent routing (different agents for different channels)
|
||||
- Presence tracking and typing indicators
|
||||
- Config management and hot-reload
|
||||
- Health checks, doctor diagnostics
|
||||
|
||||
### 2. Pi Agent (`src/agents/`)
|
||||
Custom RPC-based agent runtime:
|
||||
- Tool streaming and block streaming
|
||||
- Multi-model support with failover
|
||||
- Session pruning for long conversations
|
||||
- Usage tracking (tokens, cost)
|
||||
- Thinking level control (off → xhigh)
|
||||
|
||||
### 3. Channel System (`src/channels/` + per-channel dirs)
|
||||
15+ channel adapters, each with:
|
||||
- Auth handling (pairing codes, allowlists, OAuth)
|
||||
- Message format conversion
|
||||
- Media pipeline (images, audio, video)
|
||||
- Group routing with mention gating
|
||||
- Per-channel chunking (character limits differ)
|
||||
|
||||
### 4. Security System (`src/security/`)
|
||||
Multi-layered security:
|
||||
- **DM Pairing** — unknown senders get a pairing code, must be approved
|
||||
- **Allowlists** — per-channel user whitelists
|
||||
- **Docker Sandbox** — non-main sessions can run in per-session Docker containers
|
||||
- **Tool denylist** — block dangerous tools in sandbox mode
|
||||
- **Elevated bash** — per-session toggle for host-level access
|
||||
|
||||
### 5. Browser Control (`src/browser/`)
|
||||
- Dedicated OpenClaw-managed Chrome/Chromium instance
|
||||
- CDP (Chrome DevTools Protocol) control
|
||||
- Snapshots, actions, uploads, profiles
|
||||
- Full web automation capabilities
|
||||
|
||||
### 6. Canvas & A2UI (`src/canvas-host/`)
|
||||
- Agent-driven visual workspace
|
||||
- A2UI (Agent-to-UI) — push HTML/JS to canvas
|
||||
- Canvas eval, snapshot, reset
|
||||
- Available on macOS, iOS, Android
|
||||
|
||||
### 7. Voice System
|
||||
- **Voice Wake** — always-on speech detection
|
||||
- **Talk Mode** — continuous conversation overlay
|
||||
- ElevenLabs TTS integration
|
||||
- Available on macOS, iOS, Android
|
||||
|
||||
### 8. Companion Apps
|
||||
- **macOS app**: Menu bar, Voice Wake/PTT, WebChat, debug tools
|
||||
- **iOS node**: Canvas, Voice Wake, Talk Mode, camera, Bonjour pairing
|
||||
- **Android node**: Canvas, Talk Mode, camera, screen recording, SMS
|
||||
|
||||
### 9. Session Tools (Agent-to-Agent)
|
||||
- `sessions_list` — discover active sessions
|
||||
- `sessions_history` — fetch transcript logs
|
||||
- `sessions_send` — message another session with reply-back
|
||||
|
||||
### 10. Skills Platform (`src/plugins/`, `skills/`)
|
||||
- **Bundled skills** — pre-installed capabilities
|
||||
- **Managed skills** — installed from ClawHub registry
|
||||
- **Workspace skills** — user-created in workspace
|
||||
- Install gating and UI
|
||||
- ClawHub registry for community skills
|
||||
|
||||
### 11. Automation
|
||||
- **Cron jobs** — scheduled recurring tasks
|
||||
- **Webhooks** — external trigger surface
|
||||
- **Gmail Pub/Sub** — email-triggered actions
|
||||
|
||||
### 12. Ops & Deployment
|
||||
- Docker support with compose
|
||||
- Tailscale Serve/Funnel for remote access
|
||||
- SSH tunnels with token/password auth
|
||||
- `openclaw doctor` for diagnostics
|
||||
- Nix mode for declarative config
|
||||
|
||||
---
|
||||
|
||||
## Project Structure (Simplified)
|
||||
|
||||
```
|
||||
openclaw/
|
||||
├── src/
|
||||
│ ├── agents/ # Pi agent runtime
|
||||
│ ├── gateway/ # WebSocket gateway
|
||||
│ ├── channels/ # Channel adapter base
|
||||
│ ├── whatsapp/ # WhatsApp adapter
|
||||
│ ├── telegram/ # Telegram adapter
|
||||
│ ├── slack/ # Slack adapter
|
||||
│ ├── discord/ # Discord adapter
|
||||
│ ├── signal/ # Signal adapter
|
||||
│ ├── imessage/ # iMessage adapters
|
||||
│ ├── browser/ # Browser control (CDP)
|
||||
│ ├── canvas-host/ # Canvas & A2UI
|
||||
│ ├── sessions/ # Session management
|
||||
│ ├── routing/ # Message routing
|
||||
│ ├── security/ # Auth, pairing, sandbox
|
||||
│ ├── cron/ # Scheduled jobs
|
||||
│ ├── memory/ # Memory system
|
||||
│ ├── providers/ # LLM providers
|
||||
│ ├── plugins/ # Plugin/skill system
|
||||
│ ├── media/ # Media pipeline
|
||||
│ ├── tts/ # Text-to-speech
|
||||
│ ├── web/ # Control UI + WebChat
|
||||
│ ├── wizard/ # Onboarding wizard
|
||||
│ └── cli/ # CLI commands
|
||||
├── apps/ # Companion app sources
|
||||
├── packages/ # Shared packages
|
||||
├── extensions/ # Extension channels
|
||||
├── skills/ # Bundled skills
|
||||
├── ui/ # Web UI source
|
||||
└── Swabble/ # macOS/iOS Swift source
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CLI Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `openclaw onboard` | Guided setup wizard |
|
||||
| `openclaw gateway` | Start the gateway |
|
||||
| `openclaw agent --message "..."` | Chat with agent |
|
||||
| `openclaw message send` | Send to any channel |
|
||||
| `openclaw doctor` | Diagnostics & migration |
|
||||
| `openclaw pairing approve` | Approve DM pairing |
|
||||
| `openclaw update` | Update to latest version |
|
||||
| `openclaw channels login` | Link WhatsApp |
|
||||
|
||||
---
|
||||
|
||||
## Chat Commands (In-Channel)
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/status` | Session status (model, tokens, cost) |
|
||||
| `/new` / `/reset` | Reset session |
|
||||
| `/compact` | Compact session context |
|
||||
| `/think <level>` | Set thinking level |
|
||||
| `/verbose on\|off` | Toggle verbose mode |
|
||||
| `/usage off\|tokens\|full` | Usage footer |
|
||||
| `/restart` | Restart gateway |
|
||||
| `/activation mention\|always` | Group activation mode |
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **Gateway as control plane** — Single WebSocket server everything connects to
|
||||
2. **Multi-agent routing** — Different agents for different channels/groups
|
||||
3. **Pairing-based security** — Unknown DMs get pairing codes by default
|
||||
4. **Docker sandboxing** — Non-main sessions can be isolated
|
||||
5. **Block streaming** — Responses streamed as structured blocks
|
||||
6. **Extension-based channels** — MS Teams, Matrix, Zalo are extensions
|
||||
7. **Companion apps** — Native macOS/iOS/Android for device-level features
|
||||
8. **ClawHub** — Community skill registry
|
||||
251
docs/picoclaw.md
Normal file
251
docs/picoclaw.md
Normal file
@@ -0,0 +1,251 @@
|
||||
# 🦐 PicoClaw — Architecture & How It Works
|
||||
|
||||
> **Ultra-Efficient AI Assistant in Go** — $10 hardware, 10MB RAM, 1s boot time.
|
||||
|
||||
## Overview
|
||||
|
||||
PicoClaw is an extreme-lightweight rewrite of Nanobot in Go, designed to run on the cheapest possible hardware — including $10 RISC-V SBCs with <10MB RAM. The entire project was AI-bootstrapped (95% agent-generated) through a self-bootstrapping migration from Python to Go.
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Language** | Go 1.21+ |
|
||||
| **RAM Usage** | <10MB |
|
||||
| **Startup Time** | <1s (even at 0.6GHz) |
|
||||
| **Hardware Cost** | As low as $10 |
|
||||
| **Architectures** | x86_64, ARM64, RISC-V |
|
||||
| **Binary** | Single self-contained binary |
|
||||
| **Config** | `~/.picoclaw/config.json` |
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph Channels["📱 Chat Channels"]
|
||||
TG["Telegram"]
|
||||
DC["Discord"]
|
||||
QQ["QQ"]
|
||||
DT["DingTalk"]
|
||||
LINE["LINE"]
|
||||
end
|
||||
|
||||
subgraph Core["🧠 Core Agent (Single Binary)"]
|
||||
MAIN["Main Entry\n(cmd/)"]
|
||||
AGENT["Agent Loop\n(pkg/agent/)"]
|
||||
CONF["Config\n(pkg/config/)"]
|
||||
AUTH["Auth\n(pkg/auth/)"]
|
||||
PROV["Providers\n(pkg/providers/)"]
|
||||
TOOLS["Tools\n(pkg/tools/)"]
|
||||
end
|
||||
|
||||
subgraph ToolSet["🔧 Built-in Tools"]
|
||||
SHELL["Shell Exec"]
|
||||
FILE["File R/W"]
|
||||
WEB["Web Search\n(Brave / DuckDuckGo)"]
|
||||
CRON_T["Cron / Reminders"]
|
||||
SPAWN["Spawn Subagent"]
|
||||
MSG["Message Tool"]
|
||||
end
|
||||
|
||||
subgraph Workspace["💾 Workspace"]
|
||||
AGENTS_MD["AGENTS.md"]
|
||||
SOUL_MD["SOUL.md"]
|
||||
TOOLS_MD["TOOLS.md"]
|
||||
USER_MD["USER.md"]
|
||||
IDENTITY["IDENTITY.md"]
|
||||
HB["HEARTBEAT.md"]
|
||||
MEM["MEMORY.md"]
|
||||
SESSIONS["sessions/"]
|
||||
SKILLS["skills/"]
|
||||
end
|
||||
|
||||
subgraph Providers["☁️ LLM Providers"]
|
||||
GEMINI["Gemini"]
|
||||
ZHIPU["Zhipu"]
|
||||
OR["OpenRouter"]
|
||||
OA["OpenAI"]
|
||||
AN["Anthropic"]
|
||||
DS["DeepSeek"]
|
||||
GROQ["Groq\n(+ voice)"]
|
||||
end
|
||||
|
||||
Channels --> Core
|
||||
AGENT --> ToolSet
|
||||
AGENT --> Workspace
|
||||
AGENT --> Providers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Message Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant User
|
||||
participant Channel as Chat Channel
|
||||
participant GW as Gateway
|
||||
participant Agent as Agent Loop
|
||||
participant LLM as LLM Provider
|
||||
participant Tools as Tools
|
||||
|
||||
User->>Channel: Send message
|
||||
Channel->>GW: Forward message
|
||||
GW->>Agent: Route to agent
|
||||
Agent->>Agent: Load context (AGENTS.md, SOUL.md, USER.md)
|
||||
Agent->>LLM: Send prompt + tool defs
|
||||
LLM-->>Agent: Response
|
||||
|
||||
alt Tool Call
|
||||
Agent->>Tools: Execute tool
|
||||
Tools-->>Agent: Result
|
||||
Agent->>LLM: Continue
|
||||
LLM-->>Agent: Next response
|
||||
end
|
||||
|
||||
Agent->>Agent: Update memory/session
|
||||
Agent-->>GW: Return response
|
||||
GW-->>Channel: Send reply
|
||||
Channel-->>User: Display
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Heartbeat System Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Timer as Heartbeat Timer
|
||||
participant Agent as Agent
|
||||
participant HB as HEARTBEAT.md
|
||||
participant Subagent as Spawn Subagent
|
||||
participant User
|
||||
|
||||
Timer->>Agent: Trigger (every 30 min)
|
||||
Agent->>HB: Read periodic tasks
|
||||
|
||||
alt Quick Task
|
||||
Agent->>Agent: Execute directly
|
||||
Agent-->>Timer: HEARTBEAT_OK
|
||||
end
|
||||
|
||||
alt Long Task
|
||||
Agent->>Subagent: Spawn async subagent
|
||||
Agent-->>Timer: Continue to next task
|
||||
Subagent->>Subagent: Work independently
|
||||
Subagent->>User: Send result via message tool
|
||||
end
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Agent Loop (`pkg/agent/`)
|
||||
Go-native implementation of the LLM ↔ tool execution loop:
|
||||
- Builds context from workspace identity files
|
||||
- Sends to LLM provider with tool definitions
|
||||
- Iterates on tool calls up to `max_tool_iterations` (default: 20)
|
||||
- Session history managed in `workspace/sessions/`
|
||||
|
||||
### 2. Provider System (`pkg/providers/`)
|
||||
- Gemini and Zhipu are fully tested
|
||||
- OpenRouter, Anthropic, OpenAI, DeepSeek marked "to be tested"
|
||||
- Groq for voice transcription (Whisper)
|
||||
- Each provider implements a common interface
|
||||
|
||||
### 3. Tool System (`pkg/tools/`)
|
||||
Built-in tools:
|
||||
- **read_file** / **write_file** / **list_dir** / **edit_file** / **append_file** — File operations
|
||||
- **exec** — Shell command execution (with safety guards)
|
||||
- **web_search** — Brave Search or DuckDuckGo fallback
|
||||
- **cron** — Scheduled reminders and recurring tasks
|
||||
- **spawn** — Create async subagents
|
||||
- **message** — Subagent-to-user communication
|
||||
|
||||
### 4. Security Sandbox
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
RW["restrict_to_workspace = true"]
|
||||
|
||||
RW --> RF["read_file: workspace only"]
|
||||
RW --> WF["write_file: workspace only"]
|
||||
RW --> LD["list_dir: workspace only"]
|
||||
RW --> EF["edit_file: workspace only"]
|
||||
RW --> AF["append_file: workspace only"]
|
||||
RW --> EX["exec: workspace paths only"]
|
||||
|
||||
EX --> BL["ALWAYS Blocked:"]
|
||||
BL --> RM["rm -rf"]
|
||||
BL --> FMT["format, mkfs"]
|
||||
BL --> DD["dd if="]
|
||||
BL --> SHUT["shutdown, reboot"]
|
||||
BL --> FORK["fork bomb"]
|
||||
```
|
||||
|
||||
- Workspace sandbox enabled by default
|
||||
- All tools restricted to workspace directory
|
||||
- Dangerous commands always blocked (even with sandbox off)
|
||||
- Consistent across main agent, subagents, and heartbeat tasks
|
||||
|
||||
### 5. Heartbeat System
|
||||
- Reads `HEARTBEAT.md` every 30 minutes
|
||||
- Quick tasks executed directly
|
||||
- Long tasks spawned as async subagents
|
||||
- Subagents communicate independently via message tool
|
||||
|
||||
### 6. Channel System
|
||||
- **Telegram** — Easy setup (token only)
|
||||
- **Discord** — Bot token + intents
|
||||
- **QQ** — AppID + AppSecret
|
||||
- **DingTalk** — Client credentials
|
||||
- **LINE** — Credentials + webhook URL (HTTPS required)
|
||||
|
||||
### 7. Workspace Layout
|
||||
```
|
||||
~/.picoclaw/workspace/
|
||||
├── sessions/ # Conversation history
|
||||
├── memory/ # Long-term memory (MEMORY.md)
|
||||
├── state/ # Persistent state
|
||||
├── cron/ # Scheduled jobs database
|
||||
├── skills/ # Custom skills
|
||||
├── AGENTS.md # Agent behavior guide
|
||||
├── HEARTBEAT.md # Periodic task prompts
|
||||
├── IDENTITY.md # Agent identity
|
||||
├── SOUL.md # Agent soul
|
||||
├── TOOLS.md # Tool descriptions
|
||||
└── USER.md # User preferences
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Comparison Table (from README)
|
||||
|
||||
| | OpenClaw | NanoBot | **PicoClaw** |
|
||||
|---------------------|------------|-------------|-----------------------|
|
||||
| **Language** | TypeScript | Python | **Go** |
|
||||
| **RAM** | >1GB | >100MB | **<10MB** |
|
||||
| **Startup (0.8GHz)**| >500s | >30s | **<1s** |
|
||||
| **Cost** | Mac $599 | SBC ~$50 | **Any Linux, ~$10** |
|
||||
|
||||
---
|
||||
|
||||
## Deployment Targets
|
||||
|
||||
PicoClaw can run on almost any Linux device:
|
||||
- **$9.9** LicheeRV-Nano — Minimal home assistant
|
||||
- **$30-50** NanoKVM — Automated server maintenance
|
||||
- **$50-100** MaixCAM — Smart monitoring
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
1. **Go for minimal footprint** — Single binary, no runtime deps, tiny memory
|
||||
2. **AI-bootstrapped migration** — 95% of Go code generated by the AI agent itself
|
||||
3. **Web search with fallback** — Brave Search primary, DuckDuckGo fallback (free)
|
||||
4. **Heartbeat for proactive tasks** — Agent checks `HEARTBEAT.md` periodically
|
||||
5. **Subagent pattern** — Long tasks run async, don't block heartbeat
|
||||
6. **Default sandbox** — `restrict_to_workspace: true` by default
|
||||
7. **Cross-architecture** — Single binary compiles for x86, ARM64, RISC-V
|
||||
Reference in New Issue
Block a user