Adds Agent Swarms
* feat: streaming container mode, IPC messaging, agent teams support
Major architectural shift from single-shot container runs to long-lived
streaming containers with IPC-based message injection.
- Agent runner: query loop with AsyncIterable prompt to keep stdin open
for agent teams (fixes isSingleUserTurn premature shutdown)
- New standalone stdio MCP server (ipc-mcp-stdio.ts) inheritable by
subagents, with send_message and schedule_task tools
- Streaming output: parse OUTPUT_START/END markers in real-time, send
results to WhatsApp as they arrive
- IPC file-based messaging: host writes to ipc/{group}/input/, agent
polls for follow-up messages without respawning containers
- Per-group settings.json with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
- SDK bumped to 0.2.34 for TeamCreate tool support
- Container idle timeout (30min) with _close sentinel for shutdown
- Orphaned container cleanup on startup
- alwaysRespond flag for groups that skip trigger pattern check
- Uncaught exception/rejection handlers with timestamps in logger
- Combined SDK documentation into single deep dive reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
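The streaming-output bullet above can be illustrated with a small parser that buffers stdout chunks and emits each payload as soon as its closing marker arrives. This is only a sketch: the marker strings and the `MarkerStreamParser` name are assumptions for illustration, not the values used in this commit.

```typescript
// Hypothetical marker strings; the real values are defined by the container runner.
const OUTPUT_START = "---OUTPUT_START---";
const OUTPUT_END = "---OUTPUT_END---";

class MarkerStreamParser {
  private buffer = "";

  // Feed one stdout chunk; returns every payload completed by this chunk.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const payloads: string[] = [];
    for (;;) {
      const start = this.buffer.indexOf(OUTPUT_START);
      if (start === -1) {
        // Keep only a tail long enough to hold a partially received start marker.
        this.buffer = this.buffer.slice(-(OUTPUT_START.length - 1));
        break;
      }
      const end = this.buffer.indexOf(OUTPUT_END, start + OUTPUT_START.length);
      if (end === -1) break; // closing marker not received yet; wait for more data
      payloads.push(this.buffer.slice(start + OUTPUT_START.length, end).trim());
      this.buffer = this.buffer.slice(end + OUTPUT_END.length);
    }
    return payloads;
  }
}
```

Because the parser keeps state between calls, a marker split across two chunks is still recognized once the second chunk arrives.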
* chore: remove unused ipc-mcp.ts (replaced by ipc-mcp-stdio.ts)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: clarify agent communication model in docs and tool descriptions
- CLAUDE.md (main + global): split communication instructions into
"responding to messages" vs "scheduled tasks" sections
- send_message tool: note that scheduled task output is not sent to user
- Remove structured output (outputFormat) — not needed with current flow
- Regular output is sent to WhatsApp; scheduled task output is only logged
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: ignore dynamic group data while preserving base structure
Only track groups/main/CLAUDE.md and groups/global/CLAUDE.md. All other
group directories and files are ignored to prevent tracking user-specific
session data.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve critical bugs in streaming container mode
Bug 1 (scheduled task hang): Task scheduler now passes onOutput callback
with idle timer that writes _close sentinel after IDLE_TIMEOUT, so
containers exit cleanly instead of blocking queue slots for 30 minutes.
Scheduled tasks stay alive for interactive follow-up via IPC.
Bug 2 (timeout disabled): Remove resetTimeout() from stderr handler.
SDK writes debug logs continuously, resetting the timer on every line.
Timeout now only resets on actual output markers in stdout.
Bug 3 (trigger bypass): Piped messages in startMessageLoop now check
trigger pattern for non-main groups. Non-trigger messages accumulate in
DB and are pulled as context via getMessagesSince when a trigger arrives.
Bug 7 (non-atomic IPC writes): GroupQueue.sendMessage uses temp file +
rename for atomic writes, matching ipc-mcp-stdio.ts pattern.
Also: flip isVerbose back to false (debug leftover), add isScheduledTask
to host-side ContainerInput interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
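For readers unfamiliar with the temp file + rename pattern mentioned under Bug 7, a minimal sketch follows. The helper name `writeFileAtomic`, the temp-file naming scheme, and the demo paths are assumptions, not the code in GroupQueue.sendMessage.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write to a temp file in the same directory, then rename it into place.
// A rename within one filesystem is atomic, so a polling reader sees either
// no file or the complete file, never a partially written one.
function writeFileAtomic(targetPath: string, contents: string): void {
  const dir = path.dirname(targetPath);
  fs.mkdirSync(dir, { recursive: true });
  // Hypothetical naming: the ".tmp" suffix lets a poller that filters on
  // ".json" ignore in-progress writes.
  const tempPath = path.join(dir, `.${path.basename(targetPath)}.${process.pid}.tmp`);
  fs.writeFileSync(tempPath, contents);
  fs.renameSync(tempPath, targetPath);
}

// Usage: drop a message into a throwaway IPC-style input directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "ipc-demo-"));
const demoFile = path.join(demoDir, "input", "msg-1.json");
writeFileAtomic(demoFile, JSON.stringify({ type: "message", text: "hello" }));
```

The temp file lives next to the target on purpose: a rename across filesystems is not atomic.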
* fix: idle timer not starting + scheduled task groupFolder missing
Two bugs that prevented the scheduled task idle timeout fix from working:
1. onOutput was only called when parsed.result !== null, but session
update markers have result: null. The idle timer never started for
"silent" query completions, leaving containers parked at
waitForIpcMessage until hard timeout.
2. Scheduler's onProcess callback didn't pass groupFolder to
queue.registerProcess, so closeStdin no-oped (groupFolder was null).
The _close sentinel was never written even when the idle timer fired.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: duplicate messages and timestamp rollback in piping path
Two bugs introduced by the trigger context accumulation change:
1. processGroupMessages didn't advance lastAgentTimestamp until after
the container finished. The piping path's
getMessagesSince(lastAgentTimestamp) re-fetched messages already sent
as the initial prompt, causing duplicates.
2. processGroupMessages overwrote lastAgentTimestamp with the original
batch timestamp on completion, rolling back any advancement made by
the piping path while the container was running.
Fix: advance lastAgentTimestamp immediately after building the prompt,
before starting the container. This matches the piping path behavior
and eliminates both the overlap and the rollback.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: container idles 30 extra minutes after _close during query
When _close was detected during pollIpcDuringQuery, it was consumed
(deleted) and stream.end() was called. But after runQuery returned,
main() still emitted a session-update marker (resetting the host's idle
timer) and called waitForIpcMessage (which polled forever since _close
was already gone). The container had to wait for a second _close.
Fix: runQuery now returns closedDuringQuery. When true, main() skips
the session-update marker and waitForIpcMessage, exiting immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resume branching, internal tags, and output forwarding
- Fix resume branching: pass resumeSessionAt with last assistant UUID
to anchor each query loop resume to the correct conversation tree
position. Prevents agent responses landing on invisible branches
when agent teams subagents create parallel JSONL entries.
- Add <internal> tag stripping: agent can wrap internal reasoning in
<internal> tags which are logged but not sent to WhatsApp. Prevents
duplicate messages and internal monologue reaching users.
- Forward scheduled task output: scheduled tasks now send result text
to WhatsApp (with <internal> stripping), matching regular message
behavior. No more special-case instructions.
- Update Communication guidance in CLAUDE.md: simplified to "your
output is sent to the user or group" with soft guidance on
<internal> tags and send_message usage.
- Add messaging behavior docs to schedule_task tool: prompts the
scheduling agent to include guidance on whether the task should
always/conditionally/never message the user.
- Mount security: containerPath now optional, defaults to basename
of hostPath.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
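The <internal> tag stripping described above might look roughly like the sketch below. The function name `stripInternal` is hypothetical, and the sketch assumes tags are not nested.

```typescript
// Remove every <internal>...</internal> span (including multi-line ones)
// and trim what is left. Assumes tags are well-formed and not nested.
function stripInternal(text: string): string {
  return text.replace(/<internal>[\s\S]*?<\/internal>/g, "").trim();
}

// Usage: only the user-facing part of the agent output survives.
const visible = stripInternal("<internal>checking the calendar</internal>You are free at 3pm.");
```

A caller would presumably skip sending when the stripped result is empty, since that means the whole output was internal.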
* fix: cursor rollback on error, flush guard, verbose logging
- Roll back lastAgentTimestamp on container error so retries can
re-process the messages instead of silently losing them.
- Add guard flag to flushOutgoingQueue to prevent duplicate sends
from concurrent flushes during rapid WA reconnects.
- Revert isVerbose from hardcoded false back to env-based check
(LOG_LEVEL=debug|trace).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: orphan container cleanup was silently failing
The startup cleanup used `container ls --format {{.Names}}` which is
Docker Go-template syntax. Apple Container only supports `--format json`
or `--format table`. The command errored with exit code 64, but the
catch block silently swallowed it — orphan containers were never cleaned
up on restart.
Fixed to use `--format json` and parse `configuration.id` from the
JSON output. Also filters by `status: running` and logs a warning on
failure instead of silently catching.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
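As an illustration of the fix above, the sketch below filters `--format json` output down to running containers and reads `configuration.id` from each entry. The surrounding JSON shape (a top-level array), the `nanoclaw-` name prefix, and the function name are assumptions made for the example.

```typescript
// Assumed shape of one entry in the `container ls --format json` output.
// Only `status` and `configuration.id` are taken from the commit message.
interface ContainerListEntry {
  status?: string;
  configuration?: { id?: string };
}

// Return the ids of running containers whose id starts with the given prefix.
// The "nanoclaw-" default prefix is hypothetical.
function findOrphanIds(listJson: string, prefix = "nanoclaw-"): string[] {
  const parsed: unknown = JSON.parse(listJson);
  if (!Array.isArray(parsed)) return [];
  return (parsed as ContainerListEntry[])
    .filter((c) => c != null && c.status === "running")
    .map((c) => c.configuration?.id ?? "")
    .filter((id) => id.startsWith(prefix));
}
```

A malformed payload makes `JSON.parse` throw. The caller should log that as a warning rather than swallow it, which is the behavior this fix introduces.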
* docs: add Discord badge and community section
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: idle timer reset on null results and flush queue message loss
- Only reset idle timer on actual results (non-null), not session-update
markers. Prevents containers staying alive 30 extra minutes after the
agent finishes work.
- flushOutgoingQueue now uses shift() instead of splice(0) so unattempted
messages stay in the queue if an unexpected error bails the loop.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
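The difference between shift() and splice(0) can be seen in a simplified, synchronous sketch. The real flushOutgoingQueue is presumably asynchronous and also carries the guard flag; the names here are hypothetical.

```typescript
// Drain the queue one message at a time. Each message is removed only when
// it is about to be attempted, so if `send` throws, every message that was
// not yet attempted is still in `queue`. With splice(0) the whole queue
// would already have been emptied before the first send.
function flushQueue(queue: string[], send: (msg: string) => void): number {
  let sent = 0;
  while (queue.length > 0) {
    const msg = queue.shift() as string; // length was checked above
    send(msg); // may throw; later messages remain queued
    sent++;
  }
  return sent;
}
```

In this sketch the message being sent when the error occurs is not put back; only the unattempted ones are preserved, matching the wording of the fix.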
* docs: add Agent Swarms to README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update Telegram skill for current architecture
Rewrite integration instructions to match the per-group queue/SQLite
architecture: remove onMessage callback pattern (store to DB, let
message loop pick up), fix startSchedulerLoop signature, add
TELEGRAM_ONLY service startup, SQLite registration, data/env/env sync,
@mention-to-trigger translation, and BotFather group privacy docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: Telegram skill message chunking, media placeholders, chat discovery
- Split long messages at Telegram's 4096 char limit to prevent silent
send failures
- Store placeholder text for non-text messages (photos, voice, stickers,
etc.) so the agent knows media was sent
- Update getAvailableGroups filter to include tg: chats so the agent can
discover and register Telegram chats via IPC
- Fix removal step numbering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
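A minimal sketch of the chunking idea, assuming the limit is measured with JavaScript's `string.length` (UTF-16 code units) and that breaking at a newline is preferred when one is available. The function and constant names are hypothetical and are not taken from the skill.

```typescript
const TELEGRAM_MAX_LENGTH = 4096;

// Split text into chunks no longer than `limit`, preferring to break at the
// last newline inside the window and hard-cutting otherwise.
function splitMessage(text: string, limit = TELEGRAM_MAX_LENGTH): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    const newline = rest.lastIndexOf("\n", limit);
    const cut = newline > 0 ? newline : limit; // cut >= 1, so the loop always advances
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\n/, ""); // drop the newline we broke on
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk would then be sent as its own message. A hard cut can still land in the middle of a surrogate pair, which a production version would need to handle.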
* docs: update REQUIREMENTS.md and SPEC.md for SQLite architecture
- Replace all registered_groups.json / sessions.json / router_state.json
references with SQLite equivalents
- Fix CONTAINER_TIMEOUT default (300000 → 1800000)
- Add missing config exports (IDLE_TIMEOUT, MAX_CONCURRENT_CONTAINERS)
- Update folder structure: add missing src files (logger, group-queue,
mount-security), remove non-existent utils.ts, list all skills
- Fix agent-runner entry (ipc-mcp.ts → ipc-mcp-stdio.ts)
- Update startup sequence to reflect per-group queue architecture
- Fix env mounting description (data/env/env, not extracted vars)
- Update troubleshooting to use sqlite3 commands
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: fix README architecture description, revert SPEC.md env error
- README: update architecture blurb to mention per-group queue, add
group-queue.ts to key files, update file descriptions
- SPEC.md: restore correct credential filtering description (only auth
vars are extracted from .env, not the full file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
110
docs/SPEC.md
@@ -98,11 +98,13 @@ nanoclaw/
 ├── .gitignore
 │
 ├── src/
-│ ├── index.ts # Main application (WhatsApp + routing)
+│ ├── index.ts # Main application (WhatsApp + routing + message loop)
 │ ├── config.ts # Configuration constants
 │ ├── types.ts # TypeScript interfaces
-│ ├── utils.ts # Generic utility functions
-│ ├── db.ts # Database initialization and queries
+│ ├── logger.ts # Pino logger setup
+│ ├── db.ts # SQLite database initialization and queries
+│ ├── group-queue.ts # Per-group queue with global concurrency limit
+│ ├── mount-security.ts # Mount allowlist validation for containers
 │ ├── whatsapp-auth.ts # Standalone WhatsApp authentication
 │ ├── task-scheduler.ts # Runs scheduled tasks when due
 │ └── container-runner.ts # Spawns agents in Apple Containers
@@ -114,8 +116,8 @@ nanoclaw/
 │ │ ├── package.json
 │ │ ├── tsconfig.json
 │ │ └── src/
-│ │ ├── index.ts # Entry point (reads JSON, runs agent)
-│ │ └── ipc-mcp.ts # MCP server for host communication
+│ │ ├── index.ts # Entry point (query loop, IPC polling, session resume)
+│ │ └── ipc-mcp-stdio.ts # Stdio-based MCP server for host communication
 │ └── skills/
 │ └── agent-browser.md # Browser automation skill
 │
@@ -123,12 +125,15 @@ nanoclaw/
 │
 ├── .claude/
 │ └── skills/
-│ ├── setup/
-│ │ └── SKILL.md # /setup skill
-│ ├── customize/
-│ │ └── SKILL.md # /customize skill
-│ └── debug/
-│ └── SKILL.md # /debug skill (container debugging)
+│ ├── setup/SKILL.md # /setup - First-time installation
+│ ├── customize/SKILL.md # /customize - Add capabilities
+│ ├── debug/SKILL.md # /debug - Container debugging
+│ ├── add-telegram/SKILL.md # /add-telegram - Telegram channel
+│ ├── add-gmail/SKILL.md # /add-gmail - Gmail integration
+│ ├── add-voice-transcription/ # /add-voice-transcription - Whisper
+│ ├── x-integration/SKILL.md # /x-integration - X/Twitter
+│ ├── convert-to-docker/SKILL.md # /convert-to-docker - Docker runtime
+│ └── add-parallel/SKILL.md # /add-parallel - Parallel agents
 │
 ├── groups/
 │ ├── CLAUDE.md # Global memory (all groups read this)
@@ -142,12 +147,10 @@ nanoclaw/
 │
 ├── store/ # Local data (gitignored)
 │ ├── auth/ # WhatsApp authentication state
-│ └── messages.db # SQLite database (messages, scheduled_tasks, task_run_logs)
+│ └── messages.db # SQLite database (messages, chats, scheduled_tasks, task_run_logs, registered_groups, sessions, router_state)
 │
 ├── data/ # Application state (gitignored)
-│ ├── sessions.json # Active session IDs per group
-│ ├── registered_groups.json # Group JID → folder mapping
-│ ├── router_state.json # Last processed timestamp + last agent timestamps
+│ ├── sessions/ # Per-group session data (.claude/ dirs with JSONL transcripts)
 │ ├── env/env # Copy of .env for container mounting
 │ └── ipc/ # Container IPC (messages/, tasks/)
 │
@@ -181,8 +184,10 @@ export const DATA_DIR = path.resolve(PROJECT_ROOT, 'data');

 // Container configuration
 export const CONTAINER_IMAGE = process.env.CONTAINER_IMAGE || 'nanoclaw-agent:latest';
-export const CONTAINER_TIMEOUT = parseInt(process.env.CONTAINER_TIMEOUT || '300000', 10);
+export const CONTAINER_TIMEOUT = parseInt(process.env.CONTAINER_TIMEOUT || '1800000', 10); // 30min default
 export const IPC_POLL_INTERVAL = 1000;
+export const IDLE_TIMEOUT = parseInt(process.env.IDLE_TIMEOUT || '1800000', 10); // 30min — keep container alive after last result
+export const MAX_CONCURRENT_CONTAINERS = Math.max(1, parseInt(process.env.MAX_CONCURRENT_CONTAINERS || '5', 10) || 5);

 export const TRIGGER_PATTERN = new RegExp(`^@${ASSISTANT_NAME}\\b`, 'i');
 ```
@@ -191,27 +196,25 @@ export const TRIGGER_PATTERN = new RegExp(`^@${ASSISTANT_NAME}\\b`, 'i');

 ### Container Configuration

-Groups can have additional directories mounted via `containerConfig` in `data/registered_groups.json`:
+Groups can have additional directories mounted via `containerConfig` in the SQLite `registered_groups` table (stored as JSON in the `container_config` column). Example registration:

-```json
-{
-  "1234567890@g.us": {
-    "name": "Dev Team",
-    "folder": "dev-team",
-    "trigger": "@Andy",
-    "added_at": "2026-01-31T12:00:00Z",
-    "containerConfig": {
-      "additionalMounts": [
-        {
-          "hostPath": "~/projects/webapp",
-          "containerPath": "webapp",
-          "readonly": false
-        }
-      ],
-      "timeout": 600000
-    }
-  }
-}
+```typescript
+registerGroup("1234567890@g.us", {
+  name: "Dev Team",
+  folder: "dev-team",
+  trigger: "@Andy",
+  added_at: new Date().toISOString(),
+  containerConfig: {
+    additionalMounts: [
+      {
+        hostPath: "~/projects/webapp",
+        containerPath: "webapp",
+        readonly: false,
+      },
+    ],
+    timeout: 600000,
+  },
+});
 ```

 Additional mounts appear at `/workspace/extra/{containerPath}` inside the container.
@@ -233,7 +236,7 @@ The token can be extracted from `~/.claude/.credentials.json` if you're logged i
 ANTHROPIC_API_KEY=sk-ant-api03-...
 ```

-Only the authentication variables (`CLAUDE_CODE_OAUTH_TOKEN` and `ANTHROPIC_API_KEY`) are extracted from `.env` and mounted into the container at `/workspace/env-dir/env`, then sourced by the entrypoint script. This ensures other environment variables in `.env` are not exposed to the agent. This workaround is needed because Apple Container loses `-e` environment variables when using `-i` (interactive mode with piped stdin).
+Only the authentication variables (`CLAUDE_CODE_OAUTH_TOKEN` and `ANTHROPIC_API_KEY`) are extracted from `.env` and written to `data/env/env`, then mounted into the container at `/workspace/env-dir/env` and sourced by the entrypoint script. This ensures other environment variables in `.env` are not exposed to the agent. This workaround is needed because Apple Container loses `-e` environment variables when using `-i` (interactive mode with piped stdin).

 ### Changing the Assistant Name

@@ -295,17 +298,10 @@ Sessions enable conversation continuity - Claude remembers what you talked about

 ### How Sessions Work

-1. Each group has a session ID stored in `data/sessions.json`
+1. Each group has a session ID stored in SQLite (`sessions` table, keyed by `group_folder`)
 2. Session ID is passed to Claude Agent SDK's `resume` option
 3. Claude continues the conversation with full context
-
-**data/sessions.json:**
-```json
-{
-  "main": "session-abc123",
-  "Family Chat": "session-def456"
-}
-```
+4. Session transcripts are stored as JSONL files in `data/sessions/{group}/.claude/`

 ---

@@ -327,8 +323,8 @@ Sessions enable conversation continuity - Claude remembers what you talked about
 │
 ▼
 5. Router checks:
-├── Is chat_jid in registered_groups.json? → No: ignore
-└── Does message start with @Assistant? → No: ignore
+├── Is chat_jid in registered groups (SQLite)? → No: ignore
+└── Does message match trigger pattern? → No: store but don't process
 │
 ▼
 6. Router catches up conversation:
@@ -484,13 +480,15 @@ NanoClaw runs as a single macOS launchd service.
 ### Startup Sequence

 When NanoClaw starts, it:
-1. **Ensures Apple Container system is running** - Automatically starts it if needed (survives reboots)
-2. Initializes the SQLite database
-3. Loads state (registered groups, sessions, router state)
-4. Connects to WhatsApp
-5. Starts the message polling loop
-6. Starts the scheduler loop
-7. Starts the IPC watcher for container messages
+1. **Ensures Apple Container system is running** - Automatically starts it if needed; kills orphaned NanoClaw containers from previous runs
+2. Initializes the SQLite database (migrates from JSON files if they exist)
+3. Loads state from SQLite (registered groups, sessions, router state)
+4. Connects to WhatsApp (on `connection.open`):
+   - Starts the scheduler loop
+   - Starts the IPC watcher for container messages
+   - Sets up the per-group queue with `processGroupMessages`
+   - Recovers any unprocessed messages from before shutdown
+   - Starts the message polling loop

 ### Service: com.nanoclaw

@@ -605,7 +603,7 @@ chmod 700 groups/
 | No response to messages | Service not running | Check `launchctl list | grep nanoclaw` |
 | "Claude Code process exited with code 1" | Apple Container failed to start | Check logs; NanoClaw auto-starts container system but may fail |
 | "Claude Code process exited with code 1" | Session mount path wrong | Ensure mount is to `/home/node/.claude/` not `/root/.claude/` |
-| Session not continuing | Session ID not saved | Check `data/sessions.json` |
+| Session not continuing | Session ID not saved | Check SQLite: `sqlite3 store/messages.db "SELECT * FROM sessions"` |
 | Session not continuing | Mount path mismatch | Container user is `node` with HOME=/home/node; sessions must be at `/home/node/.claude/` |
 | "QR code expired" | WhatsApp session expired | Delete store/auth/ and restart |
 | "No groups registered" | Haven't added groups | Use `@Andy add group "Name"` in main |