Files
Regolith/docs/REQUIREMENTS.md
gavrielc 6f02ee530b Adds Agent Swarms
* feat: streaming container mode, IPC messaging, agent teams support

Major architectural shift from single-shot container runs to long-lived
streaming containers with IPC-based message injection.

- Agent runner: query loop with AsyncIterable prompt to keep stdin open
  for agent teams (fixes isSingleUserTurn premature shutdown)
- New standalone stdio MCP server (ipc-mcp-stdio.ts) inheritable by
  subagents, with send_message and schedule_task tools
- Streaming output: parse OUTPUT_START/END markers in real-time, send
  results to WhatsApp as they arrive
- IPC file-based messaging: host writes to ipc/{group}/input/, agent
  polls for follow-up messages without respawning containers
- Per-group settings.json with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
- SDK bumped to 0.2.34 for TeamCreate tool support
- Container idle timeout (30min) with _close sentinel for shutdown
- Orphaned container cleanup on startup
- alwaysRespond flag for groups that skip trigger pattern check
- Uncaught exception/rejection handlers with timestamps in logger
- Combined SDK documentation into single deep dive reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove unused ipc-mcp.ts (replaced by ipc-mcp-stdio.ts)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clarify agent communication model in docs and tool descriptions

- CLAUDE.md (main + global): split communication instructions into
  "responding to messages" vs "scheduled tasks" sections
- send_message tool: note that scheduled task output is not sent to user
- Remove structured output (outputFormat) — not needed with current flow
- Regular output is sent to WhatsApp; scheduled task output is only logged

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: ignore dynamic group data while preserving base structure

Only track groups/main/CLAUDE.md and groups/global/CLAUDE.md. All other
group directories and files are ignored to prevent tracking user-specific
session data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve critical bugs in streaming container mode

Bug 1 (scheduled task hang): Task scheduler now passes onOutput callback
with idle timer that writes _close sentinel after IDLE_TIMEOUT, so
containers exit cleanly instead of blocking queue slots for 30 minutes.
Scheduled tasks stay alive for interactive follow-up via IPC.

Bug 2 (timeout disabled): Remove resetTimeout() from stderr handler.
SDK writes debug logs continuously, resetting the timer on every line.
Timeout now only resets on actual output markers in stdout.

Bug 3 (trigger bypass): Piped messages in startMessageLoop now check
trigger pattern for non-main groups. Non-trigger messages accumulate in
DB and are pulled as context via getMessagesSince when a trigger arrives.

Bug 7 (non-atomic IPC writes): GroupQueue.sendMessage uses temp file +
rename for atomic writes, matching ipc-mcp-stdio.ts pattern.

Also: flip isVerbose back to false (debug leftover), add isScheduledTask
to host-side ContainerInput interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: idle timer not starting + scheduled task groupFolder missing

Two bugs that prevented the scheduled task idle timeout fix from working:

1. onOutput was only called when parsed.result !== null, but session
   update markers have result: null. The idle timer never started for
   "silent" query completions, leaving containers parked at
   waitForIpcMessage until hard timeout.

2. Scheduler's onProcess callback didn't pass groupFolder to
   queue.registerProcess, so closeStdin no-oped (groupFolder was null).
   The _close sentinel was never written even when the idle timer fired.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: duplicate messages and timestamp rollback in piping path

Two bugs introduced by the trigger context accumulation change:

1. processGroupMessages didn't advance lastAgentTimestamp until after
   the container finished. The piping path's getMessagesSince(lastAgent
   Timestamp) re-fetched messages already sent as the initial prompt,
   causing duplicates.

2. processGroupMessages overwrote lastAgentTimestamp with the original
   batch timestamp on completion, rolling back any advancement made by
   the piping path while the container was running.

Fix: advance lastAgentTimestamp immediately after building the prompt,
before starting the container. This matches the piping path behavior
and eliminates both the overlap and the rollback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: container idles 30 extra minutes after _close during query

When _close was detected during pollIpcDuringQuery, it was consumed
(deleted) and stream.end() was called. But after runQuery returned,
main() still emitted a session-update marker (resetting the host's idle
timer) and called waitForIpcMessage (which polled forever since _close
was already gone). The container had to wait for a second _close.

Fix: runQuery now returns closedDuringQuery. When true, main() skips
the session-update marker and waitForIpcMessage, exiting immediately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resume branching, internal tags, and output forwarding

- Fix resume branching: pass resumeSessionAt with last assistant UUID
  to anchor each query loop resume to the correct conversation tree
  position. Prevents agent responses landing on invisible branches
  when agent teams subagents create parallel JSONL entries.

- Add <internal> tag stripping: agent can wrap internal reasoning in
  <internal> tags which are logged but not sent to WhatsApp. Prevents
  duplicate messages and internal monologue reaching users.

- Forward scheduled task output: scheduled tasks now send result text
  to WhatsApp (with <internal> stripping), matching regular message
  behavior. No more special-case instructions.

- Update Communication guidance in CLAUDE.md: simplified to "your
  output is sent to the user or group" with soft guidance on
  <internal> tags and send_message usage.

- Add messaging behavior docs to schedule_task tool: prompts the
  scheduling agent to include guidance on whether the task should
  always/conditionally/never message the user.

- Mount security: containerPath now optional, defaults to basename
  of hostPath.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cursor rollback on error, flush guard, verbose logging

- Roll back lastAgentTimestamp on container error so retries can
  re-process the messages instead of silently losing them.

- Add guard flag to flushOutgoingQueue to prevent duplicate sends
  from concurrent flushes during rapid WA reconnects.

- Revert isVerbose from hardcoded false back to env-based check
  (LOG_LEVEL=debug|trace).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: orphan container cleanup was silently failing

The startup cleanup used `container ls --format {{.Names}}` which is
Docker Go-template syntax. Apple Container only supports `--format json`
or `--format table`. The command errored with exit code 64, but the
catch block silently swallowed it — orphan containers were never cleaned
up on restart.

Fixed to use `--format json` and parse `configuration.id` from the
JSON output. Also filters by `status: running` and logs a warning on
failure instead of silently catching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add Discord badge and community section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: idle timer reset on null results and flush queue message loss

- Only reset idle timer on actual results (non-null), not session-update
  markers. Prevents containers staying alive 30 extra minutes after the
  agent finishes work.
- flushOutgoingQueue now uses shift() instead of splice(0) so unattempted
  messages stay in the queue if an unexpected error bails the loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add Agent Swarms to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update Telegram skill for current architecture

Rewrite integration instructions to match the per-group queue/SQLite
architecture: remove onMessage callback pattern (store to DB, let
message loop pick up), fix startSchedulerLoop signature, add
TELEGRAM_ONLY service startup, SQLite registration, data/env/env sync,
@mention-to-trigger translation, and BotFather group privacy docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Telegram skill message chunking, media placeholders, chat discovery

- Split long messages at Telegram's 4096 char limit to prevent silent
  send failures
- Store placeholder text for non-text messages (photos, voice, stickers,
  etc.) so the agent knows media was sent
- Update getAvailableGroups filter to include tg: chats so the agent can
  discover and register Telegram chats via IPC
- Fix removal step numbering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update REQUIREMENTS.md and SPEC.md for SQLite architecture

- Replace all registered_groups.json / sessions.json / router_state.json
  references with SQLite equivalents
- Fix CONTAINER_TIMEOUT default (300000 → 1800000)
- Add missing config exports (IDLE_TIMEOUT, MAX_CONCURRENT_CONTAINERS)
- Update folder structure: add missing src files (logger, group-queue,
  mount-security), remove non-existent utils.ts, list all skills
- Fix agent-runner entry (ipc-mcp.ts → ipc-mcp-stdio.ts)
- Update startup sequence to reflect per-group queue architecture
- Fix env mounting description (data/env/env, not extracted vars)
- Update troubleshooting to use sqlite3 commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: fix README architecture description, revert SPEC.md env error

- README: update architecture blurb to mention per-group queue, add
  group-queue.ts to key files, update file descriptions
- SPEC.md: restore correct credential filtering description (only auth
  vars are extracted from .env, not the full file)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 02:50:43 +02:00

197 lines
8.2 KiB
Markdown

# NanoClaw Requirements
Original requirements and design decisions from the project creator.
---
## Why This Exists
This is a lightweight, secure alternative to OpenClaw (formerly ClawBot). That project became a monstrosity - 4-5 different processes running different gateways, endless configuration files, endless integrations. It's a security nightmare where agents don't run in isolated processes; there's all kinds of leaky workarounds trying to prevent them from accessing parts of the system they shouldn't. It's impossible for anyone to realistically understand the whole codebase. When you run it you're kind of just yoloing it.
NanoClaw gives you the core functionality without that mess.
---
## Philosophy
### Small Enough to Understand
The entire codebase should be something you can read and understand. One Node.js process. A handful of source files. No microservices, no message queues, no abstraction layers.
### Security Through True Isolation
Instead of application-level permission systems trying to prevent agents from accessing things, agents run in actual Linux containers (Apple Container). The isolation is at the OS level. Agents can only see what's explicitly mounted. Bash access is safe because commands run inside the container, not on your Mac.
### Built for One User
This isn't a framework or a platform. It's working software for my specific needs. I use WhatsApp and Email, so it supports WhatsApp and Email. I don't use Telegram, so it doesn't support Telegram. I add the integrations I actually want, not every possible integration.
### Customization = Code Changes
No configuration sprawl. If you want different behavior, modify the code. The codebase is small enough that this is safe and practical. Very minimal things like the trigger word are in config. Everything else - just change the code to do what you want.
### AI-Native Development
I don't need an installation wizard - Claude Code guides the setup. I don't need a monitoring dashboard - I ask Claude Code what's happening. I don't need elaborate logging UIs - I ask Claude to read the logs. I don't need debugging tools - I describe the problem and Claude fixes it.
The codebase assumes you have an AI collaborator. It doesn't need to be excessively self-documenting or self-debugging because Claude is always there.
### Skills Over Features
When people contribute, they shouldn't add "Telegram support alongside WhatsApp." They should contribute a skill like `/add-telegram` that transforms the codebase. Users fork the repo, run skills to customize, and end up with clean code that does exactly what they need - not a bloated system trying to support everyone's use case simultaneously.
---
## RFS (Request for Skills)
Skills we'd love contributors to build:
### Communication Channels
Skills to add or switch to different messaging platforms:
- `/add-telegram` - Add Telegram as an input channel
- `/add-slack` - Add Slack as an input channel
- `/add-discord` - Add Discord as an input channel
- `/add-sms` - Add SMS via Twilio or similar
- `/convert-to-telegram` - Replace WhatsApp with Telegram entirely
### Container Runtime
The project currently uses Apple Container (macOS-only). We need:
- `/convert-to-docker` - Replace Apple Container with standard Docker
- This unlocks Linux support and broader deployment options
### Platform Support
- `/setup-linux` - Make the full setup work on Linux (depends on Docker conversion)
- `/setup-windows` - Windows support via WSL2 + Docker
---
## Vision
A personal Claude assistant accessible via WhatsApp, with minimal custom code.
**Core components:**
- **Claude Agent SDK** as the core agent
- **Apple Container** for isolated agent execution (Linux VMs)
- **WhatsApp** as the primary I/O channel
- **Persistent memory** per conversation and globally
- **Scheduled tasks** that run Claude and can message back
- **Web access** for search and browsing
- **Browser automation** via agent-browser
**Implementation approach:**
- Use existing tools (WhatsApp connector, Claude Agent SDK, MCP servers)
- Minimal glue code
- File-based systems where possible (CLAUDE.md for memory, folders for groups)
---
## Architecture Decisions
### Message Routing
- A router listens to WhatsApp and routes messages based on configuration
- Only messages from registered groups are processed
- Trigger: `@Andy` prefix (case insensitive), configurable via `ASSISTANT_NAME` env var
- Unregistered groups are ignored completely
### Memory System
- **Per-group memory**: Each group has a folder with its own `CLAUDE.md`
- **Global memory**: Root `CLAUDE.md` is read by all groups, but only writable from "main" (self-chat)
- **Files**: Groups can create/read files in their folder and reference them
- Agent runs in the group's folder, automatically inherits both CLAUDE.md files
### Session Management
- Each group maintains a conversation session (via Claude Agent SDK)
- Sessions auto-compact when context gets too long, preserving critical information
### Container Isolation
- All agents run inside Apple Container (lightweight Linux VMs)
- Each agent invocation spawns a container with mounted directories
- Containers provide filesystem isolation - agents can only see mounted paths
- Bash access is safe because commands run inside the container, not on the host
- Browser automation via agent-browser with Chromium in the container
### Scheduled Tasks
- Users can ask Claude to schedule recurring or one-time tasks from any group
- Tasks run as full agents in the context of the group that created them
- Tasks have access to all tools including Bash (safe in container)
- Tasks can optionally send messages to their group via `send_message` tool, or complete silently
- Task runs are logged to the database with duration and result
- Schedule types: cron expressions, intervals (ms), or one-time (ISO timestamp)
- From main: can schedule tasks for any group, view/manage all tasks
- From other groups: can only manage that group's tasks
### Group Management
- New groups are added explicitly via the main channel
- Groups are registered in SQLite (via the main channel or IPC `register_group` command)
- Each group gets a dedicated folder under `groups/`
- Groups can have additional directories mounted via `containerConfig`
### Main Channel Privileges
- Main channel is the admin/control group (typically self-chat)
- Can write to global memory (`groups/CLAUDE.md`)
- Can schedule tasks for any group
- Can view and manage tasks from all groups
- Can configure additional directory mounts for any group
---
## Integration Points
### WhatsApp
- Using baileys library for WhatsApp Web connection
- Messages stored in SQLite, polled by router
- QR code authentication during setup
### Scheduler
- Built-in scheduler runs on the host, spawns containers for task execution
- Custom `nanoclaw` MCP server (inside container) provides scheduling tools
- Tools: `schedule_task`, `list_tasks`, `pause_task`, `resume_task`, `cancel_task`, `send_message`
- Tasks stored in SQLite with run history
- Scheduler loop checks for due tasks every minute
- Tasks execute Claude Agent SDK in containerized group context
### Web Access
- Built-in WebSearch and WebFetch tools
- Standard Claude Agent SDK capabilities
### Browser Automation
- agent-browser CLI with Chromium in container
- Snapshot-based interaction with element references (@e1, @e2, etc.)
- Screenshots, PDFs, video recording
- Authentication state persistence
---
## Setup & Customization
### Philosophy
- Minimal configuration files
- Setup and customization done via Claude Code
- Users clone the repo and run Claude Code to configure
- Each user gets a custom setup matching their exact needs
### Skills
- `/setup` - Install dependencies, authenticate WhatsApp, configure scheduler, start services
- `/customize` - General-purpose skill for adding capabilities (new channels like Telegram, new integrations, behavior changes)
### Deployment
- Runs on local Mac via launchd
- Single Node.js process handles everything
---
## Personal Configuration (Reference)
These are the creator's settings, stored here for reference:
- **Trigger**: `@Andy` (case insensitive)
- **Response prefix**: `Andy:`
- **Persona**: Default Claude (no custom personality)
- **Main channel**: Self-chat (messaging yourself in WhatsApp)
---
## Project Name
**NanoClaw** - A reference to Clawdbot (now OpenClaw).