diff --git a/.claude/skills/debug/SKILL.md b/.claude/skills/debug/SKILL.md
new file mode 100644
index 0000000..a242ed7
--- /dev/null
+++ b/.claude/skills/debug/SKILL.md
@@ -0,0 +1,281 @@
+---
+name: debug
+description: Debug container agent issues. Use when things aren't working, container fails, authentication problems, or to understand how the container system works. Covers logs, environment variables, mounts, and common issues.
+---
+
+# NanoClaw Container Debugging
+
+This guide covers debugging the containerized agent execution system.
+
+## Architecture Overview
+
+```
+Host (macOS)                          Container (Linux VM)
+─────────────────────────────────────────────────────────────
+src/container-runner.ts               container/agent-runner/
+    │                                     │
+    │ spawns Apple Container              │ runs Claude Agent SDK
+    │ with volume mounts                  │ with MCP servers
+    │                                     │
+    ├── data/env/env ──────────────> /workspace/env-dir/env
+    ├── groups/{folder} ───────────> /workspace/group
+    ├── data/ipc ──────────────────> /workspace/ipc
+    └── (main only) project root ──> /workspace/project
+```
+
+## Log Locations
+
+| Log | Location | Content |
+|-----|----------|---------|
+| **Main app logs** | `logs/nanoclaw.log` | Host-side WhatsApp, routing, container spawning |
+| **Main app errors** | `logs/nanoclaw.error.log` | Host-side errors |
+| **Container run logs** | `groups/{folder}/logs/container-*.log` | Per-run: input, mounts, stderr, stdout |
+| **Claude sessions** | `~/.claude/projects/` | Claude Code session history |
+
+## Enabling Debug Logging
+
+Set `LOG_LEVEL=debug` for verbose output:
+
+```bash
+# For development
+LOG_LEVEL=debug npm run dev
+
+# For launchd service, add to plist EnvironmentVariables:
+<key>LOG_LEVEL</key>
+<string>debug</string>
+```
+
+Debug level shows:
+- Full mount configurations
+- Container command arguments
+- Real-time container stderr
+
+## Common Issues
+
+### 1. "Claude Code process exited with code 1"
+
+**Check the container log file** in `groups/{folder}/logs/container-*.log`
+
+Common causes:
+
+#### Missing API Key
+```
+Invalid API key · Please run /login
+```
+**Fix:** Ensure a `.env` file exists in the project root with a valid `ANTHROPIC_API_KEY`:
+```bash
+cat .env  # Should show: ANTHROPIC_API_KEY=sk-ant-...
+```
+
+#### Root User Restriction
+```
+--dangerously-skip-permissions cannot be used with root/sudo privileges
+```
+**Fix:** The container must run as a non-root user. Check that the Dockerfile has `USER node`.
+
+### 2. Environment Variables Not Passing
+
+**Apple Container Bug:** Environment variables passed via `-e` are lost when using `-i` (interactive/piped stdin).
+
+**Workaround:** The runner copies `.env` to `data/env/env`, mounts `data/env/` read-only at `/workspace/env-dir` (Apple Container can only mount directories), and the container entrypoint exports the variables from that file.
+
+To verify env vars are reaching the container:
+```bash
+echo '{}' | container run -i \
+  --mount "type=bind,source=$(pwd)/data/env,target=/workspace/env-dir,readonly" \
+  --entrypoint /bin/bash nanoclaw-agent:latest \
+  -c 'export $(cat /workspace/env-dir/env | xargs); echo "API key length: ${#ANTHROPIC_API_KEY}"'
+```
+
+### 3. Mount Issues
+
+**Apple Container quirks:**
+- Only mounts directories, not individual files
+- `-v` syntax does NOT support the `:ro` suffix; use `--mount` for readonly:
+  ```bash
+  # Readonly: use --mount
+  --mount "type=bind,source=/path,target=/container/path,readonly"
+
+  # Read-write: use -v
+  -v /path:/container/path
+  ```
+
+To check what's mounted inside a container (pass the same `-v`/`--mount` flags the runner uses; without them you only see the empty directories built into the image):
+```bash
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c 'ls -la /workspace/'
+```
+
+Expected structure:
+```
+/workspace/
+├── env-dir/env        # Environment file (ANTHROPIC_API_KEY)
+├── group/             # Current group folder (cwd)
+├── project/           # Project root (main channel only)
+├── global/            # Global CLAUDE.md (non-main only)
+├── ipc/               # Inter-process communication
+│   ├── messages/      # Outgoing WhatsApp messages
+│   └── tasks/         # Scheduled task commands
+└── extra/             # Additional custom mounts
+```
+
+### 4. Permission Issues
+
+The container runs as user `node` (uid 1000). Check ownership:
+```bash
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
+  whoami
+  ls -la /workspace/
+  ls -la /app/
+'
+```
+
+`/workspace/` should be owned by `node` (the Dockerfile runs `chown -R node:node /workspace`); `/app/` only needs to be readable by `node`.
+
+### 5. MCP Server Failures
+
+If an MCP server fails to start, the agent may exit. Test MCP servers individually:
+
+```bash
+# Test Gmail MCP
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
+  npx -y @gongrzhe/server-gmail-autoauth-mcp --help
+'
+```
+
+## Manual Container Testing
+
+### Test the full agent flow:
+```bash
+# Set up env file and the directories mounted below
+mkdir -p data/env data/ipc groups/test
+cp .env data/env/env
+
+# Run test query
+echo '{"prompt":"What is 2+2?","groupFolder":"test","chatJid":"test@g.us","isMain":false}' | \
+  container run -i \
+  --mount "type=bind,source=$(pwd)/data/env,target=/workspace/env-dir,readonly" \
+  -v $(pwd)/groups/test:/workspace/group \
+  -v $(pwd)/data/ipc:/workspace/ipc \
+  nanoclaw-agent:latest
+```
+
+### Test Claude Code directly:
+```bash
+container run --rm --entrypoint /bin/bash \
+  --mount "type=bind,source=$(pwd)/data/env,target=/workspace/env-dir,readonly" \
+  nanoclaw-agent:latest -c '
+  export $(cat /workspace/env-dir/env | xargs)
+  claude -p "Say hello" --dangerously-skip-permissions --allowedTools ""
+'
+```
+
+### Interactive shell in container:
+```bash
+container run --rm -it --entrypoint /bin/bash nanoclaw-agent:latest
+```
+
+## SDK Options Reference
+
+The agent-runner uses these Claude Agent SDK options:
+
+```typescript
+query({
+  prompt: input.prompt,
+  options: {
+    cwd: '/workspace/group',
+    allowedTools: ['Bash', 'Read', 'Write', ...],
+    permissionMode: 'bypassPermissions',
+    allowDangerouslySkipPermissions: true, // Required with bypassPermissions
+    settingSources: ['project'],
+    mcpServers: { ... }
+  }
+})
+```
+
+**Important:** `allowDangerouslySkipPermissions: true` is required when using `permissionMode: 'bypassPermissions'`. Without it, Claude Code exits with code 1.
+
+## Rebuilding After Changes
+
+```bash
+# Rebuild main app
+npm run build
+
+# Rebuild container (use --no-cache for clean rebuild)
+./container/build.sh
+
+# Or force full rebuild
+container builder prune -af
+./container/build.sh
+```
+
+## Checking Container Image
+
+```bash
+# List images
+container images
+
+# Check what's in the image
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
+  echo "=== Node version ==="
+  node --version
+
+  echo "=== Claude Code version ==="
+  claude --version
+
+  echo "=== Installed packages ==="
+  ls /app/node_modules/
+'
+```
+
+## Session Persistence
+
+Claude sessions are stored in `~/.claude/`, which is mounted into the container. To clear sessions:
+
+```bash
+# Clear all sessions
+rm -rf ~/.claude/projects/
+
+# Clear container-agent sessions (every group runs with cwd /workspace/group, so this is not group-specific)
+rm -rf ~/.claude/projects/*workspace-group*/
+```
+
+## IPC Debugging
+
+The container communicates back to the host via files in `/workspace/ipc/` (`data/ipc/` on the host):
+
+```bash
+# Check pending messages
+ls -la data/ipc/messages/
+
+# Check pending task operations
+ls -la data/ipc/tasks/
+
+# Read pending IPC message files
+cat data/ipc/messages/*.json
+```
+
+## Quick Diagnostic Script
+
+Run this to check common issues:
+
+```bash
+echo "=== Checking NanoClaw Container Setup ==="
+
+echo -e "\n1. API Key configured?"
+[ -f .env ] && grep -q "ANTHROPIC_API_KEY=sk-" .env && echo "OK" || echo "MISSING - create .env with ANTHROPIC_API_KEY"
+
+echo -e "\n2. Env file copied for container?"
+[ -f data/env/env ] && echo "OK" || echo "MISSING - will be created on first run"
+
+echo -e "\n3. Container image exists?"
+container images 2>/dev/null | grep -q nanoclaw-agent && echo "OK" || echo "MISSING - run ./container/build.sh"
+
+echo -e "\n4. Apple Container running?"
+container system info &>/dev/null && echo "OK" || echo "NOT RUNNING - run: container system start"
+
+echo -e "\n5. Groups directory?"
+ls -la groups/ 2>/dev/null || echo "MISSING - run setup"
+
+echo -e "\n6. Recent container logs?"
+ls -t groups/*/logs/container-*.log 2>/dev/null | head -3 | grep . || echo "No container logs yet"
+```
diff --git a/.claude/skills/setup/SKILL.md b/.claude/skills/setup/SKILL.md
index 0301d6e..20dd310 100644
--- a/.claude/skills/setup/SKILL.md
+++ b/.claude/skills/setup/SKILL.md
@@ -37,7 +37,47 @@ container system start 2>/dev/null || true
 container --version
 ```

-## 3. Build Container Image
+## 3. Configure API Key
+
+Ask the user:
+> Do you have an Anthropic API key configured elsewhere that I should copy, or should I create a `.env` file for you to fill in?
+
+**If copying from another location:**
+```bash
+# Extract only the ANTHROPIC_API_KEY line from the source file
+grep "^ANTHROPIC_API_KEY=" /path/to/other/.env > .env
+```
+
+Verify the key exists (only show first/last few chars for security):
+```bash
+KEY=$(grep "^ANTHROPIC_API_KEY=" .env | cut -d= -f2)
+if [ -n "$KEY" ]; then
+  echo "API key configured: ${KEY:0:10}...${KEY: -4}"
+else
+  echo "API key missing or invalid"
+fi
+```
+
+**If creating new:**
+```bash
+echo 'ANTHROPIC_API_KEY=' > .env
+```
+
+Tell the user:
+> I've created `.env` in the project root. Please add your Anthropic API key after the `=` sign.
+> You can get an API key from https://console.anthropic.com/
+
+Wait for user confirmation, then verify (only show first/last few chars):
+```bash
+KEY=$(grep "^ANTHROPIC_API_KEY=" .env | cut -d= -f2)
+if [ -n "$KEY" ]; then
+  echo "API key configured: ${KEY:0:10}...${KEY: -4}"
+else
+  echo "API key missing or invalid"
+fi
+```
+
+## 4. Build Container Image

 Build the NanoClaw agent container:

@@ -45,15 +85,15 @@ Build the NanoClaw agent container:
 ./container/build.sh
 ```

-This creates the `nanoclaw-agent:latest` image with Node.js, Chromium, and agent-browser.
+This creates the `nanoclaw-agent:latest` image with Node.js, Chromium, Claude Code CLI, and agent-browser.

-Verify the image was created:
+Verify the build succeeded (the `container images` command may not work due to a plugin issue, so we verify by running a simple test):
 ```bash
-container images | grep nanoclaw-agent || echo "Image not found"
+echo '{}' | container run -i --entrypoint /bin/echo nanoclaw-agent:latest "Container OK" || echo "Container build failed"
 ```

-## 4. WhatsApp Authentication
+## 5. WhatsApp Authentication

 **USER ACTION REQUIRED**

@@ -73,7 +113,7 @@ Wait for the script to output "Successfully authenticated" then continue.

 If it says "Already authenticated", skip to the next step.

-## 5. Configure Assistant Name
+## 6. Configure Assistant Name

 Ask the user:
 > What trigger word do you want to use? (default: `Andy`)
@@ -82,7 +122,7 @@ Ask the user:

 Store their choice - you'll use it when creating the registered_groups.json and when telling them how to test.

-## 6. Register Main Channel
+## 7. Register Main Channel

 Ask the user:
 > Do you want to use your **personal chat** (message yourself) or a **WhatsApp group** as your main control channel?
@@ -126,7 +166,7 @@ Ensure the groups folder exists:
 mkdir -p groups/main/logs
 ```

-## 7. Gmail Authentication (Optional)
+## 8. Gmail Authentication (Optional)

 Ask the user:
 > Do you want to enable Gmail integration for reading/sending emails?
@@ -153,7 +193,7 @@ npx -y @gongrzhe/server-gmail-autoauth-mcp

 This will open a browser for OAuth consent. After authorization, credentials are cached.

-## 8. Configure launchd Service
+## 9. Configure launchd Service

 Get the actual paths:

@@ -212,7 +252,7 @@ Verify it's running:
 launchctl list | grep nanoclaw
 ```

-## 9. Test
+## 10. Test

 Tell the user (using the assistant name they configured):
 > Send `@ASSISTANT_NAME hello` in your registered chat.
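The API key check in step 3 of the setup skill above (take the `ANTHROPIC_API_KEY=` line, then print only its first 10 and last 4 characters) can also be expressed in TypeScript, for example for a host-side startup check. The following is a minimal sketch under that assumption, not code from this change: `extractApiKey` and `maskKey` are hypothetical helper names, and the `.env` contents are passed in as a string instead of being read from disk.

```typescript
// Hypothetical sketch (not part of NanoClaw) mirroring the setup skill's bash check:
//   KEY=$(grep "^ANTHROPIC_API_KEY=" .env | cut -d= -f2)
//   echo "API key configured: ${KEY:0:10}...${KEY: -4}"

/** Return the value of the first `ANTHROPIC_API_KEY=` line, or null if it is missing or empty. */
function extractApiKey(envContent: string): string | null {
  for (const line of envContent.split(/\r?\n/)) {
    // Same anchor as `grep "^ANTHROPIC_API_KEY="`: the key must start the line.
    if (!line.startsWith('ANTHROPIC_API_KEY=')) continue;
    // Same field as `cut -d= -f2`: the text between the first and second '='.
    const value = line.split('=')[1] ?? '';
    // Same test as `[ -n "$KEY" ]`: an empty value counts as missing.
    return value.length > 0 ? value : null;
  }
  return null;
}

/** Keep the first 10 and last 4 characters, approximating `${KEY:0:10}...${KEY: -4}`. */
function maskKey(key: string): string {
  return `${key.slice(0, 10)}...${key.slice(-4)}`;
}

// Usage with an obviously fake key.
const envContent = 'ANTHROPIC_API_KEY=sk-ant-EXAMPLE-0000-abcd\n';
const key = extractApiKey(envContent);
console.log(key ? `API key configured: ${maskKey(key)}` : 'API key missing or invalid');
```

Running the snippet prints `API key configured: sk-ant-EXA...abcd`. Taking only the text between the first and second `=` reproduces `cut -d= -f2`; a general-purpose `.env` parser would also need to handle quoting and values that contain `=`.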
diff --git a/SPEC.md b/SPEC.md
index 5f9d7e1..cce7617 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -54,7 +54,7 @@ A personal Claude assistant accessible via WhatsApp, with persistent memory per
 │  │  Volume mounts:                                      │  │
 │  │    • groups/{name}/ → /workspace/group               │  │
 │  │    • groups/CLAUDE.md → /workspace/global/CLAUDE.md  │  │
-│  │    • ~/.claude/ → /root/.claude/ (sessions)          │  │
+│  │    • ~/.claude/ → /home/node/.claude/ (sessions)     │  │
 │  │    • Additional dirs → /workspace/extra/*            │  │
 │  │                                                      │  │
 │  │  Tools (all groups):                                 │  │
@@ -76,7 +76,7 @@ A personal Claude assistant accessible via WhatsApp, with persistent memory per
 | WhatsApp Connection | Node.js (@whiskeysockets/baileys) | Connect to WhatsApp, send/receive messages |
 | Message Storage | SQLite (better-sqlite3) | Store messages for polling |
 | Container Runtime | Apple Container | Isolated Linux VMs for agent execution |
-| Agent | @anthropic-ai/claude-agent-sdk | Run Claude with tools and MCP servers |
+| Agent | @anthropic-ai/claude-agent-sdk (0.2.29) | Run Claude with tools and MCP servers |
 | Browser Automation | agent-browser + Chromium | Web interaction and screenshots |
 | Runtime | Node.js 22+ | Host process for routing and scheduling |
@@ -104,7 +104,7 @@ nanoclaw/
 │   └── container-runner.ts   # Spawns agents in Apple Containers
 │
 ├── container/
-│   ├── Dockerfile            # Container image definition
+│   ├── Dockerfile            # Container image (runs as 'node' user, includes Claude Code CLI)
 │   ├── build.sh              # Build script for container image
 │   ├── agent-runner/         # Code that runs inside the container
 │   │   ├── package.json
@@ -121,8 +121,10 @@ nanoclaw/
 │   └── skills/
 │       ├── setup/
 │       │   └── SKILL.md      # /setup skill
-│       └── customize/
-│           └── SKILL.md      # /customize skill
+│       ├── customize/
+│       │   └── SKILL.md      # /customize skill
+│       └── debug/
+│           └── SKILL.md      # /debug skill (container debugging)
 │
 ├── groups/
 │   ├── CLAUDE.md             # Global memory (all groups read this)
@@ -142,11 +144,14 @@ nanoclaw/
 │   ├── sessions.json           # Active session IDs per group
 │   ├── archived_sessions.json  # Old sessions after /clear
 │   ├── registered_groups.json  # Group JID → folder mapping
-│   └── router_state.json       # Last processed timestamp + last agent timestamps
+│   ├── router_state.json       # Last processed timestamp + last agent timestamps
+│   ├── env/env                 # Copy of .env for container mounting
+│   └── ipc/                    # Container IPC (messages/, tasks/)
 │
 ├── logs/                       # Runtime logs (gitignored)
-│   ├── nanoclaw.log            # stdout
-│   └── nanoclaw.error.log      # stderr
+│   ├── nanoclaw.log            # Host stdout
+│   └── nanoclaw.error.log      # Host stderr
+│   # Note: Per-container logs are in groups/{folder}/logs/container-*.log
 │
 └── launchd/
     └── com.nanoclaw.plist      # macOS service configuration
@@ -202,6 +207,18 @@ Groups can have additional directories mounted via `containerConfig` in `data/re

 Additional mounts appear at `/workspace/extra/{containerPath}` inside the container.

+**Apple Container mount syntax note:** Read-write mounts use `-v host:container`, but readonly mounts require `--mount "type=bind,source=...,target=...,readonly"` (the `:ro` suffix doesn't work).
+
+### API Key Configuration
+
+The Anthropic API key must be in a `.env` file in the project root:
+
+```bash
+ANTHROPIC_API_KEY=sk-ant-...
+```
+
+On each container run, the runner copies this file to `data/env/env` and mounts `data/env/` read-only at `/workspace/env-dir`, where the entrypoint script exports its variables. This workaround is needed because Apple Container loses `-e` environment variables when using `-i` (interactive mode with piped stdin).
+
 ### Changing the Assistant Name

 Set the `ASSISTANT_NAME` environment variable:
@@ -540,6 +557,7 @@ All agents run inside Apple Container (lightweight Linux VMs), providing:
 - **Safe Bash access**: Commands run inside the container, not on your Mac
 - **Network isolation**: Can be configured per-container if needed
 - **Process isolation**: Container processes can't affect the host
+- **Non-root user**: Container runs as unprivileged `node` user (uid 1000)

 ### Prompt Injection Risk

@@ -563,7 +581,7 @@ WhatsApp messages could contain malicious instructions attempting to manipulate

 | Credential | Storage Location | Notes |
 |------------|------------------|-------|
-| Claude CLI Auth | ~/.claude/ | Managed by Claude Code CLI |
+| Claude CLI Auth | ~/.claude/ | Mounted to /home/node/.claude/ in container |
 | WhatsApp Session | store/auth/ | Auto-created, persists ~20 days |
 | Gmail OAuth Tokens | ~/.gmail-mcp/ | Created during setup (optional) |
diff --git a/container/Dockerfile b/container/Dockerfile
index 6b844d0..d8c6b4e 100644
--- a/container/Dockerfile
+++ b/container/Dockerfile
@@ -29,8 +29,8 @@ RUN apt-get update && apt-get install -y \
 ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
 ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium

-# Install agent-browser globally
-RUN npm install -g agent-browser
+# Install agent-browser and claude-code globally
+RUN npm install -g agent-browser @anthropic-ai/claude-code

 # Create app directory
 WORKDIR /app
@@ -50,8 +50,18 @@ RUN npm run build
 # Create workspace directories
 RUN mkdir -p /workspace/group /workspace/global /workspace/extra /workspace/ipc/messages /workspace/ipc/tasks

+# Create entrypoint script
+# Sources env from mounted /workspace/env-dir/env if it exists (workaround for Apple Container -i bug)
+RUN printf '#!/bin/bash\nset -e\n[ -f /workspace/env-dir/env ] && export $(cat /workspace/env-dir/env | xargs)\ncat > /tmp/input.json\nnode /app/dist/index.js < /tmp/input.json\n' > /app/entrypoint.sh && chmod +x /app/entrypoint.sh
+
+# Set ownership to node user (non-root) for writable directories
+RUN chown -R node:node /workspace
+
+# Switch to non-root user (required for --dangerously-skip-permissions)
+USER node
+
 # Set working directory to group workspace
 WORKDIR /workspace/group

 # Entry point reads JSON from stdin, outputs JSON to stdout
-ENTRYPOINT ["node", "/app/dist/index.js"]
+ENTRYPOINT ["/app/entrypoint.sh"]
diff --git a/container/agent-runner/package.json b/container/agent-runner/package.json
index 5a7df6a..05f8a46 100644
--- a/container/agent-runner/package.json
+++ b/container/agent-runner/package.json
@@ -9,8 +9,8 @@
     "start": "node dist/index.js"
   },
   "dependencies": {
-    "@anthropic-ai/claude-agent-sdk": "^0.1.9",
-    "zod": "^3.24.2"
+    "@anthropic-ai/claude-agent-sdk": "0.2.29",
+    "zod": "^4.0.0"
   },
   "devDependencies": {
     "@types/node": "^22.10.7",
diff --git a/container/agent-runner/src/index.ts b/container/agent-runner/src/index.ts
index 88c1ab3..641da21 100644
--- a/container/agent-runner/src/index.ts
+++ b/container/agent-runner/src/index.ts
@@ -83,6 +83,7 @@ async function main(): Promise<void> {
         'mcp__gmail__*'
       ],
       permissionMode: 'bypassPermissions',
+      allowDangerouslySkipPermissions: true,
       settingSources: ['project'],
       mcpServers: {
         nanoclaw: ipcMcp,
diff --git a/src/container-runner.ts b/src/container-runner.ts
index bf47ba1..cd22e89 100644
--- a/src/container-runner.ts
+++ b/src/container-runner.ts
@@ -109,6 +109,20 @@ function buildVolumeMounts(group: RegisteredGroup, isMain: boolean): VolumeMount
     readonly: false
   });

+  // Environment file directory (workaround for Apple Container -i env var bug)
+  const envDir = path.join(DATA_DIR, 'env');
+  fs.mkdirSync(envDir, { recursive: true });
+  const envFile = path.join(projectRoot, '.env');
+  if (fs.existsSync(envFile)) {
+    // Copy .env to the env directory as a plain file called 'env'
+    fs.copyFileSync(envFile, path.join(envDir, 'env'));
+    mounts.push({
+      hostPath: envDir,
+      containerPath: '/workspace/env-dir',
+      readonly: true
+    });
+  }
+
   // Additional mounts from group config
   if (group.containerConfig?.additionalMounts) {
     for (const mount of group.containerConfig.additionalMounts) {
@@ -136,9 +150,13 @@ function buildContainerArgs(mounts: VolumeMount[]): string[] {
   const args: string[] = ['run', '-i', '--rm'];

   // Add volume mounts
+  // Apple Container: use --mount for readonly, -v for read-write
   for (const mount of mounts) {
-    const mode = mount.readonly ? ':ro' : '';
-    args.push('-v', `${mount.hostPath}:${mount.containerPath}${mode}`);
+    if (mount.readonly) {
+      args.push('--mount', `type=bind,source=${mount.hostPath},target=${mount.containerPath},readonly`);
+    } else {
+      args.push('-v', `${mount.hostPath}:${mount.containerPath}`);
+    }
   }

   // Add the image name
@@ -161,12 +179,23 @@ export async function runContainerAgent(
   const mounts = buildVolumeMounts(group, input.isMain);
   const containerArgs = buildContainerArgs(mounts);

+  // Log detailed mount info at debug level
+  logger.debug({
+    group: group.name,
+    mounts: mounts.map(m => `${m.hostPath} -> ${m.containerPath}${m.readonly ? ' (ro)' : ''}`),
+    containerArgs: containerArgs.join(' ')
+  }, 'Container mount configuration');
+
   logger.info({
     group: group.name,
     mountCount: mounts.length,
     isMain: input.isMain
   }, 'Spawning container agent');

+  // Create logs directory for this group
+  const logsDir = path.join(GROUPS_DIR, group.folder, 'logs');
+  fs.mkdirSync(logsDir, { recursive: true });
+
   return new Promise((resolve) => {
     const container = spawn('container', containerArgs, {
       stdio: ['pipe', 'pipe', 'pipe']
@@ -207,12 +236,42 @@ export async function runContainerAgent(
       clearTimeout(timeout);
       const duration = Date.now() - startTime;

+      // Always write stderr to log file for debugging
+      const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
+      const logFile = path.join(logsDir, `container-${timestamp}.log`);
+      const logContent = [
+        `=== Container Run Log ===`,
+        `Timestamp: ${new Date().toISOString()}`,
+        `Group: ${group.name}`,
+        `IsMain: ${input.isMain}`,
+        `Duration: ${duration}ms`,
+        `Exit Code: ${code}`,
+        ``,
+        `=== Input ===`,
+        JSON.stringify(input, null, 2),
+        ``,
+        `=== Container Args ===`,
+        containerArgs.join(' '),
+        ``,
+        `=== Mounts ===`,
+        mounts.map(m => `${m.hostPath} -> ${m.containerPath}${m.readonly ? ' (ro)' : ''}`).join('\n'),
+        ``,
+        `=== Stderr ===`,
+        stderr,
+        ``,
+        `=== Stdout ===`,
+        stdout
+      ].join('\n');
+      fs.writeFileSync(logFile, logContent);
+      logger.debug({ logFile }, 'Container log written');
+
       if (code !== 0) {
         logger.error({
           group: group.name,
           code,
           duration,
-          stderr: stderr.slice(-500)
+          stderr: stderr.slice(-500),
+          logFile
         }, 'Container exited with error');

         resolve({