Fix container execution and add debug tooling

Container fixes:
- Run as non-root 'node' user (required for --dangerously-skip-permissions)
- Add allowDangerouslySkipPermissions: true to SDK options
- Mount .env file to work around Apple Container -i env var bug
- Use --mount for readonly, -v for read-write (Apple Container quirk)
- Bump SDK to 0.2.29, zod to v4
- Install Claude Code CLI globally in container

Logging improvements:
- Write per-run logs to groups/{folder}/logs/container-*.log
- Add debug-level logging for mounts and container args

Documentation:
- Add /debug skill with comprehensive troubleshooting guide
- Update /setup skill with API key configuration step
- Update SPEC.md with container details, mount syntax, security notes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 10:35:08 +02:00
parent 0ccdaaac48
commit 67e0295d82
7 changed files with 436 additions and 27 deletions
--- a/.claude/skills/debug/SKILL.md
+++ b/.claude/skills/debug/SKILL.md
@@ -0,0 +1,281 @@
+---
+name: debug
+description: Debug container agent issues. Use when things aren't working, container fails, authentication problems, or to understand how the container system works. Covers logs, environment variables, mounts, and common issues.
+---
+
+# NanoClaw Container Debugging
+
+This guide covers debugging the containerized agent execution system.
+
+## Architecture Overview
+
+```
+Host (macOS)                          Container (Linux VM)
+─────────────────────────────────────────────────────────────
+src/container-runner.ts               container/agent-runner/
+    │                                      │
+    │ spawns Apple Container               │ runs Claude Agent SDK
+    │ with volume mounts                   │ with MCP servers
+    │                                      │
+    ├── data/env/env ──────────────> /workspace/env-dir/env
+    ├── groups/{folder} ───────────> /workspace/group
+    ├── data/ipc ──────────────────> /workspace/ipc
+    └── (main only) project root ──> /workspace/project
+```
+
+## Log Locations
+
+| Log | Location | Content |
+|-----|----------|---------|
+| **Main app logs** | `logs/nanoclaw.log` | Host-side WhatsApp, routing, container spawning |
+| **Main app errors** | `logs/nanoclaw.error.log` | Host-side errors |
+| **Container run logs** | `groups/{folder}/logs/container-*.log` | Per-run: input, mounts, stderr, stdout |
+| **Claude sessions** | `~/.claude/projects/` | Claude Code session history |
+
+## Enabling Debug Logging
+
+Set `LOG_LEVEL=debug` for verbose output:
+
+```bash
+# For development
+LOG_LEVEL=debug npm run dev
+
+# For launchd service, add to plist EnvironmentVariables:
+<key>LOG_LEVEL</key>
+<string>debug</string>
+```
+
+Debug level shows:
+- Full mount configurations
+- Container command arguments
+- Real-time container stderr
+
+## Common Issues
+
+### 1. "Claude Code process exited with code 1"
+
+**Check the container log file** in `groups/{folder}/logs/container-*.log`
+
+Common causes:
+
+#### Missing API Key
+```
+Invalid API key · Please run /login
+```
+**Fix:** Ensure a `.env` file exists in the project root with a valid `ANTHROPIC_API_KEY`:
+```bash
+cat .env  # Should show: ANTHROPIC_API_KEY=sk-ant-...
+```
+
+#### Root User Restriction
+```
+--dangerously-skip-permissions cannot be used with root/sudo privileges
+```
+**Fix:** The container must run as a non-root user. Check that the Dockerfile contains `USER node`.
+
+### 2. Environment Variables Not Passing
+
+**Apple Container Bug:** Environment variables passed via `-e` are lost when using `-i` (interactive/piped stdin).
+
+**Workaround:** The system mounts `.env` as a file and sources it inside the container.
+
+To verify env vars are reaching the container:
+```bash
+echo '{}' | container run -i \
+  --mount "type=bind,source=$(pwd)/data/env,target=/workspace/env-dir,readonly" \
+  --entrypoint /bin/bash nanoclaw-agent:latest \
+  -c 'export $(cat /workspace/env-dir/env | xargs); echo "API key length: ${#ANTHROPIC_API_KEY}"'
+```
+
+### 3. Mount Issues
+
+**Apple Container quirks:**
+- Only mounts directories, not individual files
+- `-v` syntax does NOT support `:ro` suffix - use `--mount` for readonly:
+  ```bash
+  # Readonly: use --mount
+  --mount "type=bind,source=/path,target=/container/path,readonly"
+
+  # Read-write: use -v
+  -v /path:/container/path
+  ```
+
+To check what's mounted inside a container:
+```bash
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c 'ls -la /workspace/'
+```
+
+Expected structure:
+```
+/workspace/
+├── env-dir/env     # Environment file (ANTHROPIC_API_KEY)
+├── group/          # Current group folder (cwd)
+├── project/        # Project root (main channel only)
+├── global/         # Global CLAUDE.md (non-main only)
+├── ipc/            # Inter-process communication
+│   ├── messages/   # Outgoing WhatsApp messages
+│   └── tasks/      # Scheduled task commands
+└── extra/          # Additional custom mounts
+```
+
+### 4. Permission Issues
+
+The container runs as user `node` (uid 1000). Check ownership:
+```bash
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
+  whoami
+  ls -la /workspace/
+  ls -la /app/
+'
+```
+
+All of `/workspace/` and `/app/` should be owned by `node`.
+
+### 5. MCP Server Failures
+
+If an MCP server fails to start, the agent may exit. Test MCP servers individually:
+
+```bash
+# Test Gmail MCP
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
+  npx -y @gongrzhe/server-gmail-autoauth-mcp --help
+'
+```
+
+## Manual Container Testing
+
+### Test the full agent flow:
+```bash
+# Set up env file
+mkdir -p data/env data/ipc groups/test
+cp .env data/env/env
+
+# Run test query
+echo '{"prompt":"What is 2+2?","groupFolder":"test","chatJid":"test@g.us","isMain":false}' | \
+  container run -i \
+  --mount "type=bind,source=$(pwd)/data/env,target=/workspace/env-dir,readonly" \
+  -v "$(pwd)/groups/test:/workspace/group" \
+  -v "$(pwd)/data/ipc:/workspace/ipc" \
+  nanoclaw-agent:latest
+```
+
+### Test Claude Code directly:
+```bash
+container run --rm --entrypoint /bin/bash \
+  --mount "type=bind,source=$(pwd)/data/env,target=/workspace/env-dir,readonly" \
+  nanoclaw-agent:latest -c '
+  export $(cat /workspace/env-dir/env | xargs)
+  claude -p "Say hello" --dangerously-skip-permissions --allowedTools ""
+'
+```
+
+### Interactive shell in container:
+```bash
+container run --rm -it --entrypoint /bin/bash nanoclaw-agent:latest
+```
+
+## SDK Options Reference
+
+The agent-runner uses these Claude Agent SDK options:
+
+```typescript
+query({
+  prompt: input.prompt,
+  options: {
+    cwd: '/workspace/group',
+    allowedTools: ['Bash', 'Read', 'Write', ...],
+    permissionMode: 'bypassPermissions',
+    allowDangerouslySkipPermissions: true,  // Required with bypassPermissions
+    settingSources: ['project'],
+    mcpServers: { ... }
+  }
+})
+```
+
+**Important:** `allowDangerouslySkipPermissions: true` is required when using `permissionMode: 'bypassPermissions'`. Without it, Claude Code exits with code 1.
+
+## Rebuilding After Changes
+
+```bash
+# Rebuild main app
+npm run build
+
+# Rebuild container (use --no-cache for clean rebuild)
+./container/build.sh
+
+# Or force full rebuild
+container builder prune -af
+./container/build.sh
+```
+
+## Checking Container Image
+
+```bash
+# List images
+container images
+
+# Check what's in the image
+container run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
+  echo "=== Node version ==="
+  node --version
+
+  echo "=== Claude Code version ==="
+  claude --version
+
+  echo "=== Installed packages ==="
+  ls /app/node_modules/
+'
+```
+
+## Session Persistence
+
+Claude sessions are stored in `~/.claude/`, which is mounted into the container. To clear sessions:
+
+```bash
+# Clear all sessions
+rm -rf ~/.claude/projects/
+
+# Clear sessions for a specific group
+rm -rf ~/.claude/projects/*workspace-group*/
+```
+
+## IPC Debugging
+
+The container communicates back to the host via files in `/workspace/ipc/`:
+
+```bash
+# Check pending messages
+ls -la data/ipc/messages/
+
+# Check pending task operations
+ls -la data/ipc/tasks/
+
+# Read a specific IPC file
+cat data/ipc/messages/*.json
+```
+
+## Quick Diagnostic Script
+
+Run this to check common issues:
+
+```bash
+echo "=== Checking NanoClaw Container Setup ==="
+
+echo -e "\n1. API Key configured?"
+[ -f .env ] && grep -q "ANTHROPIC_API_KEY=sk-" .env && echo "OK" || echo "MISSING - create .env with ANTHROPIC_API_KEY"
+
+echo -e "\n2. Env file copied for container?"
+[ -f data/env/env ] && echo "OK" || echo "MISSING - will be created on first run"
+
+echo -e "\n3. Container image exists?"
+container images 2>/dev/null | grep -q nanoclaw-agent && echo "OK" || echo "MISSING - run ./container/build.sh"
+
+echo -e "\n4. Apple Container running?"
+container system info &>/dev/null && echo "OK" || echo "NOT RUNNING - run: container system start"
+
+echo -e "\n5. Groups directory?"
+ls -la groups/ 2>/dev/null || echo "MISSING - run setup"
+
+echo -e "\n6. Recent container logs?"
+ls -t groups/*/logs/container-*.log 2>/dev/null | head -3 | grep . || echo "No container logs yet"
+```
--- a/.claude/skills/setup/SKILL.md
+++ b/.claude/skills/setup/SKILL.md
@@ -37,7 +37,47 @@ container system start 2>/dev/null || true
 container --version
 ```

-## 3. Build Container Image
+## 3. Configure API Key
+
+Ask the user:
+> Do you have an Anthropic API key configured elsewhere that I should copy, or should I create a `.env` file for you to fill in?
+
+**If copying from another location:**
+```bash
+# Extract only the ANTHROPIC_API_KEY line from the source file
+grep "^ANTHROPIC_API_KEY=" /path/to/other/.env > .env
+```
+
+Verify the key exists (only show first/last few chars for security):
+```bash
+KEY=$(grep "^ANTHROPIC_API_KEY=" .env | cut -d= -f2)
+if [ -n "$KEY" ]; then
+  echo "API key configured: ${KEY:0:10}...${KEY: -4}"
+else
+  echo "API key missing or invalid"
+fi
+```
+
+**If creating new:**
+```bash
+echo 'ANTHROPIC_API_KEY=' > .env
+```
+
+Tell the user:
+> I've created `.env` in the project root. Please add your Anthropic API key after the `=` sign.
+> You can get an API key from https://console.anthropic.com/
+
+Wait for user confirmation, then verify (only show first/last few chars):
+```bash
+KEY=$(grep "^ANTHROPIC_API_KEY=" .env | cut -d= -f2)
+if [ -n "$KEY" ]; then
+  echo "API key configured: ${KEY:0:10}...${KEY: -4}"
+else
+  echo "API key missing or invalid"
+fi
+```
+
+## 4. Build Container Image

 Build the NanoClaw agent container:

@@ -45,15 +85,15 @@ Build the NanoClaw agent container:
 ./container/build.sh
 ```

-This creates the `nanoclaw-agent:latest` image with Node.js, Chromium, and agent-browser.
+This creates the `nanoclaw-agent:latest` image with Node.js, Chromium, Claude Code CLI, and agent-browser.

-Verify the image was created:
+Verify the build succeeded (the `container images` command may not work due to a plugin issue, so we verify by running a simple test):

 ```bash
-container images | grep nanoclaw-agent || echo "Image not found"
+echo '{}' | container run -i --entrypoint /bin/echo nanoclaw-agent:latest "Container OK" || echo "Container build failed"
 ```

-## 4. WhatsApp Authentication
+## 5. WhatsApp Authentication

 **USER ACTION REQUIRED**

@@ -73,7 +113,7 @@ Wait for the script to output "Successfully authenticated" then continue.

 If it says "Already authenticated", skip to the next step.

-## 5. Configure Assistant Name
+## 6. Configure Assistant Name

 Ask the user:
 > What trigger word do you want to use? (default: `Andy`)
@@ -82,7 +122,7 @@ Ask the user:

 Store their choice - you'll use it when creating the registered_groups.json and when telling them how to test.

-## 6. Register Main Channel
+## 7. Register Main Channel

 Ask the user:
 > Do you want to use your **personal chat** (message yourself) or a **WhatsApp group** as your main control channel?
@@ -126,7 +166,7 @@ Ensure the groups folder exists:
 mkdir -p groups/main/logs
 ```

-## 7. Gmail Authentication (Optional)
+## 8. Gmail Authentication (Optional)

 Ask the user:
 > Do you want to enable Gmail integration for reading/sending emails?
@@ -153,7 +193,7 @@ npx -y @gongrzhe/server-gmail-autoauth-mcp

 This will open a browser for OAuth consent. After authorization, credentials are cached.

-## 8. Configure launchd Service
+## 9. Configure launchd Service

 Get the actual paths:

@@ -212,7 +252,7 @@ Verify it's running:
 launchctl list | grep nanoclaw
 ```

-## 9. Test
+## 10. Test

 Tell the user (using the assistant name they configured):
 > Send `@ASSISTANT_NAME hello` in your registered chat.