feat: config-driven architecture, install wizard, live runtime switching, usage tracking, auto-failover

Major changes:

- Config-driven adapters: all channels (Slack, Discord, Telegram, WebChat, Webhooks) controlled via config.json with enabled flags and token auto-detection, no CLI flags required
- Runtime engine field: runtime.engine selects opencode/claude from config
- Interactive install script: 8-phase setup wizard with AI runtime detection/installation, token setup, identity file personalization (personality presets), aetheel CLI command, background service (launchd/systemd)
- Live runtime switching: /engine, /model, /provider commands hot-swap the AI runtime from chat without restart, changes persisted to config.json
- Usage tracking: per-request cost extraction from Claude Code JSON output, cumulative stats via /usage command
- Auto-failover: rate limit detection on both runtimes, automatic switch to other engine on quota errors with user notification
- Chat commands work without / prefix (Slack intercepts / in channels), commands: engine, model, provider, config, usage, reload, cron, subagents, status, help
- /config set for editing config.json from chat with dotted key notation
- Security audit saved to docs/security-audit.md
- Full command reference in docs/commands.md
- Future changes doc with NanoClaw agent teams analysis
- Logo added to README and WebChat UI
- README fully rewritten with all features documented
2026-02-18 01:07:12 -05:00
parent 41b2f9a593
commit 6d73f74e0b
41 changed files with 11363 additions and 437 deletions
--- a/docs/security-audit.md
+++ b/docs/security-audit.md
@@ -0,0 +1,172 @@
+# Aetheel Security Audit
+
+**Date:** February 17, 2026
+**Scope:** Full codebase review of all modules
+
+---
+
+## CRITICAL
+
+### 1. Path Traversal in `memory/manager.py` → `read_file()`
+
+The method accepts absolute paths and resolves them with `os.path.realpath()` but never validates the result is within the workspace directory. An attacker (or the AI itself) could read arbitrary files:
+
+```python
+# Current code — no containment check
+if os.path.isabs(raw):
+    abs_path = os.path.realpath(raw)
+```
+
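+**Fix:** Resolve the workspace root the same way and reject anything outside it, for example `if os.path.commonpath([abs_path, workspace_root]) != workspace_root: raise ValueError("path outside workspace")`, where `workspace_root` is `os.path.realpath(self._workspace_dir)`. A plain `abs_path.startswith(self._workspace_dir)` test is not sufficient, because a sibling directory such as `<workspace>-evil/` shares the same string prefix.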
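
A minimal sketch of a component-wise containment check, using only the standard library. The helper name `resolve_inside` and its parameters are hypothetical; the real `read_file()` would need to adapt this to its own attributes.

```python
import os


def resolve_inside(workspace_dir: str, raw: str) -> str:
    """Resolve `raw` and return it only if it stays inside `workspace_dir`."""
    root = os.path.realpath(workspace_dir)
    # Relative paths are interpreted against the workspace root; an absolute
    # `raw` replaces the root in os.path.join and is checked like any other.
    candidate = os.path.realpath(os.path.join(root, raw))
    # commonpath compares whole path components, so "/ws-evil" is not
    # accepted as a child of "/ws" the way a startswith() test would allow.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path outside workspace: {raw!r}")
    return candidate
```

Because both sides go through `os.path.realpath()`, symlinks that point out of the workspace are rejected as well.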
+
+### 2. Arbitrary Code Execution via Hook `handler.py` Loading
+
+`hooks/hooks.py` → `_load_handler` uses `importlib.util.spec_from_file_location` to dynamically load and execute arbitrary Python from `handler.py` files found in the workspace. If an attacker can write a file to `~/.aetheel/workspace/hooks/<name>/handler.py`, they get full code execution. There's no sandboxing, signature verification, or allowlisting.
+
+### 3. Webhook Auth Defaults to Open Access
+
+`webhooks/receiver.py` → `_check_auth`:
+
+```python
+if not self._config.token:
+    return True  # No token configured = open access
+```
+
+If the webhook receiver is enabled without a token, anyone on the network can trigger AI actions. The default config writes `"token": ""` which means open access.
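
One way to make the check fail closed is sketched below. The function name `check_auth` and its arguments are assumptions for illustration, not the receiver's actual signature.

```python
import hmac
from typing import Optional


def check_auth(configured_token: str, presented_token: Optional[str]) -> bool:
    """Return True only when a token is configured and the caller matches it."""
    if not configured_token:
        # Fail closed: an empty token rejects every request instead of
        # granting open access.
        return False
    if not presented_token:
        return False
    # compare_digest runs in time independent of where the strings differ.
    return hmac.compare_digest(configured_token.encode(), presented_token.encode())
```

With this shape, the default `"token": ""` config would reject every request until a token is set.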
+
+### 4. AI-Controlled Action Tags Execute Without Validation
+
+`main.py` → `_process_action_tags` parses the AI's response text for action tags like `[ACTION:cron|...]`, `[ACTION:spawn|...]`, and `[ACTION:remind|...]`. The AI can:
+
+- Schedule arbitrary cron jobs with any expression
+- Spawn unlimited subagent tasks
+- Set reminders with any delay
+
+There's no validation that the AI was asked to do this, no user confirmation, and no rate limiting. A prompt injection attack via any adapter could trigger these.
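
A rough sketch of gating action tags before they run. The exact tag grammar is only assumed from the `[ACTION:kind|...]` examples above. The cap of three actions per reply and the `user_requested` set (action kinds the user explicitly asked for or confirmed in this turn) are hypothetical.

```python
import re

# Assumed tag shape: [ACTION:<kind>|<payload>]
ACTION_TAG = re.compile(r"\[ACTION:(\w+)\|([^\]]*)\]")
ALLOWED_ACTIONS = {"cron", "spawn", "remind"}
MAX_ACTIONS_PER_REPLY = 3  # hypothetical cap


def extract_actions(reply: str, user_requested: set) -> list:
    """Return (kind, payload) pairs that are known, user-approved, and capped."""
    accepted = []
    for kind, payload in ACTION_TAG.findall(reply):
        # Drop unknown kinds and anything the user did not opt into.
        if kind not in ALLOWED_ACTIONS or kind not in user_requested:
            continue
        accepted.append((kind, payload))
        if len(accepted) >= MAX_ACTIONS_PER_REPLY:
            break
    return accepted
```

Tags that fail the gate are ignored rather than executed, so an injected `[ACTION:spawn|...]` has no effect unless the user approved spawning.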
+
+---
+
+## HIGH
+
+### 5. No Input Validation on Webhook POST Bodies
+
+`webhooks/receiver.py` — JSON payloads are parsed but never schema-validated. Fields like `channel_id`, `sender`, `channel` are passed through directly. The `body` dict is stored in `raw_event` and could contain arbitrarily large data.
+
+### 6. No Request Size Limits on HTTP Endpoints
+
+Neither the webhook receiver nor the WebChat adapter sets `client_max_size` on the aiohttp `Application`, so both rely on the library default of 1 MiB (`1024**2` bytes) rather than an explicit limit chosen for the workload. There is also no per-request timeout.
+
+### 7. WebSocket Has No Authentication
+
+`adapters/webchat_adapter.py` — Anyone who can reach the WebSocket endpoint at `/ws` can interact with the AI. No token, no session cookie, no origin check. If the host is changed from `127.0.0.1` to `0.0.0.0`, this becomes remotely exploitable.
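
An origin check could look roughly like the sketch below. The helper `origin_allowed` and the allowlist contents are hypothetical, and the handler would need to read the `Origin` request header before upgrading the connection.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def origin_allowed(origin: Optional[str], allowed_hosts: Iterable[str]) -> bool:
    """Accept only http(s) origins whose host:port is explicitly allowlisted."""
    if not origin:
        # Browsers send Origin on WebSocket handshakes; a missing header is
        # treated as untrusted in this sketch.
        return False
    parts = urlsplit(origin)
    return parts.scheme in ("http", "https") and parts.netloc in set(allowed_hosts)
```

An origin check only constrains browsers; non-browser clients can send any header, so token auth is still needed if the port is reachable remotely.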
+
+### 8. No Rate Limiting Anywhere
+
+No rate limiting on:
+
+- Webhook endpoints
+- WebSocket messages
+- Adapter message handlers
+- Subagent spawning (only a concurrent limit of 3, but no cooldown)
+- Scheduler job creation
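
As an illustration of what a per-client limit could look like, here is a small token-bucket sketch. The class name, parameters, and the idea of keeping one bucket per sender or connection are assumptions, not existing Aetheel code.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A handler would call `allow()` once per incoming message and reject or delay the message when it returns `False`.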
+
+### 9. Cron Expression Not Validated Before APScheduler
+
+`scheduler/scheduler.py` → `_register_cron_job` only checks `len(parts) != 5`. Malformed values within fields (e.g., `999 999 999 999 999`) are passed directly to `CronTrigger`, which could cause unexpected behavior or exceptions.
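
A stricter pre-check might range-check each field before the expression reaches the scheduler. The sketch below handles only numeric values, `*`, ranges, lists, and steps. Month and weekday names are not handled, and the 0-6 weekday bound is a conservative assumption.

```python
import re

# (min, max) for minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]
_PART = re.compile(r"(\*|(\d+)(?:-(\d+))?)(?:/(\d+))?", re.ASCII)


def _valid_part(part: str, lo: int, hi: int) -> bool:
    match = _PART.fullmatch(part)
    if match is None:
        return False
    _, start, end, step = match.groups()
    if step is not None and int(step) == 0:
        return False  # a step of zero never advances
    if start is None:
        return True  # plain "*" or "*/n"
    first = int(start)
    last = int(end) if end is not None else first
    return lo <= first <= last <= hi


def is_valid_cron(expr: str) -> bool:
    """Check field count and numeric ranges of a 5-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    return all(
        _valid_part(part, lo, hi)
        for field, (lo, hi) in zip(fields, FIELD_RANGES)
        for part in field.split(",")
    )
```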
+
+### 10. Webhook Token in Query Parameter
+
+`webhooks/receiver.py`:
+
+```python
+if request.query.get("token") == self._config.token:
+    return True
+```
+
+Query parameters are logged in web server access logs, browser history, and proxy logs. This leaks the auth token.
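
A header-based alternative could read the token from `Authorization: Bearer <token>` instead. The helper below is a hypothetical sketch that accepts any mapping of header names to values, such as aiohttp's `request.headers`.

```python
from typing import Mapping, Optional


def bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the credential from an 'Authorization: Bearer ...' header, if any."""
    value = headers.get("Authorization", "")
    scheme, _, credentials = value.partition(" ")
    credentials = credentials.strip()
    if scheme.lower() != "bearer" or not credentials:
        return None
    return credentials
```

The extracted value would then be compared against the configured token, ideally with a constant-time comparison.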
+
+---
+
+## MEDIUM
+
+### 11. SQLite Databases Created with Default Permissions
+
+`sessions.db`, `scheduler.db`, and `memory.db` are all created under `~/.aetheel/` with default umask permissions. On multi-user systems, these could be world-readable.
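
One possible mitigation on POSIX systems is to create each database file with owner-only permissions before SQLite opens it. The function name `open_private_db` is hypothetical.

```python
import os
import sqlite3


def open_private_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database whose file is readable and writable by the owner only."""
    # Create the file with mode 0600 if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.close(fd)
    # Tighten files that already existed with broader permissions.
    os.chmod(path, 0o600)
    return sqlite3.connect(path)
```

Restricting the `~/.aetheel/` directory itself to mode `0700` would cover the remaining files in one step.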
+
+### 12. Webhook Token Stored in `config.json`
+
+The `webhooks.token` field in `config.py` is read from and written to `config.json`, which is a plaintext file. Secrets should only live in `.env`.
+
+### 13. No HTTPS on Any HTTP Endpoint
+
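+Both WebChat (port 8080) and webhooks (port 8090) run plain HTTP. While they are bound to `127.0.0.1` the traffic stays on the loopback interface, but it is still readable by local processes able to capture it, and once either service is exposed on another interface, messages and the webhook token cross the network in cleartext.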
+
+### 14. Full Environment Passed to Subprocesses
+
+`_build_cli_env()` in both runtimes copies `os.environ` entirely to the subprocess, which may include sensitive variables beyond what the CLI needs.
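
An allowlist approach could be sketched as follows. The variable names in `ALLOWED_ENV` and the `extra` parameter are assumptions; the actual set depends on what each CLI needs.

```python
import os
from typing import Mapping, Optional

# Hypothetical baseline; extend with whatever the CLI actually requires.
ALLOWED_ENV = ("PATH", "HOME", "LANG", "TERM")


def build_cli_env(
    extra: Optional[Mapping[str, str]] = None,
    source: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build a subprocess environment from an allowlist plus explicit extras."""
    source = os.environ if source is None else source
    env = {name: source[name] for name in ALLOWED_ENV if name in source}
    # Secrets the CLI needs are passed explicitly rather than inherited.
    env.update(extra or {})
    return env
```

The result can be passed as `env=` to `subprocess.run`, so adapter tokens present in the parent process never reach the child.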
+
+### 15. Session Logs Contain Full Conversations in Plaintext
+
+`memory/manager.py` → `log_session()` writes unencrypted markdown files to `~/.aetheel/workspace/daily/`. No access control, no encryption, no retention policy.
+
+### 16. XSS Partially Mitigated in `chat.html` but Fragile
+
+The `renderMarkdown()` function escapes `<`, `>`, `&` first, then applies regex-based markdown rendering. User messages use `textContent` (safe). AI messages use `innerHTML` with the escaped+rendered output. The escaping happens before markdown processing, which is the right order, but the regex-based approach is fragile — edge cases in the markdown regexes could potentially bypass the escaping.
+
+### 17. No CORS Headers on WebChat
+
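+The aiohttp app doesn't configure CORS. For the HTTP routes this means browsers fall back to the same-origin policy, but WebSocket handshakes are not subject to CORS at all: if the service is exposed beyond localhost, a page on any origin can open a connection to `/ws` unless the server checks the `Origin` header (see finding 7).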
+
+---
+
+## LOW
+
+### 18. Loose Dependency Version Constraints
+
+`pyproject.toml`:
+
+- `python-telegram-bot>=21.0` — no upper bound
+- `discord.py>=2.4.0` — no upper bound
+- `fastembed>=0.7.4` — no upper bound
+
+These could pull in breaking or vulnerable versions on fresh installs.
+
+### 19. No Security Scanning in CI/Test Pipeline
+
+No `bandit`, `safety`, `pip-audit`, or similar tools in the test suite or project config.
+
+### 20. `config edit` Uses `$EDITOR` Without Sanitization
+
+`cli.py`:
+
+```python
+editor = os.environ.get("EDITOR", "nano")
+subprocess.run([editor, CONFIG_PATH], check=True)
+```
+
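+`subprocess.run` with a list is safe from shell injection, but the whole value of `$EDITOR` is treated as the executable name. A common setting such as `EDITOR="code --wait"` therefore fails with `FileNotFoundError` instead of opening the editor; splitting the value with `shlex.split()` handles this case.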
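
A small sketch of building the argument list with `shlex.split()`, so editor settings that carry flags still work. The helper name `editor_argv` and the `environ` parameter are hypothetical.

```python
import os
import shlex
from typing import Mapping, Optional


def editor_argv(config_path: str, environ: Optional[Mapping[str, str]] = None) -> list:
    """Build the argv for the user's editor, honoring flags inside $EDITOR."""
    environ = os.environ if environ is None else environ
    raw = environ.get("EDITOR", "").strip() or "nano"
    # shlex.split applies shell-style quoting rules without invoking a shell.
    return [*shlex.split(raw), config_path]
```

The returned list can be passed to `subprocess.run(..., check=True)` as before.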
+
+### 21. No Data Retention/Cleanup for Session Logs
+
+Session logs accumulate indefinitely in `daily/`. No automatic pruning.
+
+### 22. `SubagentBus` Has No Authentication
+
+The pub/sub bus allows any code in the process to publish/subscribe to any channel. No isolation between subagents.
+
+---
+
+## Recommended Priority Fixes
+
+The most impactful changes to make first:
+
+1. **Add path containment check in `read_file()`** — one-line fix, prevents file system escape
+2. **Make webhook auth mandatory** when `webhooks.enabled = true` — refuse to start without a token
+3. **Add input schema validation** on webhook POST bodies
+4. **Validate cron expressions** more strictly before passing to APScheduler
+5. **Add rate limiting** to webhook and WebSocket endpoints (e.g., aiohttp middleware)
+6. **Move `webhooks.token` to `.env` only**, remove from `config.json`
+7. **Add WebSocket origin checking or token auth** to WebChat
+8. **Set explicit `client_max_size`** on aiohttp apps
+9. **Pin dependency upper bounds** in `pyproject.toml`
+10. **Add `bandit`** to the test pipeline