# Design Document: Multi-CLI Backend

## Overview

This design introduces a pluggable CLI backend system for the Aetheel gateway. The current architecture hardcodes Claude Code CLI invocation directly inside `AgentRuntime`. We will extract a `BackendAdapter` interface and provide four implementations (Claude, Codex, Gemini, OpenCode). Each implementation encapsulates CLI spawning, argument construction, output parsing, and session management. A `BackendRegistry` resolves the active backend from environment configuration at startup, validates it, and injects it into `AgentRuntime`.

The key design goals are:

- Zero behavioral change for existing Claude deployments (backward-compatible defaults)
- Each backend is a self-contained module with no cross-dependencies
- The rest of the gateway (event processing, Discord integration, session management) remains untouched
- Output is normalized into a single `BackendEventResult` shape regardless of backend

## Architecture

```mermaid
graph TD
    A[Discord Bot] --> B[EventQueue]
    B --> C[AgentRuntime]
    C --> D[BackendAdapter Interface]
    D --> E[ClaudeCodeBackend]
    D --> F[CodexBackend]
    D --> G[GeminiBackend]
    D --> H[OpenCodeBackend]
    I[BackendRegistry] -->|resolves active backend| D
    J[GatewayConfig] -->|AGENT_BACKEND env| I
    I -->|validates at startup| D
```

### Startup Flow

```mermaid
sequenceDiagram
    participant Main
    participant Config as GatewayConfig
    participant Registry as BackendRegistry
    participant Backend as BackendAdapter
    participant Runtime as AgentRuntime

    Main->>Config: loadConfig()
    Config-->>Main: config (includes agentBackend, backendCliPath)
    Main->>Registry: createBackend(config)
    Registry-->>Main: BackendAdapter instance
    Main->>Backend: validate()
    alt validation fails
        Main->>Main: log error, exit(1)
    end
    Main->>Runtime: new AgentRuntime(config, backend, ...)
```

### Execution Flow

```mermaid
sequenceDiagram
    participant Runtime as AgentRuntime
    participant Backend as BackendAdapter
    participant CLI as CLI Process

    Runtime->>Backend: execute(prompt, systemPrompt, sessionId?, onStream?)
    Backend->>CLI: spawn with backend-specific args
    CLI-->>Backend: stdout (JSON events)
    Backend->>Backend: parse output into BackendEventResult
    Backend-->>Runtime: BackendEventResult { responseText, sessionId, isError }
```

## Components and Interfaces

### BackendAdapter Interface

```typescript
export interface BackendAdapterConfig {
  cliPath: string;
  workingDir: string;
  queryTimeoutMs: number;
  allowedTools: string[];
  maxTurns: number;
  model?: string;
}

export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

export type StreamCallback = (text: string) => Promise<void>;

export interface BackendAdapter {
  /** Unique identifier for this backend (e.g., "claude", "codex") */
  name(): string;

  /** Execute a prompt and return normalized results */
  execute(
    prompt: string,
    systemPrompt: string,
    sessionId?: string,
    onStream?: StreamCallback,
  ): Promise<BackendEventResult>;

  /** Validate that the CLI binary is reachable and executable (resolves to false otherwise) */
  validate(): Promise<boolean>;
}
```

### ClaudeCodeBackend

Preserves the existing behavior extracted from `AgentRuntime.runClaude()`.
- Writes the system prompt to a temp file and passes it via `--append-system-prompt-file`
- Spawns: `claude -p --output-format json --dangerously-skip-permissions --append-system-prompt-file <path>`
- Session resume: `--resume <sessionId>`
- Tool filtering: `--allowedTools <tool>` for each tool
- Max turns: `--max-turns <maxTurns>`
- Parses JSON array output for the `system/init` object (session_id) and the `result` object

### CodexBackend

- Spawns: `codex exec --json --dangerously-bypass-approvals-and-sandbox`
- Working directory: `--cd <workingDir>`
- Session resume: `codex exec resume <sessionId>` with a follow-up prompt
- Parses newline-delimited JSON events for the final assistant message
- System prompt: passed via `--config system_prompt=<text>` or prepended to the prompt

### GeminiBackend

- Spawns: `gemini --output-format json --approval-mode yolo`
- Session resume: `--resume <sessionId>`
- Parses JSON output for the response text
- System prompt: prepended to the prompt text (Gemini CLI has no system prompt file flag in non-interactive mode)

### OpenCodeBackend

- Spawns: `opencode run --format json`
- Session resume: `--session <sessionId> --continue`
- Model selection: `--model <model>`
- Parses JSON events for the final response text
- System prompt: prepended to the prompt text

### BackendRegistry

```typescript
export type BackendName = "claude" | "codex" | "gemini" | "opencode";

export function createBackend(
  name: BackendName,
  config: BackendAdapterConfig,
): BackendAdapter;

export function resolveBackendName(raw: string | undefined): BackendName;
```

- `resolveBackendName` maps the `AGENT_BACKEND` env var to a valid `BackendName`. It defaults to `"claude"` when the variable is unset and throws a descriptive error listing the valid options for any other value
- `createBackend` instantiates the correct implementation

### AgentRuntime Refactoring

The constructor changes from:

```typescript
constructor(config, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
```

to:

```typescript
constructor(config, backend, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
```

- `executeClaude()` and `runClaude()` are replaced by `this.backend.execute()`
- The `ClaudeJsonResponse` interface is removed from `AgentRuntime`
- `EventResult` mapping: the backend's `BackendEventResult` maps directly to the gateway's existing `EventResult` interface, with `targetChannelId` added in the runtime layer

### GatewayConfig Changes

```typescript
export interface GatewayConfig {
  // ... existing fields ...
  agentBackend: BackendName;  // NEW: replaces the implicit Claude-only behavior
  backendCliPath: string;     // NEW: replaces claudeCliPath
  backendModel?: string;      // NEW: optional model override
  backendMaxTurns: number;    // NEW: configurable max turns
  // claudeCliPath removed
}
```

New environment variables:

- `AGENT_BACKEND` → `agentBackend` (default: `"claude"`)
- `BACKEND_CLI_PATH` → `backendCliPath` (default: backend-specific, e.g., `"claude"`, `"codex"`, `"gemini"`, `"opencode"`)
- `BACKEND_MODEL` → `backendModel`
- `BACKEND_MAX_TURNS` → `backendMaxTurns` (default: `25`)

## Data Models

### EventResult (Backend)

```typescript
export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}
```

This is the normalized output from any backend.
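As a sketch of how a backend might produce this normalized output from a finished CLI process, the helper below applies the failure rules from the CLI Process Errors table (stderr truncated to 500 characters on a non-zero exit, a fixed message on unparseable output). It uses the Claude JSON-array shape from the CLI Output Formats table as its example parser. The names `ProcessOutcome`, `normalizeProcessResult`, and `parseClaudeOutput` are hypothetical. The `type`/`subtype`/`session_id`/`result` field layout is an assumption, not a confirmed CLI contract.

```typescript
// Normalized backend output (mirrors BackendEventResult above).
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

// Hypothetical summary of a finished child process (not part of the design's API).
interface ProcessOutcome {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Fields a backend-specific parser extracts; a parser throws on malformed output.
interface ParsedOutput {
  responseText?: string;
  sessionId?: string;
}
type OutputParser = (stdout: string) => ParsedOutput;

// Matches the 500-character stderr limit in the CLI Process Errors table.
const MAX_STDERR_CHARS = 500;

// Hypothetical helper: turns a finished process into a BackendEventResult.
function normalizeProcessResult(outcome: ProcessOutcome, parse: OutputParser): BackendEventResult {
  if (outcome.exitCode !== 0) {
    // Non-zero exit: report stderr (truncated) as the error text.
    return { isError: true, responseText: outcome.stderr.slice(0, MAX_STDERR_CHARS) };
  }
  try {
    const { responseText, sessionId } = parse(outcome.stdout);
    return { isError: false, responseText, sessionId };
  } catch {
    // Malformed or unexpected output from the CLI.
    return { isError: true, responseText: "Failed to parse CLI output" };
  }
}

// Hypothetical parser for the Claude JSON-array format (field names assumed).
function parseClaudeOutput(stdout: string): ParsedOutput {
  const events: unknown = JSON.parse(stdout);
  if (!Array.isArray(events)) {
    throw new Error("Expected a JSON array of events");
  }
  const parsed: ParsedOutput = {};
  for (const event of events as Array<Record<string, unknown>>) {
    // A null element makes destructuring throw (reported as a parse failure);
    // other non-object elements just yield undefined fields and are skipped.
    const { type, subtype, session_id: sid, result } = event;
    if (type === "system" && subtype === "init" && typeof sid === "string") {
      parsed.sessionId = sid;
    } else if (type === "result" && typeof result === "string") {
      parsed.responseText = result;
    }
  }
  return parsed;
}
```

In a full backend, `execute()` would wait for the child process to exit, build the outcome, and pass in that backend's own parser. The timeout path from the Error Handling section would return its own result before this helper runs.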
The `AgentRuntime` maps each `BackendEventResult` to the gateway's `EventResult`:

```typescript
// Gateway EventResult (existing, unchanged)
export interface EventResult {
  responseText?: string;
  targetChannelId?: string;
  sessionId?: string;
  error?: string;
}
```

Mapping logic:

```typescript
// Helper name is illustrative; the branch logic is the design's mapping.
function toGatewayResult(backendResult: BackendEventResult, targetChannelId: string): EventResult {
  if (backendResult.isError) {
    // Errors surface the backend's message as `error`, with no responseText or sessionId.
    return { error: backendResult.responseText, targetChannelId };
  }
  return {
    responseText: backendResult.responseText,
    targetChannelId,
    sessionId: backendResult.sessionId,
  };
}
```

### BackendAdapterConfig

```typescript
export interface BackendAdapterConfig {
  cliPath: string;         // Path to CLI binary
  workingDir: string;      // Working directory for CLI process
  queryTimeoutMs: number;  // Timeout before killing the process
  allowedTools: string[];  // Tools to whitelist (backend-specific support)
  maxTurns: number;        // Max agentic turns
  model?: string;          // Optional model override
}
```

### CLI Output Formats

| Backend  | Output Format          | Session ID Source                  | Result Source                   |
|----------|------------------------|------------------------------------|---------------------------------|
| Claude   | JSON array             | `system/init` object `.session_id` | `result` object `.result`       |
| Codex    | Newline-delimited JSON | Session ID from exec metadata      | Final assistant message content |
| Gemini   | JSON object            | Session metadata in output         | Response text field             |
| OpenCode | JSON events            | Session field in response          | Final response text             |

## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do.
Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Claude backend required flags

*For any* prompt string, system prompt string, and allowed tools list, the Claude backend's generated argument list SHALL always contain `-p`, `--output-format json`, `--dangerously-skip-permissions`, `--append-system-prompt-file`, `--max-turns`, and one `--allowedTools` entry per configured tool.

**Validates: Requirements 2.2, 2.5, 2.6**

### Property 2: Codex backend required flags

*For any* prompt string and working directory, the Codex backend's generated argument list SHALL always contain the `exec` subcommand, `--json`, `--dangerously-bypass-approvals-and-sandbox`, and `--cd <workingDir>`.

**Validates: Requirements 3.2, 3.3, 3.4, 3.5**

### Property 3: Gemini backend required flags

*For any* prompt string, the Gemini backend's generated argument list SHALL always contain the prompt as a positional argument, `--output-format json`, and `--approval-mode yolo`.

**Validates: Requirements 4.2, 4.3, 4.4**

### Property 4: OpenCode backend required flags

*For any* prompt string and optional model string, the OpenCode backend's generated argument list SHALL always contain the `run` subcommand and `--format json`, plus `--model <model>` when a model is configured.

**Validates: Requirements 5.2, 5.3, 5.5**

### Property 5: Session resume args across backends

*For any* backend and any non-empty session ID string, the generated argument list SHALL include the backend-specific session resume flags: `--resume <sessionId>` for Claude, the `resume <sessionId>` subcommand for Codex, `--resume <sessionId>` for Gemini, and `--session <sessionId> --continue` for OpenCode. When no session ID (or an empty one) is provided, no session-related flags SHALL appear.

**Validates: Requirements 2.3, 3.7, 4.5, 5.4**

### Property 6: Output parsing extracts correct fields

*For any* valid backend-specific JSON output containing a response text and session ID, the backend's parser SHALL produce a `BackendEventResult` where `responseText` matches the expected response content and `sessionId` matches the expected session identifier.

**Validates: Requirements 2.4, 3.6, 4.6, 5.6, 8.1**

### Property 7: Backend name resolution

*For any* input (a string or `undefined`), `resolveBackendName` SHALL return the corresponding `BackendName` if the input is one of `"claude"`, `"codex"`, `"gemini"`, or `"opencode"`, SHALL return `"claude"` when the input is `undefined`, and SHALL throw a descriptive error for any other string value.

**Validates: Requirements 6.1, 6.2, 6.3, 6.5**

### Property 8: Non-zero exit code produces error result

*For any* backend, any non-zero exit code, and any stderr string, the backend SHALL return a `BackendEventResult` with `isError` set to `true` and `responseText` containing the stderr content (truncated to 500 characters).

**Validates: Requirements 8.2**

### Property 9: EventResult mapping preserves semantics

*For any* `BackendEventResult` and target channel ID, the mapping to the gateway's `EventResult` SHALL set `error` to `responseText` when `isError` is true (with no `responseText` on the gateway result), and SHALL set `responseText` and `sessionId` when `isError` is false (with no `error` on the gateway result). `targetChannelId` SHALL always be set.

**Validates: Requirements 10.3**

### Property 10: Session ID storage after backend execution

*For any* channel ID and any `BackendEventResult` containing a defined `sessionId`, after the `AgentRuntime` processes the result, the `SessionManager` SHALL contain that session ID for that channel. When `sessionId` is undefined, the session manager SHALL NOT be updated for that channel.

**Validates: Requirements 10.4**

## Error Handling

### CLI Process Errors

| Error Condition | Handling |
|---|---|
| CLI binary not found | `validate()` returns `false` at startup; the gateway logs an error with the backend name and path, then exits with code 1 |
| Non-zero exit code | Backend sets `isError: true` and includes stderr (truncated to 500 chars) in `responseText` |
| Query timeout | Backend kills the process with SIGTERM after `queryTimeoutMs` and returns `{ isError: true, responseText: "Query timed out" }` |
| Invalid JSON output | Backend returns `{ isError: true, responseText: "Failed to parse CLI output" }` |
| Session corruption | `AgentRuntime` detects session-related error messages, removes the session from `SessionManager`, and allows a retry without a session |

### Configuration Errors

| Error Condition | Handling |
|---|---|
| Invalid `AGENT_BACKEND` value | `resolveBackendName` throws with a message listing the valid options; the gateway fails at startup |
| Invalid `BACKEND_MAX_TURNS` | Falls back to the default (25) and logs a warning |
| Unsupported option for backend | Logs a warning and ignores the option (e.g., `ALLOWED_TOOLS` for backends that don't support tool filtering) |

### Retry Strategy

The existing `withRetry` mechanism in `AgentRuntime` continues to wrap backend execution calls:

- Max 3 retries with exponential backoff (5s base)
- Transient errors (timeout, spawn failure, crash) trigger a retry
- Session corruption errors are not retried against the same session; the session is cleared and the next attempt starts fresh

## Testing Strategy

### Property-Based Testing

Library: [fast-check](https://github.com/dubzzz/fast-check) for TypeScript property-based testing.

Each property test runs a minimum of 100 iterations. Each test is tagged with a comment referencing the design property:

```typescript
// Feature: multi-cli-backend, Property 1: Claude backend required flags
```

Properties to implement:

- **Property 1–4**: Generate random prompt strings, system prompts, tool lists, and config values. Call each backend's arg-building function and assert that the required flags are present.
- **Property 5**: Generate random session ID strings (including empty and undefined). For each backend, verify that session flags appear only when a session ID is provided.
- **Property 6**: Generate random valid JSON output structures in each backend's format. Parse them and verify that the extracted fields match.
- **Property 7**: Generate random strings. Verify the resolution behavior (valid → correct `BackendName`, undefined → `"claude"`, invalid → throws).
- **Property 8**: Generate random non-zero exit codes and stderr strings. Verify the error result shape.
- **Property 9**: Generate random `BackendEventResult` objects. Verify the mapping to the gateway `EventResult`.
- **Property 10**: Generate random channel IDs and `BackendEventResult` objects with and without session IDs. Verify the session manager state.

### Unit Testing

Unit tests complement property tests with specific examples and edge cases:

- Each backend's `validate()` method with a mocked filesystem
- Timeout behavior with a mock slow process
- Startup flow: valid config → backend created → validated → injected into runtime
- Startup flow: invalid backend name → descriptive error
- Default config values when env vars are unset
- Streaming callback invocation during output parsing
- Session corruption detection and cleanup

### Integration Testing

- End-to-end test with a mock CLI script that echoes JSON in each backend's format
- Verify the full flow: config → registry → backend → execute → parse → EventResult
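To make the property-testing plan concrete, the sketch below implements `resolveBackendName` as the BackendRegistry section describes and checks Property 7 over 100 generated inputs. It is dependency-free on purpose. `mulberry32`, `generateInput`, and `checkProperty7` are hypothetical stand-ins for fast-check's generators and runner, and the exact error-message wording is an assumption.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

const BACKEND_NAMES: readonly BackendName[] = ["claude", "codex", "gemini", "opencode"];

function isBackendName(value: string): value is BackendName {
  return (BACKEND_NAMES as readonly string[]).includes(value);
}

// Sketch of resolveBackendName as described in the BackendRegistry section.
function resolveBackendName(raw: string | undefined): BackendName {
  if (raw === undefined) {
    return "claude"; // AGENT_BACKEND unset: default backend
  }
  if (isBackendName(raw)) {
    return raw;
  }
  throw new Error(`Invalid AGENT_BACKEND "${raw}"; valid options: ${BACKEND_NAMES.join(", ")}`);
}

// Hypothetical seeded PRNG (mulberry32) so that failing runs are reproducible.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical generator: undefined, valid names, or arbitrary short strings.
function generateInput(rand: () => number): string | undefined {
  const roll = rand();
  if (roll < 0.2) return undefined;
  if (roll < 0.5) return BACKEND_NAMES[Math.floor(rand() * BACKEND_NAMES.length)];
  const alphabet = "abcdeglmnopux-_ CODEX";
  const length = Math.floor(rand() * 12);
  let s = "";
  for (let i = 0; i < length; i++) {
    s += alphabet.charAt(Math.floor(rand() * alphabet.length));
  }
  return s;
}

// Independent oracle: the literal list from Property 7.
const EXPECTED_VALID = new Set(["claude", "codex", "gemini", "opencode"]);

// Checks Property 7 over `runs` generated inputs; throws on the first violation.
function checkProperty7(runs = 100, seed = 1): void {
  const rand = mulberry32(seed);
  for (let run = 0; run < runs; run++) {
    const input = generateInput(rand);
    if (input === undefined) {
      if (resolveBackendName(input) !== "claude") throw new Error("undefined must resolve to claude");
    } else if (EXPECTED_VALID.has(input)) {
      if (resolveBackendName(input) !== input) throw new Error(`valid name "${input}" not returned`);
    } else {
      let message: string | undefined;
      try {
        resolveBackendName(input);
      } catch (err) {
        message = err instanceof Error ? err.message : String(err);
      }
      // Invalid input must throw, and the message must list every valid option.
      const msg = message;
      if (msg === undefined || !BACKEND_NAMES.every((name) => msg.includes(name))) {
        throw new Error(`invalid name "${input}" was not rejected descriptively`);
      }
    }
  }
}
```

In the real suite, `fc.assert(fc.property(...))` with a string arbitrary would replace the hand-rolled loop and add shrinking of failing inputs. The test would also carry the `// Feature: multi-cli-backend, Property 7: ...` tag described above.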