feat: add pluggable multi-CLI backend system

Implement BackendAdapter interface with four CLI backends: - ClaudeCodeBackend (extracted from AgentRuntime) - CodexBackend (OpenAI Codex CLI) - GeminiBackend (Google Gemini CLI) - OpenCodeBackend (OpenCode CLI) Add BackendRegistry for resolution/creation via AGENT_BACKEND env var. Refactor AgentRuntime to delegate to BackendAdapter instead of hardcoding Claude CLI. Update GatewayConfig with new env vars (AGENT_BACKEND, BACKEND_CLI_PATH, BACKEND_MODEL, BACKEND_MAX_TURNS). Includes 10 property-based test files and unit tests for edge cases.
2026-02-22 23:41:30 -05:00
parent f2247ea3ac
commit 453389f55c
25 changed files with 3262 additions and 195 deletions
--- a/.kiro/specs/multi-cli-backend/.config.kiro
+++ b/.kiro/specs/multi-cli-backend/.config.kiro
@@ -0,0 +1 @@
+{"specId": "66d67457-3ea3-493c-9d0f-b868b51d309d", "workflowType": "requirements-first", "specType": "feature"}
--- a/.kiro/specs/multi-cli-backend/design.md
+++ b/.kiro/specs/multi-cli-backend/design.md
@@ -0,0 +1,370 @@
+# Design Document: Multi-CLI Backend
+
+## Overview
+
+This design introduces a pluggable CLI backend system for the Aetheel gateway. The current architecture hardcodes Claude Code CLI invocation directly inside `AgentRuntime`. We will extract a `BackendAdapter` interface and provide four implementations (Claude, Codex, Gemini, OpenCode), each encapsulating CLI spawning, argument construction, output parsing, and session management. A `BackendRegistry` resolves the active backend from environment configuration at startup, validates it, and injects it into `AgentRuntime`.
+
+The key design goals are:
+- Zero behavioral change for existing Claude deployments (backward compatible defaults)
+- Each backend is a self-contained module with no cross-dependencies
+- The rest of the gateway (event processing, Discord integration, session management) remains untouched
+- Output is normalized into a single `EventResult` shape regardless of backend
+
+## Architecture
+
+```mermaid
+graph TD
+    A[Discord Bot] --> B[EventQueue]
+    B --> C[AgentRuntime]
+    C --> D[BackendAdapter Interface]
+    D --> E[ClaudeCodeBackend]
+    D --> F[CodexBackend]
+    D --> G[GeminiBackend]
+    D --> H[OpenCodeBackend]
+    I[BackendRegistry] -->|resolves active backend| D
+    J[GatewayConfig] -->|AGENT_BACKEND env| I
+    I -->|validates at startup| D
+```
+
+### Startup Flow
+
+```mermaid
+sequenceDiagram
+    participant Main
+    participant Config as GatewayConfig
+    participant Registry as BackendRegistry
+    participant Backend as BackendAdapter
+    participant Runtime as AgentRuntime
+
+    Main->>Config: loadConfig()
+    Config-->>Main: config (includes agentBackend, backendCliPath)
+    Main->>Registry: createBackend(config)
+    Registry-->>Main: BackendAdapter instance
+    Main->>Backend: validate()
+    alt validation fails
+        Main->>Main: log error, exit(1)
+    end
+    Main->>Runtime: new AgentRuntime(config, backend, ...)
+```
+
+### Execution Flow
+
+```mermaid
+sequenceDiagram
+    participant Runtime as AgentRuntime
+    participant Backend as BackendAdapter
+    participant CLI as CLI Process
+
+    Runtime->>Backend: execute(prompt, systemPrompt, sessionId?, onStream?)
+    Backend->>CLI: spawn with backend-specific args
+    CLI-->>Backend: stdout (JSON events)
+    Backend->>Backend: parse output into EventResult
+    Backend-->>Runtime: EventResult { responseText, sessionId, isError }
+```
+
+## Components and Interfaces
+
+### BackendAdapter Interface
+
+```typescript
+export interface BackendAdapterConfig {
+  cliPath: string;
+  workingDir: string;
+  queryTimeoutMs: number;
+  allowedTools: string[];
+  maxTurns: number;
+  model?: string;
+}
+
+export interface EventResult {
+  responseText?: string;
+  sessionId?: string;
+  isError: boolean;
+}
+
+export type StreamCallback = (text: string) => Promise<void>;
+
+export interface BackendAdapter {
+  /** Unique identifier for this backend (e.g., "claude", "codex") */
+  name(): string;
+
+  /** Execute a prompt and return normalized results */
+  execute(
+    prompt: string,
+    systemPrompt: string,
+    sessionId?: string,
+    onStream?: StreamCallback,
+  ): Promise<EventResult>;
+
+  /** Validate that the CLI binary is reachable and executable */
+  validate(): Promise<boolean>;
+}
+```
+
+### ClaudeCodeBackend
+
+Preserves the existing behavior extracted from `AgentRuntime.runClaude()`.
+
+- Writes system prompt to a temp file, passes via `--append-system-prompt-file`
+- Spawns: `claude -p <prompt> --output-format json --dangerously-skip-permissions --append-system-prompt-file <file>`
+- Session resume: `--resume <sessionId>`
+- Tool filtering: `--allowedTools <tool>` for each tool
+- Max turns: `--max-turns <n>`
+- Parses JSON array output for `system/init` (session_id) and `result` objects
+
+### CodexBackend
+
+- Spawns: `codex exec <prompt> --json --dangerously-bypass-approvals-and-sandbox`
+- Working directory: `--cd <path>`
+- Session resume: `codex exec resume <sessionId>` with follow-up prompt
+- Parses newline-delimited JSON events for the final assistant message
+- System prompt: passed via `--config system_prompt=<text>` or prepended to prompt
+
+### GeminiBackend
+
+- Spawns: `gemini <prompt> --output-format json --approval-mode yolo`
+- Session resume: `--resume <sessionId>`
+- Parses JSON output for response text
+- System prompt: prepended to prompt text (Gemini CLI has no system prompt file flag in non-interactive mode)
+
+### OpenCodeBackend
+
+- Spawns: `opencode run <prompt> --format json`
+- Session resume: `--session <sessionId> --continue`
+- Model selection: `--model <provider/model>`
+- Parses JSON events for final response text
+- System prompt: prepended to prompt text
+
+### BackendRegistry
+
+```typescript
+export type BackendName = "claude" | "codex" | "gemini" | "opencode";
+
+export function createBackend(
+  name: BackendName,
+  config: BackendAdapterConfig,
+): BackendAdapter;
+
+export function resolveBackendName(raw: string | undefined): BackendName;
+```
+
+- `resolveBackendName` maps the `AGENT_BACKEND` env var to a valid `BackendName`, defaulting to `"claude"`, or throws with a descriptive error listing valid options
+- `createBackend` instantiates the correct implementation
+
+### AgentRuntime Refactoring
+
+The constructor changes from:
+```typescript
+constructor(config, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
+```
+to:
+```typescript
+constructor(config, backend, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
+```
+
+- `executeClaude()` and `runClaude()` are replaced by `this.backend.execute()`
+- The `ClaudeJsonResponse` interface is removed from `AgentRuntime`
+- `EventResult` mapping: the backend's `EventResult` maps directly to the gateway's existing `EventResult` interface (adding `targetChannelId` in the runtime layer)
+
+### GatewayConfig Changes
+
+```typescript
+export interface GatewayConfig {
+  // ... existing fields ...
+  agentBackend: BackendName;      // NEW: replaces implicit claude-only
+  backendCliPath: string;          // NEW: replaces claudeCliPath
+  backendModel?: string;           // NEW: optional model override
+  backendMaxTurns: number;         // NEW: configurable max turns
+  // claudeCliPath removed
+}
+```
+
+New environment variables:
+- `AGENT_BACKEND` → `agentBackend` (default: `"claude"`)
+- `BACKEND_CLI_PATH` → `backendCliPath` (default: backend-specific, e.g., `"claude"`, `"codex"`, `"gemini"`, `"opencode"`)
+- `BACKEND_MODEL` → `backendModel`
+- `BACKEND_MAX_TURNS` → `backendMaxTurns` (default: `25`)
+
+## Data Models
+
+### EventResult (Backend)
+
+```typescript
+export interface BackendEventResult {
+  responseText?: string;
+  sessionId?: string;
+  isError: boolean;
+}
+```
+
+This is the normalized output from any backend. The `AgentRuntime` maps it to the gateway's `EventResult`:
+
+```typescript
+// Gateway EventResult (existing, unchanged)
+export interface EventResult {
+  responseText?: string;
+  targetChannelId?: string;
+  sessionId?: string;
+  error?: string;
+}
+```
+
+Mapping logic:
+```typescript
+if (backendResult.isError) {
+  return { error: backendResult.responseText, targetChannelId };
+} else {
+  return { responseText: backendResult.responseText, targetChannelId, sessionId: backendResult.sessionId };
+}
+```
+
+### BackendAdapterConfig
+
+```typescript
+export interface BackendAdapterConfig {
+  cliPath: string;          // Path to CLI binary
+  workingDir: string;       // Working directory for CLI process
+  queryTimeoutMs: number;   // Timeout before killing the process
+  allowedTools: string[];   // Tools to whitelist (backend-specific support)
+  maxTurns: number;         // Max agentic turns
+  model?: string;           // Optional model override
+}
+```
+
+### CLI Output Formats
+
+| Backend   | Output Format                | Session ID Source                    | Result Source                     |
+|-----------|------------------------------|--------------------------------------|-----------------------------------|
+| Claude    | JSON array                   | `system/init` object `.session_id`   | `result` object `.result`         |
+| Codex     | Newline-delimited JSON       | Session ID from exec metadata        | Final assistant message content   |
+| Gemini    | JSON object                  | Session metadata in output           | Response text field               |
+| OpenCode  | JSON events                  | Session field in response            | Final response text               |
+
+
+## Correctness Properties
+
+*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
+
+### Property 1: Claude backend required flags
+
+*For any* prompt string, system prompt string, and allowed tools list, the Claude backend's generated argument list SHALL always contain `-p`, `--output-format json`, `--dangerously-skip-permissions`, `--append-system-prompt-file`, `--max-turns`, and one `--allowedTools` entry per configured tool.
+
+**Validates: Requirements 2.2, 2.5, 2.6**
+
+### Property 2: Codex backend required flags
+
+*For any* prompt string and working directory, the Codex backend's generated argument list SHALL always contain the `exec` subcommand, `--json`, `--dangerously-bypass-approvals-and-sandbox`, and `--cd <workingDir>`.
+
+**Validates: Requirements 3.2, 3.3, 3.4, 3.5**
+
+### Property 3: Gemini backend required flags
+
+*For any* prompt string, the Gemini backend's generated argument list SHALL always contain the prompt as a positional argument, `--output-format json`, and `--approval-mode yolo`.
+
+**Validates: Requirements 4.2, 4.3, 4.4**
+
+### Property 4: OpenCode backend required flags
+
+*For any* prompt string and optional model string, the OpenCode backend's generated argument list SHALL always contain the `run` subcommand, `--format json`, and when a model is configured, `--model <model>`.
+
+**Validates: Requirements 5.2, 5.3, 5.5**
+
+### Property 5: Session resume args across backends
+
+*For any* backend and any non-empty session ID string, the generated argument list SHALL include the backend-specific session resume flags: `--resume <id>` for Claude, `resume <id>` subcommand for Codex, `--resume <id>` for Gemini, and `--session <id> --continue` for OpenCode. When no session ID is provided, no session-related flags SHALL appear.
+
+**Validates: Requirements 2.3, 3.7, 4.5, 5.4**
+
+### Property 6: Output parsing extracts correct fields
+
+*For any* valid backend-specific JSON output containing a response text and session ID, the backend's parser SHALL produce a `BackendEventResult` where `responseText` matches the expected response content and `sessionId` matches the expected session identifier.
+
+**Validates: Requirements 2.4, 3.6, 4.6, 5.6, 8.1**
+
+### Property 7: Backend name resolution
+
+*For any* string, `resolveBackendName` SHALL return the corresponding `BackendName` if the string is one of `"claude"`, `"codex"`, `"gemini"`, or `"opencode"`, SHALL return `"claude"` when the input is `undefined`, and SHALL throw a descriptive error for any other string value.
+
+**Validates: Requirements 6.1, 6.2, 6.3, 6.5**
+
+### Property 8: Non-zero exit code produces error result
+
+*For any* backend, any non-zero exit code, and any stderr string, the backend SHALL return a `BackendEventResult` with `isError` set to `true` and `responseText` containing the stderr content.
+
+**Validates: Requirements 8.2**
+
+### Property 9: EventResult mapping preserves semantics
+
+*For any* `BackendEventResult` and target channel ID, the mapping to the gateway's `EventResult` SHALL set `error` to `responseText` when `isError` is true (with no `responseText` on the gateway result), and SHALL set `responseText` and `sessionId` when `isError` is false (with no `error` on the gateway result). `targetChannelId` SHALL always be set.
+
+**Validates: Requirements 10.3**
+
+### Property 10: Session ID storage after backend execution
+
+*For any* channel ID and any `BackendEventResult` containing a non-undefined `sessionId`, after the `AgentRuntime` processes the result, the `SessionManager` SHALL contain that session ID for that channel. When `sessionId` is undefined, the session manager SHALL not be updated for that channel.
+
+**Validates: Requirements 10.4**
+
+## Error Handling
+
+### CLI Process Errors
+
+| Error Condition | Handling |
+|---|---|
+| CLI binary not found | `validate()` returns false at startup → gateway logs error with backend name and path, exits with code 1 |
+| Non-zero exit code | Backend sets `isError: true`, includes stderr (truncated to 500 chars) in `responseText` |
+| Query timeout | Backend kills process with SIGTERM after `queryTimeoutMs`, returns `{ isError: true, responseText: "Query timed out" }` |
+| Invalid JSON output | Backend returns `{ isError: true, responseText: "Failed to parse CLI output" }` |
+| Session corruption | `AgentRuntime` detects session-related error messages, removes session from `SessionManager`, allows retry without session |
+
+### Configuration Errors
+
+| Error Condition | Handling |
+|---|---|
+| Invalid `AGENT_BACKEND` value | `resolveBackendName` throws with message listing valid options; gateway fails at startup |
+| Invalid `BACKEND_MAX_TURNS` | Falls back to default (25), logs warning |
+| Unsupported option for backend | Logs warning, ignores the option (e.g., `ALLOWED_TOOLS` for backends that don't support tool filtering) |
+
+### Retry Strategy
+
+The existing `withRetry` mechanism in `AgentRuntime` continues to wrap backend execution calls:
+- Max 3 retries with exponential backoff (5s base)
+- Transient errors (timeout, spawn failure, crash) trigger retry
+- Session corruption errors are non-retryable; session is cleared and the next attempt starts fresh
+
+## Testing Strategy
+
+### Property-Based Testing
+
+Library: [fast-check](https://github.com/dubzzz/fast-check) for TypeScript property-based testing.
+
+Each property test runs a minimum of 100 iterations. Each test is tagged with a comment referencing the design property:
+
+```typescript
+// Feature: multi-cli-backend, Property 1: Claude backend required flags
+```
+
+Properties to implement:
+- **Property 1–4**: Generate random prompt strings, system prompts, tool lists, and config values. Call each backend's arg-building function and assert required flags are present.
+- **Property 5**: Generate random session ID strings (including empty/undefined). For each backend, verify session flags appear only when a session ID is provided.
+- **Property 6**: Generate random valid JSON output structures per backend format. Parse and verify extracted fields match.
+- **Property 7**: Generate random strings. Verify resolution behavior (valid → correct BackendName, undefined → "claude", invalid → throws).
+- **Property 8**: Generate random exit codes (non-zero) and stderr strings. Verify error result shape.
+- **Property 9**: Generate random `BackendEventResult` objects. Verify mapping to gateway `EventResult`.
+- **Property 10**: Generate random channel IDs and `BackendEventResult` objects with/without session IDs. Verify session manager state.
+
+### Unit Testing
+
+Unit tests complement property tests for specific examples and edge cases:
+- Each backend's `validate()` method with mocked filesystem
+- Timeout behavior with a mock slow process
+- Startup flow: valid config → backend created → validated → injected into runtime
+- Startup flow: invalid backend name → descriptive error
+- Default config values when env vars are unset
+- Streaming callback invocation during output parsing
+- Session corruption detection and cleanup
+
+### Integration Testing
+
+- End-to-end test with a mock CLI script that echoes JSON in each backend's format
+- Verify the full flow: config → registry → backend → execute → parse → EventResult
--- a/.kiro/specs/multi-cli-backend/requirements.md
+++ b/.kiro/specs/multi-cli-backend/requirements.md
@@ -0,0 +1,136 @@
+# Requirements Document
+
+## Introduction
+
+The gateway currently hardcodes Claude Code CLI as its sole agent backend. This feature introduces a pluggable CLI backend system that allows operators to choose between Claude Code CLI, OpenCode CLI, Codex CLI, and Gemini CLI. Each backend has different command-line interfaces, output formats, and session management semantics. The system must abstract these differences behind a unified interface so the rest of the gateway (event processing, session management, Discord integration) remains unchanged.
+
+## Glossary
+
+- **Gateway**: The Discord-to-agent bridge application (Aetheel) that receives prompts and dispatches them to a CLI backend
+- **CLI_Backend**: A pluggable module that knows how to spawn a specific CLI tool, pass prompts and system prompts, parse output, and manage sessions
+- **Backend_Registry**: The component that holds all available CLI_Backend implementations and resolves the active one from configuration
+- **Agent_Runtime**: The existing `AgentRuntime` class that orchestrates event processing; it will delegate CLI execution to the active CLI_Backend
+- **Backend_Adapter**: An interface that each CLI_Backend must implement, defining spawn, parse, and session operations
+- **Session_ID**: An opaque string returned by a CLI backend that allows resuming a prior conversation
+- **Event_Result**: The normalized response object returned by any CLI_Backend after processing a prompt
+
+## Requirements
+
+### Requirement 1: Backend Adapter Interface
+
+**User Story:** As a developer, I want a common interface for all CLI backends, so that the gateway can interact with any backend without knowing its implementation details.
+
+#### Acceptance Criteria
+
+1. THE Backend_Adapter SHALL define a method to execute a prompt given a prompt string, a system prompt string, an optional Session_ID, and an optional streaming callback
+2. THE Backend_Adapter SHALL return an Event_Result containing the response text, an optional Session_ID for continuation, and an error flag
+3. THE Backend_Adapter SHALL define a method to return the backend name as a string identifier
+4. THE Backend_Adapter SHALL define a method to validate that the CLI tool is reachable on the system (e.g., binary exists at configured path)
+
+### Requirement 2: Claude Code CLI Backend
+
+**User Story:** As an operator, I want the existing Claude Code CLI integration preserved as a backend, so that current deployments continue working without changes.
+
+#### Acceptance Criteria
+
+1. THE Claude_Code_Backend SHALL implement the Backend_Adapter interface
+2. THE Claude_Code_Backend SHALL spawn the Claude CLI with `-p`, `--output-format json`, `--dangerously-skip-permissions`, and `--append-system-prompt-file` flags
+3. WHEN a Session_ID is provided, THE Claude_Code_Backend SHALL pass `--resume <Session_ID>` to the CLI process
+4. THE Claude_Code_Backend SHALL parse the JSON array output to extract `session_id` from `system/init` objects and `result` from `result` objects
+5. THE Claude_Code_Backend SHALL pass `--allowedTools` flags for each tool in the configured allowed tools list
+6. THE Claude_Code_Backend SHALL pass `--max-turns 25` to the CLI process
+
+### Requirement 3: Codex CLI Backend
+
+**User Story:** As an operator, I want to use OpenAI Codex CLI as a backend, so that I can leverage OpenAI models through the gateway.
+
+#### Acceptance Criteria
+
+1. THE Codex_Backend SHALL implement the Backend_Adapter interface
+2. THE Codex_Backend SHALL spawn the Codex CLI using `codex exec` subcommand for non-interactive execution
+3. THE Codex_Backend SHALL pass `--json` to receive newline-delimited JSON output
+4. THE Codex_Backend SHALL pass `--dangerously-bypass-approvals-and-sandbox` to skip approval prompts
+5. WHEN a working directory is configured, THE Codex_Backend SHALL pass `--cd <path>` to set the workspace root
+6. THE Codex_Backend SHALL parse the newline-delimited JSON events to extract the final assistant message as the response text
+7. WHEN a Session_ID is provided, THE Codex_Backend SHALL use `codex exec resume <Session_ID>` to continue a prior session
+
+### Requirement 4: Gemini CLI Backend
+
+**User Story:** As an operator, I want to use Google Gemini CLI as a backend, so that I can leverage Gemini models through the gateway.
+
+#### Acceptance Criteria
+
+1. THE Gemini_Backend SHALL implement the Backend_Adapter interface
+2. THE Gemini_Backend SHALL spawn the Gemini CLI with the prompt as a positional argument for non-interactive one-shot mode
+3. THE Gemini_Backend SHALL pass `--output-format json` to receive structured JSON output
+4. THE Gemini_Backend SHALL pass `--approval-mode yolo` to auto-approve tool executions
+5. WHEN a Session_ID is provided, THE Gemini_Backend SHALL pass `--resume <Session_ID>` to continue a prior session
+6. THE Gemini_Backend SHALL parse the JSON output to extract the response text
+
+### Requirement 5: OpenCode CLI Backend
+
+**User Story:** As an operator, I want to use OpenCode CLI as a backend, so that I can leverage multiple model providers through OpenCode's provider system.
+
+#### Acceptance Criteria
+
+1. THE OpenCode_Backend SHALL implement the Backend_Adapter interface
+2. THE OpenCode_Backend SHALL spawn the OpenCode CLI using `opencode run` subcommand for non-interactive execution
+3. THE OpenCode_Backend SHALL pass `--format json` to receive JSON event output
+4. WHEN a Session_ID is provided, THE OpenCode_Backend SHALL pass `--session <Session_ID> --continue` to resume a prior session
+5. WHEN a model is configured, THE OpenCode_Backend SHALL pass `--model <provider/model>` to select the model
+6. THE OpenCode_Backend SHALL parse the JSON events to extract the final response text
+
+### Requirement 6: Backend Selection via Configuration
+
+**User Story:** As an operator, I want to select which CLI backend to use through environment variables, so that I can switch backends without code changes.
+
+#### Acceptance Criteria
+
+1. THE Gateway SHALL read an `AGENT_BACKEND` environment variable to determine which CLI_Backend to activate
+2. THE Gateway SHALL accept values `claude`, `codex`, `gemini`, and `opencode` for the `AGENT_BACKEND` variable
+3. WHEN `AGENT_BACKEND` is not set, THE Gateway SHALL default to `claude` for backward compatibility
+4. THE Gateway SHALL read a `BACKEND_CLI_PATH` environment variable to override the default binary path for the selected backend
+5. IF an unrecognized value is provided for `AGENT_BACKEND`, THEN THE Gateway SHALL fail at startup with a descriptive error message listing valid options
+
+### Requirement 7: Backend-Specific Configuration
+
+**User Story:** As an operator, I want to pass backend-specific settings through environment variables, so that I can tune each backend's behavior.
+
+#### Acceptance Criteria
+
+1. THE Gateway SHALL read `BACKEND_MODEL` environment variable to pass a model override to the active CLI_Backend
+2. THE Gateway SHALL read `BACKEND_MAX_TURNS` environment variable to limit the number of agentic turns, defaulting to 25
+3. WHEN the active backend does not support a configured option, THE Gateway SHALL log a warning and ignore the unsupported option
+4. THE Gateway SHALL pass the existing `ALLOWED_TOOLS` configuration to backends that support tool filtering
+
+### Requirement 8: Unified Output Parsing
+
+**User Story:** As a developer, I want each backend to normalize its output into a common format, so that downstream processing (Discord messaging, archiving) works identically regardless of backend.
+
+#### Acceptance Criteria
+
+1. THE Backend_Adapter SHALL return Event_Result with fields: `responseText` (string or undefined), `sessionId` (string or undefined), and `isError` (boolean)
+2. WHEN a CLI_Backend process exits with a non-zero exit code, THE Backend_Adapter SHALL set `isError` to true and include the stderr content in `responseText`
+3. WHEN a CLI_Backend process exceeds the configured query timeout, THE Backend_Adapter SHALL terminate the process and return an Event_Result with `isError` set to true and `responseText` set to "Query timed out"
+4. THE Backend_Adapter SHALL support an optional streaming callback that receives partial result text as the CLI process produces output
+
+### Requirement 9: Backend Validation at Startup
+
+**User Story:** As an operator, I want the gateway to verify the selected backend is available at startup, so that I get immediate feedback if the CLI tool is missing or misconfigured.
+
+#### Acceptance Criteria
+
+1. WHEN the Gateway starts, THE Backend_Registry SHALL invoke the active CLI_Backend's validation method
+2. IF the validation fails, THEN THE Gateway SHALL log an error with the backend name and configured path, and exit with a non-zero exit code
+3. THE validation method SHALL check that the configured CLI binary path is executable
+
+### Requirement 10: Agent Runtime Refactoring
+
+**User Story:** As a developer, I want the AgentRuntime to delegate CLI execution to the Backend_Adapter, so that the runtime is decoupled from any specific CLI tool.
+
+#### Acceptance Criteria
+
+1. THE Agent_Runtime SHALL accept a Backend_Adapter instance through its constructor instead of directly referencing Claude CLI configuration
+2. THE Agent_Runtime SHALL call the Backend_Adapter's execute method instead of spawning CLI processes directly
+3. THE Agent_Runtime SHALL map the Backend_Adapter's Event_Result to the existing EventResult interface used by the rest of the gateway
+4. WHEN the Backend_Adapter returns a Session_ID, THE Agent_Runtime SHALL store the Session_ID in the Session_Manager for the corresponding channel
--- a/.kiro/specs/multi-cli-backend/tasks.md
+++ b/.kiro/specs/multi-cli-backend/tasks.md
@@ -0,0 +1,77 @@
+# Tasks
+
+## Task 1: Create BackendAdapter interface and shared types
+- [x] 1.1 Create `src/backends/types.ts` with `BackendAdapter` interface, `BackendAdapterConfig`, `BackendEventResult`, `StreamCallback`, and `BackendName` type
+- [x] 1.2 Export all types from `src/backends/index.ts` barrel file
+
+## Task 2: Implement ClaudeCodeBackend
+- [x] 2.1 Create `src/backends/claude-backend.ts` implementing `BackendAdapter`
+- [x] 2.2 Extract CLI spawning logic from `AgentRuntime.runClaude()` into `execute()` method with arg building for `-p`, `--output-format json`, `--dangerously-skip-permissions`, `--append-system-prompt-file`, `--allowedTools`, `--max-turns`, and `--resume`
+- [x] 2.3 Implement `validate()` to check CLI binary is executable
+- [x] 2.4 Implement JSON array output parser extracting `session_id` from `system/init` and `result` from `result` objects
+- [x] 2.5 Write property test: Claude backend required flags (Property 1)
+  - [x] 🧪 PBT: *For any* prompt, system prompt, and tools list, generated args contain all required flags
+
+## Task 3: Implement CodexBackend
+- [x] 3.1 Create `src/backends/codex-backend.ts` implementing `BackendAdapter`
+- [x] 3.2 Implement `execute()` with `codex exec` subcommand, `--json`, `--dangerously-bypass-approvals-and-sandbox`, `--cd`, and `codex exec resume <id>` for sessions
+- [x] 3.3 Implement newline-delimited JSON parser extracting final assistant message
+- [x] 3.4 Write property test: Codex backend required flags (Property 2)
+  - [x] 🧪 PBT: *For any* prompt and working directory, generated args contain exec, --json, --dangerously-bypass-approvals-and-sandbox, and --cd
+
+## Task 4: Implement GeminiBackend
+- [x] 4.1 Create `src/backends/gemini-backend.ts` implementing `BackendAdapter`
+- [x] 4.2 Implement `execute()` with prompt as positional arg, `--output-format json`, `--approval-mode yolo`, and `--resume` for sessions
+- [x] 4.3 Implement JSON output parser extracting response text
+- [x] 4.4 Write property test: Gemini backend required flags (Property 3)
+  - [x] 🧪 PBT: *For any* prompt, generated args contain the prompt positionally, --output-format json, and --approval-mode yolo
+
+## Task 5: Implement OpenCodeBackend
+- [x] 5.1 Create `src/backends/opencode-backend.ts` implementing `BackendAdapter`
+- [x] 5.2 Implement `execute()` with `opencode run` subcommand, `--format json`, `--model`, and `--session <id> --continue` for sessions
+- [x] 5.3 Implement JSON event parser extracting final response text
+- [x] 5.4 Write property test: OpenCode backend required flags (Property 4)
+  - [x] 🧪 PBT: *For any* prompt and optional model, generated args contain run, --format json, and --model when configured
+
+## Task 6: Implement BackendRegistry
+- [x] 6.1 Create `src/backends/registry.ts` with `resolveBackendName()` and `createBackend()` functions
+- [x] 6.2 `resolveBackendName` accepts "claude", "codex", "gemini", "opencode", defaults to "claude" for undefined, throws for invalid values
+- [x] 6.3 `createBackend` instantiates the correct backend implementation from a `BackendName`
+- [x] 6.4 Write property test: Backend name resolution (Property 7)
+  - [x] 🧪 PBT: *For any* string, resolveBackendName returns correct BackendName for valid values, "claude" for undefined, and throws for invalid
+
+## Task 7: Cross-backend property tests
+- [x] 7.1 Write property test: Session resume args across backends (Property 5)
+  - [x] 🧪 PBT: *For any* backend and session ID, session flags appear when ID is provided and are absent when not
+- [x] 7.2 Write property test: Output parsing extracts correct fields (Property 6)
+  - [x] 🧪 PBT: *For any* valid backend-specific JSON output, parser produces BackendEventResult with correct responseText and sessionId
+- [x] 7.3 Write property test: Non-zero exit code produces error result (Property 8)
+  - [x] 🧪 PBT: *For any* backend, non-zero exit code, and stderr string, result has isError=true and responseText contains stderr
+
+## Task 8: Update GatewayConfig
+- [x] 8.1 Add `agentBackend`, `backendCliPath`, `backendModel`, `backendMaxTurns` fields to `GatewayConfig` interface in `src/config.ts`
+- [x] 8.2 Update `loadConfig()` to read `AGENT_BACKEND`, `BACKEND_CLI_PATH`, `BACKEND_MODEL`, `BACKEND_MAX_TURNS` env vars with defaults
+- [x] 8.3 Deprecate `claudeCliPath` field (keep for backward compat, map to `backendCliPath` when `AGENT_BACKEND=claude`)
+
+## Task 9: Refactor AgentRuntime
+- [x] 9.1 Add `BackendAdapter` parameter to `AgentRuntime` constructor
+- [x] 9.2 Replace `executeClaude()` and `runClaude()` with calls to `this.backend.execute()`
+- [x] 9.3 Implement `BackendEventResult` → gateway `EventResult` mapping in a helper method
+- [x] 9.4 Remove `ClaudeJsonResponse` interface and Claude-specific parsing from `AgentRuntime`
+- [x] 9.5 Write property test: EventResult mapping preserves semantics (Property 9)
+  - [x] 🧪 PBT: *For any* BackendEventResult and channel ID, mapping sets error or responseText correctly based on isError
+- [x] 9.6 Write property test: Session ID storage after backend execution (Property 10)
+  - [x] 🧪 PBT: *For any* channel ID and BackendEventResult with sessionId, SessionManager contains that sessionId after processing
+
+## Task 10: Startup validation and wiring
+- [x] 10.1 Update main entry point to call `resolveBackendName()` and `createBackend()` from config
+- [x] 10.2 Call `backend.validate()` at startup; log error with backend name and path, exit(1) on failure
+- [x] 10.3 Inject the `BackendAdapter` instance into `AgentRuntime` constructor
+- [x] 10.4 Write unit tests for startup validation flow (valid backend, invalid backend name, missing CLI binary)
+
+## Task 11: Unit tests for edge cases
+- [x] 11.1 Write unit tests for each backend's `validate()` method (binary exists vs missing)
+- [x] 11.2 Write unit tests for timeout behavior (process killed after queryTimeoutMs)
+- [x] 11.3 Write unit tests for session corruption detection and cleanup
+- [x] 11.4 Write unit tests for default config values when env vars are unset
+- [x] 11.5 Write unit tests for unsupported option warning (e.g., ALLOWED_TOOLS on backends without tool filtering)
				`@@ -0,0 +1 @@`
				`{"specId": "66d67457-3ea3-493c-9d0f-b868b51d309d", "workflowType": "requirements-first", "specType": "feature"}`