aetheel-2/.kiro/specs/multi-cli-backend/design.md
tanmay11k 453389f55c feat: add pluggable multi-CLI backend system
Implement BackendAdapter interface with four CLI backends:
- ClaudeCodeBackend (extracted from AgentRuntime)
- CodexBackend (OpenAI Codex CLI)
- GeminiBackend (Google Gemini CLI)
- OpenCodeBackend (OpenCode CLI)

Add BackendRegistry for resolution/creation via AGENT_BACKEND env var.
Refactor AgentRuntime to delegate to BackendAdapter instead of
hardcoding Claude CLI. Update GatewayConfig with new env vars
(AGENT_BACKEND, BACKEND_CLI_PATH, BACKEND_MODEL, BACKEND_MAX_TURNS).

Includes 10 property-based test files and unit tests for edge cases.
2026-02-22 23:41:30 -05:00

# Design Document: Multi-CLI Backend
## Overview
This design introduces a pluggable CLI backend system for the Aetheel gateway. The current architecture hardcodes Claude Code CLI invocation directly inside `AgentRuntime`. We will extract a `BackendAdapter` interface and provide four implementations (Claude, Codex, Gemini, OpenCode), each encapsulating CLI spawning, argument construction, output parsing, and session management. A `BackendRegistry` resolves the active backend from environment configuration at startup, validates it, and injects it into `AgentRuntime`.
The key design goals are:
- Zero behavioral change for existing Claude deployments (backward compatible defaults)
- Each backend is a self-contained module with no cross-dependencies
- The rest of the gateway (event processing, Discord integration, session management) remains untouched
- Output is normalized into a single `EventResult` shape regardless of backend
## Architecture
```mermaid
graph TD
A[Discord Bot] --> B[EventQueue]
B --> C[AgentRuntime]
C --> D[BackendAdapter Interface]
D --> E[ClaudeCodeBackend]
D --> F[CodexBackend]
D --> G[GeminiBackend]
D --> H[OpenCodeBackend]
I[BackendRegistry] -->|resolves active backend| D
J[GatewayConfig] -->|AGENT_BACKEND env| I
I -->|validates at startup| D
```
### Startup Flow
```mermaid
sequenceDiagram
participant Main
participant Config as GatewayConfig
participant Registry as BackendRegistry
participant Backend as BackendAdapter
participant Runtime as AgentRuntime
Main->>Config: loadConfig()
Config-->>Main: config (includes agentBackend, backendCliPath)
Main->>Registry: createBackend(config)
Registry-->>Main: BackendAdapter instance
Main->>Backend: validate()
alt validation fails
Main->>Main: log error, exit(1)
end
Main->>Runtime: new AgentRuntime(config, backend, ...)
```
### Execution Flow
```mermaid
sequenceDiagram
participant Runtime as AgentRuntime
participant Backend as BackendAdapter
participant CLI as CLI Process
Runtime->>Backend: execute(prompt, systemPrompt, sessionId?, onStream?)
Backend->>CLI: spawn with backend-specific args
CLI-->>Backend: stdout (JSON events)
Backend->>Backend: parse output into EventResult
Backend-->>Runtime: EventResult { responseText, sessionId, isError }
```
## Components and Interfaces
### BackendAdapter Interface
```typescript
export interface BackendAdapterConfig {
  cliPath: string;
  workingDir: string;
  queryTimeoutMs: number;
  allowedTools: string[];
  maxTurns: number;
  model?: string;
}

// Named BackendEventResult (as in Data Models) to avoid clashing with the gateway's EventResult
export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

export type StreamCallback = (text: string) => Promise<void>;

export interface BackendAdapter {
  /** Unique identifier for this backend (e.g., "claude", "codex") */
  name(): string;
  /** Execute a prompt and return normalized results */
  execute(
    prompt: string,
    systemPrompt: string,
    sessionId?: string,
    onStream?: StreamCallback,
  ): Promise<BackendEventResult>;
  /** Validate that the CLI binary is reachable and executable */
  validate(): Promise<boolean>;
}
```
### ClaudeCodeBackend
Preserves the existing behavior extracted from `AgentRuntime.runClaude()`.
- Writes system prompt to a temp file, passes via `--append-system-prompt-file`
- Spawns: `claude -p <prompt> --output-format json --dangerously-skip-permissions --append-system-prompt-file <file>`
- Session resume: `--resume <sessionId>`
- Tool filtering: `--allowedTools <tool>` for each tool
- Max turns: `--max-turns <n>`
- Parses JSON array output for `system/init` (session_id) and `result` objects
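
The flag list above can be sketched as a pure argument builder, which is also the shape Property 1 would test. This is a minimal sketch, not the extracted `runClaude()` code: `buildClaudeArgs` and `ClaudeArgOptions` are hypothetical names, and the relative order of the flags is an assumption.

```typescript
// Hypothetical helper: builds the argv for the Claude CLI from the flags listed above.
interface ClaudeArgOptions {
  prompt: string;
  systemPromptFile: string; // path of the temp file holding the system prompt
  allowedTools: string[];
  maxTurns: number;
  sessionId?: string;
}

function buildClaudeArgs(opts: ClaudeArgOptions): string[] {
  const args: string[] = [
    "-p", opts.prompt,
    "--output-format", "json",
    "--dangerously-skip-permissions",
    "--append-system-prompt-file", opts.systemPromptFile,
    "--max-turns", String(opts.maxTurns),
  ];
  // Resume only when a non-empty session ID is known.
  if (opts.sessionId) {
    args.push("--resume", opts.sessionId);
  }
  // One --allowedTools entry per configured tool.
  for (const tool of opts.allowedTools) {
    args.push("--allowedTools", tool);
  }
  return args;
}
```

Keeping argument construction separate from process spawning lets the required-flag properties be checked without starting a CLI process.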
### CodexBackend
- Spawns: `codex exec <prompt> --json --dangerously-bypass-approvals-and-sandbox`
- Working directory: `--cd <path>`
- Session resume: `codex exec resume <sessionId>` with follow-up prompt
- Parses newline-delimited JSON events for the final assistant message
- System prompt: passed via `--config system_prompt=<text>` or prepended to prompt
### GeminiBackend
- Spawns: `gemini <prompt> --output-format json --approval-mode yolo`
- Session resume: `--resume <sessionId>`
- Parses JSON output for response text
- System prompt: prepended to prompt text (Gemini CLI has no system prompt file flag in non-interactive mode)
### OpenCodeBackend
- Spawns: `opencode run <prompt> --format json`
- Session resume: `--session <sessionId> --continue`
- Model selection: `--model <provider/model>`
- Parses JSON events for final response text
- System prompt: prepended to prompt text
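
For backends without a system prompt flag, prepending could look like the sketch below, shown for OpenCode. The helper names are hypothetical, and the blank-line separator and the flag order are assumptions; only the flags themselves come from the list above.

```typescript
// Hypothetical helper: joins the system prompt and user prompt into one text.
// The blank-line separator is an assumption, not something this design fixes.
function prependSystemPrompt(systemPrompt: string, prompt: string): string {
  return systemPrompt.trim() === "" ? prompt : `${systemPrompt}\n\n${prompt}`;
}

// Hypothetical helper: builds the argv for `opencode run`.
function buildOpenCodeArgs(
  prompt: string,
  systemPrompt: string,
  model?: string,
  sessionId?: string,
): string[] {
  const args: string[] = ["run", prependSystemPrompt(systemPrompt, prompt), "--format", "json"];
  // The model flag appears only when a model override is configured.
  if (model) {
    args.push("--model", model);
  }
  // Session flags appear only when a non-empty session ID is known.
  if (sessionId) {
    args.push("--session", sessionId, "--continue");
  }
  return args;
}
```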
### BackendRegistry
```typescript
export type BackendName = "claude" | "codex" | "gemini" | "opencode";

export function createBackend(
  name: BackendName,
  config: BackendAdapterConfig,
): BackendAdapter;

export function resolveBackendName(raw: string | undefined): BackendName;
```
- `resolveBackendName` maps the `AGENT_BACKEND` env var to a valid `BackendName`. It returns `"claude"` when the variable is unset and throws a descriptive error listing the valid options for any unrecognized value
- `createBackend` instantiates the correct implementation
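
One possible body for `resolveBackendName` is sketched below, following the behavior stated in Property 7: exact match for the four names, `"claude"` for `undefined`, and an error otherwise. The `VALID_BACKENDS` constant and the error wording are assumptions.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

// Hypothetical constant listing every accepted value.
const VALID_BACKENDS: readonly BackendName[] = ["claude", "codex", "gemini", "opencode"];

function resolveBackendName(raw: string | undefined): BackendName {
  // An unset variable keeps existing Claude deployments working unchanged.
  if (raw === undefined) {
    return "claude";
  }
  // Exact, case-sensitive match; anything else (including "") is rejected.
  const match = VALID_BACKENDS.find((name) => name === raw);
  if (match === undefined) {
    throw new Error(
      `Invalid AGENT_BACKEND "${raw}". Valid options: ${VALID_BACKENDS.join(", ")}`,
    );
  }
  return match;
}
```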
### AgentRuntime Refactoring
The constructor changes from:
```typescript
constructor(config, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
```
to:
```typescript
constructor(config, backend, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
```
- `executeClaude()` and `runClaude()` are replaced by `this.backend.execute()`
- The `ClaudeJsonResponse` interface is removed from `AgentRuntime`
- `EventResult` mapping: the backend's normalized result (`BackendEventResult`, see Data Models) maps directly onto the gateway's existing `EventResult` interface; the runtime layer adds `targetChannelId`
### GatewayConfig Changes
```typescript
export interface GatewayConfig {
  // ... existing fields ...
  agentBackend: BackendName;   // NEW: replaces implicit claude-only
  backendCliPath: string;      // NEW: replaces claudeCliPath
  backendModel?: string;       // NEW: optional model override
  backendMaxTurns: number;     // NEW: configurable max turns
  // claudeCliPath removed
}
```
New environment variables:
- `AGENT_BACKEND` → `agentBackend` (default: `"claude"`)
- `BACKEND_CLI_PATH` → `backendCliPath` (default: backend-specific, e.g., `"claude"`, `"codex"`, `"gemini"`, `"opencode"`)
- `BACKEND_MODEL` → `backendModel`
- `BACKEND_MAX_TURNS` → `backendMaxTurns` (default: `25`)
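
The defaults above could be applied as in the following sketch. It is an illustration only: `loadBackendEnv` and `BackendEnvConfig` are hypothetical names, the environment is passed in as a plain record so the function stays pure, and treating anything other than a positive integer as an invalid `BACKEND_MAX_TURNS` is an assumption.

```typescript
const BACKENDS = ["claude", "codex", "gemini", "opencode"] as const;
type BackendName = (typeof BACKENDS)[number];

// Hypothetical slice of GatewayConfig holding only the new fields.
interface BackendEnvConfig {
  agentBackend: BackendName;
  backendCliPath: string;
  backendModel?: string;
  backendMaxTurns: number;
}

const DEFAULT_MAX_TURNS = 25;

function loadBackendEnv(env: Record<string, string | undefined>): BackendEnvConfig {
  const rawBackend = env["AGENT_BACKEND"] ?? "claude";
  const agentBackend = BACKENDS.find((name) => name === rawBackend);
  if (agentBackend === undefined) {
    throw new Error(`Invalid AGENT_BACKEND "${rawBackend}"; expected one of: ${BACKENDS.join(", ")}`);
  }
  // Assumption: only positive integers count as valid; anything else falls back to 25.
  // (The design also logs a warning in that case; logging is omitted here.)
  const turns = Number(env["BACKEND_MAX_TURNS"] ?? "");
  const validTurns = Number.isInteger(turns) && turns > 0;
  return {
    agentBackend,
    // The default CLI path is the binary named after the backend.
    backendCliPath: env["BACKEND_CLI_PATH"] ?? agentBackend,
    backendModel: env["BACKEND_MODEL"],
    backendMaxTurns: validTurns ? turns : DEFAULT_MAX_TURNS,
  };
}
```

With an empty environment this yields the Claude defaults, which is what keeps existing deployments unchanged.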
## Data Models
### EventResult (Backend)
```typescript
export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}
```
This is the normalized output from any backend. The `AgentRuntime` maps it to the gateway's `EventResult`:
```typescript
// Gateway EventResult (existing, unchanged)
export interface EventResult {
  responseText?: string;
  targetChannelId?: string;
  sessionId?: string;
  error?: string;
}
```
Mapping logic:
```typescript
if (backendResult.isError) {
  // On error, the backend's text becomes the gateway's `error`; no session ID is passed on
  return { error: backendResult.responseText, targetChannelId };
} else {
  return { responseText: backendResult.responseText, targetChannelId, sessionId: backendResult.sessionId };
}
```
### BackendAdapterConfig
```typescript
export interface BackendAdapterConfig {
  cliPath: string;          // Path to CLI binary
  workingDir: string;       // Working directory for CLI process
  queryTimeoutMs: number;   // Timeout before killing the process
  allowedTools: string[];   // Tools to whitelist (backend-specific support)
  maxTurns: number;         // Max agentic turns
  model?: string;           // Optional model override
}
```
### CLI Output Formats
| Backend | Output Format | Session ID Source | Result Source |
|-----------|------------------------------|--------------------------------------|-----------------------------------|
| Claude | JSON array | `system/init` object `.session_id` | `result` object `.result` |
| Codex | Newline-delimited JSON | Session ID from exec metadata | Final assistant message content |
| Gemini | JSON object | Session metadata in output | Response text field |
| OpenCode | JSON events | Session field in response | Final response text |
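
As an illustration of the Claude row, a parser might look like the sketch below. It assumes the array holds objects shaped like `{ type: "system", subtype: "init", session_id }` and `{ type: "result", result }`; those field names are an assumption about the CLI output rather than something this document specifies. Treating a missing `result` object as a parse failure is also an assumption.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

const PARSE_FAILURE: BackendEventResult = {
  isError: true,
  responseText: "Failed to parse CLI output",
};

// Hypothetical parser for the Claude backend's JSON array output.
function parseClaudeOutput(stdout: string): BackendEventResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return PARSE_FAILURE;
  }
  if (!Array.isArray(parsed)) {
    return PARSE_FAILURE;
  }
  let sessionId: string | undefined;
  let responseText: string | undefined;
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const obj = item as Record<string, unknown>;
    const sid = obj["session_id"];
    const result = obj["result"];
    // The system/init object carries the session ID.
    if (obj["type"] === "system" && obj["subtype"] === "init" && typeof sid === "string") {
      sessionId = sid;
    }
    // The result object carries the final response text.
    if (obj["type"] === "result" && typeof result === "string") {
      responseText = result;
    }
  }
  if (responseText === undefined) {
    return PARSE_FAILURE;
  }
  return { responseText, sessionId, isError: false };
}
```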
## Correctness Properties
*A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*
### Property 1: Claude backend required flags
*For any* prompt string, system prompt string, and allowed tools list, the Claude backend's generated argument list SHALL always contain `-p`, `--output-format json`, `--dangerously-skip-permissions`, `--append-system-prompt-file`, `--max-turns`, and one `--allowedTools` entry per configured tool.
**Validates: Requirements 2.2, 2.5, 2.6**
### Property 2: Codex backend required flags
*For any* prompt string and working directory, the Codex backend's generated argument list SHALL always contain the `exec` subcommand, `--json`, `--dangerously-bypass-approvals-and-sandbox`, and `--cd <workingDir>`.
**Validates: Requirements 3.2, 3.3, 3.4, 3.5**
### Property 3: Gemini backend required flags
*For any* prompt string, the Gemini backend's generated argument list SHALL always contain the prompt as a positional argument, `--output-format json`, and `--approval-mode yolo`.
**Validates: Requirements 4.2, 4.3, 4.4**
### Property 4: OpenCode backend required flags
*For any* prompt string and optional model string, the OpenCode backend's generated argument list SHALL always contain the `run` subcommand, `--format json`, and when a model is configured, `--model <model>`.
**Validates: Requirements 5.2, 5.3, 5.5**
### Property 5: Session resume args across backends
*For any* backend and any non-empty session ID string, the generated argument list SHALL include the backend-specific session resume flags: `--resume <id>` for Claude, `resume <id>` subcommand for Codex, `--resume <id>` for Gemini, and `--session <id> --continue` for OpenCode. When no session ID is provided, no session-related flags SHALL appear.
**Validates: Requirements 2.3, 3.7, 4.5, 5.4**
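
Property 5 can be read as a single lookup from backend to argument fragment. The sketch below uses a hypothetical `sessionArgs` helper; where each fragment is spliced into the full command line is left to the individual backend and is not shown.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

// Hypothetical helper: returns only the session-related argv fragment.
function sessionArgs(backend: BackendName, sessionId?: string): string[] {
  // No session ID (undefined or empty) means no session-related flags at all.
  if (!sessionId) {
    return [];
  }
  switch (backend) {
    case "claude":
    case "gemini":
      return ["--resume", sessionId];
    case "codex":
      // Codex resumes through a subcommand rather than a flag.
      return ["resume", sessionId];
    case "opencode":
      return ["--session", sessionId, "--continue"];
  }
}
```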
### Property 6: Output parsing extracts correct fields
*For any* valid backend-specific JSON output containing a response text and session ID, the backend's parser SHALL produce a `BackendEventResult` where `responseText` matches the expected response content and `sessionId` matches the expected session identifier.
**Validates: Requirements 2.4, 3.6, 4.6, 5.6, 8.1**
### Property 7: Backend name resolution
*For any* string, `resolveBackendName` SHALL return the corresponding `BackendName` if the string is one of `"claude"`, `"codex"`, `"gemini"`, or `"opencode"`, SHALL return `"claude"` when the input is `undefined`, and SHALL throw a descriptive error for any other string value.
**Validates: Requirements 6.1, 6.2, 6.3, 6.5**
### Property 8: Non-zero exit code produces error result
*For any* backend, any non-zero exit code, and any stderr string, the backend SHALL return a `BackendEventResult` with `isError` set to `true` and `responseText` containing the stderr content.
**Validates: Requirements 8.2**
### Property 9: EventResult mapping preserves semantics
*For any* `BackendEventResult` and target channel ID, the mapping to the gateway's `EventResult` SHALL set `error` to `responseText` when `isError` is true (with no `responseText` on the gateway result), and SHALL set `responseText` and `sessionId` when `isError` is false (with no `error` on the gateway result). `targetChannelId` SHALL always be set.
**Validates: Requirements 10.3**
### Property 10: Session ID storage after backend execution
*For any* channel ID and any `BackendEventResult` containing a non-undefined `sessionId`, after the `AgentRuntime` processes the result, the `SessionManager` SHALL contain that session ID for that channel. When `sessionId` is undefined, the session manager SHALL not be updated for that channel.
**Validates: Requirements 10.4**
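
The update rule in Property 10 is small enough to sketch directly. Here a plain `Map` stands in for `SessionManager`, whose real API is not shown in this document, and `recordSession` is a hypothetical name.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

// Hypothetical helper: a Map from channel ID to session ID stands in for SessionManager.
function recordSession(
  sessions: Map<string, string>,
  channelId: string,
  result: BackendEventResult,
): void {
  // Only a defined session ID updates the store; otherwise the entry is left as it was.
  if (result.sessionId !== undefined) {
    sessions.set(channelId, result.sessionId);
  }
}
```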
## Error Handling
### CLI Process Errors
| Error Condition | Handling |
|---|---|
| CLI binary not found | `validate()` returns false at startup → gateway logs error with backend name and path, exits with code 1 |
| Non-zero exit code | Backend sets `isError: true`, includes stderr (truncated to 500 chars) in `responseText` |
| Query timeout | Backend kills process with SIGTERM after `queryTimeoutMs`, returns `{ isError: true, responseText: "Query timed out" }` |
| Invalid JSON output | Backend returns `{ isError: true, responseText: "Failed to parse CLI output" }` |
| Session corruption | `AgentRuntime` detects session-related error messages, removes session from `SessionManager`, allows retry without session |
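
The non-zero exit row could be implemented along these lines. This is a sketch under assumptions: `errorFromExit` is a hypothetical helper, and the fallback message used when stderr is empty is not specified by this design.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

const MAX_STDERR_CHARS = 500;

// Hypothetical helper: returns an error result for a failed exit, or undefined on success.
function errorFromExit(exitCode: number, stderr: string): BackendEventResult | undefined {
  if (exitCode === 0) {
    return undefined;
  }
  // Assumption: fall back to a generic message when the CLI wrote nothing to stderr.
  const detail = stderr.trim() === "" ? `CLI exited with code ${exitCode}` : stderr;
  // Truncate so a noisy CLI cannot flood the response text.
  return { isError: true, responseText: detail.slice(0, MAX_STDERR_CHARS) };
}
```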
### Configuration Errors
| Error Condition | Handling |
|---|---|
| Invalid `AGENT_BACKEND` value | `resolveBackendName` throws with message listing valid options; gateway fails at startup |
| Invalid `BACKEND_MAX_TURNS` | Falls back to default (25), logs warning |
| Unsupported option for backend | Logs warning, ignores the option (e.g., `ALLOWED_TOOLS` for backends that don't support tool filtering) |
### Retry Strategy
The existing `withRetry` mechanism in `AgentRuntime` continues to wrap backend execution calls:
- Max 3 retries with exponential backoff (5s base)
- Transient errors (timeout, spawn failure, crash) trigger retry
- Session corruption errors are non-retryable; session is cleared and the next attempt starts fresh
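
To make the retry numbers concrete, the sketch below computes the delay schedule and classifies failures. It does not reproduce the existing `withRetry`: the helper names and the `FailureKind` labels are hypothetical, and the doubling factor is an assumption, since the design only states a 5s base.

```typescript
// Hypothetical failure labels matching the categories listed above.
type FailureKind = "timeout" | "spawn_failure" | "crash" | "session_corruption";

// Transient failures are retried; session corruption is handled by clearing the session instead.
function isRetryable(kind: FailureKind): boolean {
  return kind !== "session_corruption";
}

// Delay before each retry, assuming the delay doubles on every attempt.
function backoffDelaysMs(maxRetries: number = 3, baseMs: number = 5000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseMs * 2 ** attempt);
  }
  return delays;
}
```

Under these assumptions, the three retries wait 5s, 10s, and 20s.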
## Testing Strategy
### Property-Based Testing
Library: [fast-check](https://github.com/dubzzz/fast-check) for TypeScript property-based testing.
Each property test runs a minimum of 100 iterations. Each test is tagged with a comment referencing the design property:
```typescript
// Feature: multi-cli-backend, Property 1: Claude backend required flags
```
Properties to implement:
- **Properties 1 to 4**: Generate random prompt strings, system prompts, tool lists, and config values. Call each backend's arg-building function and assert required flags are present.
- **Property 5**: Generate random session ID strings (including empty/undefined). For each backend, verify session flags appear only when a session ID is provided.
- **Property 6**: Generate random valid JSON output structures per backend format. Parse and verify extracted fields match.
- **Property 7**: Generate random strings. Verify resolution behavior (valid → correct BackendName, undefined → "claude", invalid → throws).
- **Property 8**: Generate random exit codes (non-zero) and stderr strings. Verify error result shape.
- **Property 9**: Generate random `BackendEventResult` objects. Verify mapping to gateway `EventResult`.
- **Property 10**: Generate random channel IDs and `BackendEventResult` objects with/without session IDs. Verify session manager state.
### Unit Testing
Unit tests complement property tests for specific examples and edge cases:
- Each backend's `validate()` method with mocked filesystem
- Timeout behavior with a mock slow process
- Startup flow: valid config → backend created → validated → injected into runtime
- Startup flow: invalid backend name → descriptive error
- Default config values when env vars are unset
- Streaming callback invocation during output parsing
- Session corruption detection and cleanup
### Integration Testing
- End-to-end test with a mock CLI script that echoes JSON in each backend's format
- Verify the full flow: config → registry → backend → execute → parse → EventResult