Implement BackendAdapter interface with four CLI backends:

- ClaudeCodeBackend (extracted from AgentRuntime)
- CodexBackend (OpenAI Codex CLI)
- GeminiBackend (Google Gemini CLI)
- OpenCodeBackend (OpenCode CLI)

Add BackendRegistry for resolution/creation via AGENT_BACKEND env var. Refactor AgentRuntime to delegate to BackendAdapter instead of hardcoding Claude CLI. Update GatewayConfig with new env vars (AGENT_BACKEND, BACKEND_CLI_PATH, BACKEND_MODEL, BACKEND_MAX_TURNS). Includes 10 property-based test files and unit tests for edge cases.

# Design Document: Multi-CLI Backend

## Overview

This design introduces a pluggable CLI backend system for the Aetheel gateway. The current architecture hardcodes Claude Code CLI invocation directly inside `AgentRuntime`. We will extract a `BackendAdapter` interface and provide four implementations (Claude, Codex, Gemini, OpenCode), each encapsulating CLI spawning, argument construction, output parsing, and session management. A `BackendRegistry` resolves the active backend from environment configuration at startup, validates it, and injects it into `AgentRuntime`.

The key design goals are:
- Zero behavioral change for existing Claude deployments (backward compatible defaults)
- Each backend is a self-contained module with no cross-dependencies
- The rest of the gateway (event processing, Discord integration, session management) remains untouched
- Output is normalized into a single `BackendEventResult` shape regardless of backend

## Architecture

```mermaid
graph TD
    A[Discord Bot] --> B[EventQueue]
    B --> C[AgentRuntime]
    C --> D[BackendAdapter Interface]
    D --> E[ClaudeCodeBackend]
    D --> F[CodexBackend]
    D --> G[GeminiBackend]
    D --> H[OpenCodeBackend]
    I[BackendRegistry] -->|resolves active backend| D
    J[GatewayConfig] -->|AGENT_BACKEND env| I
    I -->|validates at startup| D
```

### Startup Flow

```mermaid
sequenceDiagram
    participant Main
    participant Config as GatewayConfig
    participant Registry as BackendRegistry
    participant Backend as BackendAdapter
    participant Runtime as AgentRuntime

    Main->>Config: loadConfig()
    Config-->>Main: config (includes agentBackend, backendCliPath)
    Main->>Registry: createBackend(config)
    Registry-->>Main: BackendAdapter instance
    Main->>Backend: validate()
    alt validation fails
        Main->>Main: log error, exit(1)
    end
    Main->>Runtime: new AgentRuntime(config, backend, ...)
```

### Execution Flow

```mermaid
sequenceDiagram
    participant Runtime as AgentRuntime
    participant Backend as BackendAdapter
    participant CLI as CLI Process

    Runtime->>Backend: execute(prompt, systemPrompt, sessionId?, onStream?)
    Backend->>CLI: spawn with backend-specific args
    CLI-->>Backend: stdout (JSON events)
    Backend->>Backend: parse output into BackendEventResult
    Backend-->>Runtime: BackendEventResult { responseText, sessionId, isError }
```

## Components and Interfaces

### BackendAdapter Interface

```typescript
export interface BackendAdapterConfig {
  cliPath: string;
  workingDir: string;
  queryTimeoutMs: number;
  allowedTools: string[];
  maxTurns: number;
  model?: string;
}

// Named BackendEventResult to match the Data Models section and to avoid
// clashing with the gateway's existing EventResult interface.
export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

export type StreamCallback = (text: string) => Promise<void>;

export interface BackendAdapter {
  /** Unique identifier for this backend (e.g., "claude", "codex") */
  name(): string;

  /** Execute a prompt and return normalized results */
  execute(
    prompt: string,
    systemPrompt: string,
    sessionId?: string,
    onStream?: StreamCallback,
  ): Promise<BackendEventResult>;

  /** Validate that the CLI binary is reachable and executable */
  validate(): Promise<boolean>;
}
```

### ClaudeCodeBackend

Preserves the existing behavior extracted from `AgentRuntime.runClaude()`.

- Writes system prompt to a temp file, passes via `--append-system-prompt-file`
- Spawns: `claude -p <prompt> --output-format json --dangerously-skip-permissions --append-system-prompt-file <file>`
- Session resume: `--resume <sessionId>`
- Tool filtering: `--allowedTools <tool>` for each tool
- Max turns: `--max-turns <n>`
- Parses JSON array output for `system/init` (session_id) and `result` objects

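
The flag list above could be turned into an argument array along the following lines. This is a minimal sketch, not the extracted `runClaude()` code: `buildClaudeArgs` and `ClaudeArgOptions` are hypothetical names, the order of the flags is an assumption, and the caller is assumed to have already written the system prompt to a temp file.

```typescript
// Hypothetical option shape; only the flags themselves come from the design.
interface ClaudeArgOptions {
  prompt: string;
  systemPromptFile: string; // temp file already written by the caller
  allowedTools: string[];
  maxTurns: number;
  sessionId?: string;
}

// Hypothetical helper that assembles the Claude CLI argument list.
function buildClaudeArgs(opts: ClaudeArgOptions): string[] {
  const args: string[] = [
    "-p", opts.prompt,
    "--output-format", "json",
    "--dangerously-skip-permissions",
    "--append-system-prompt-file", opts.systemPromptFile,
    "--max-turns", String(opts.maxTurns),
  ];
  // One --allowedTools entry per configured tool.
  for (const tool of opts.allowedTools) {
    args.push("--allowedTools", tool);
  }
  // The resume flag appears only when a non-empty session ID is known.
  if (opts.sessionId) {
    args.push("--resume", opts.sessionId);
  }
  return args;
}
```

Keeping argument construction in a pure function like this makes Properties 1 and 5 checkable without spawning a process.
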
### CodexBackend

- Spawns: `codex exec <prompt> --json --dangerously-bypass-approvals-and-sandbox`
- Working directory: `--cd <path>`
- Session resume: `codex exec resume <sessionId>` with follow-up prompt
- Parses newline-delimited JSON events for the final assistant message
- System prompt: passed via `--config system_prompt=<text>` or prepended to prompt

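
As a rough illustration of the bullets above, the Codex argument list might be assembled as follows. The sketch takes the "prepended to prompt" option for the system prompt; `buildCodexArgs` and `CodexArgOptions` are hypothetical names, and the relative order of the positional arguments and flags is an assumption that has not been checked against the Codex CLI.

```typescript
// Hypothetical option shape for the sketch.
interface CodexArgOptions {
  prompt: string;
  systemPrompt: string;
  workingDir: string;
  sessionId?: string;
}

// Hypothetical helper; only the subcommands and flags come from the design.
function buildCodexArgs(opts: CodexArgOptions): string[] {
  // Assumption: the system prompt is prepended, separated by a blank line.
  const fullPrompt = opts.systemPrompt
    ? `${opts.systemPrompt}\n\n${opts.prompt}`
    : opts.prompt;

  const args: string[] = ["exec"];
  // "resume <sessionId>" is inserted only when a session is being continued.
  if (opts.sessionId) {
    args.push("resume", opts.sessionId);
  }
  args.push(
    fullPrompt,
    "--json",
    "--dangerously-bypass-approvals-and-sandbox",
    "--cd", opts.workingDir,
  );
  return args;
}
```
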
### GeminiBackend

- Spawns: `gemini <prompt> --output-format json --approval-mode yolo`
- Session resume: `--resume <sessionId>`
- Parses JSON output for response text
- System prompt: prepended to prompt text (Gemini CLI has no system prompt file flag in non-interactive mode)

### OpenCodeBackend

- Spawns: `opencode run <prompt> --format json`
- Session resume: `--session <sessionId> --continue`
- Model selection: `--model <provider/model>`
- Parses JSON events for final response text
- System prompt: prepended to prompt text

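
Both GeminiBackend and OpenCodeBackend prepend the system prompt to the prompt text, so their argument builders can share one helper. The following is a sketch under assumptions: `prependSystemPrompt`, `buildGeminiArgs`, and `buildOpenCodeArgs` are hypothetical names, the blank-line separator is an assumption, and only the flags listed in the two sections above are used.

```typescript
// Hypothetical shared helper: joins system prompt and user prompt with a blank line.
function prependSystemPrompt(systemPrompt: string, prompt: string): string {
  return systemPrompt.trim() === "" ? prompt : `${systemPrompt}\n\n${prompt}`;
}

// Gemini: prompt as positional argument, JSON output, yolo approval mode.
function buildGeminiArgs(prompt: string, systemPrompt: string, sessionId?: string): string[] {
  const args: string[] = [
    prependSystemPrompt(systemPrompt, prompt),
    "--output-format", "json",
    "--approval-mode", "yolo",
  ];
  if (sessionId) {
    args.push("--resume", sessionId);
  }
  return args;
}

// OpenCode: "run" subcommand, JSON format, optional model and session flags.
function buildOpenCodeArgs(
  prompt: string,
  systemPrompt: string,
  model?: string,
  sessionId?: string,
): string[] {
  const args: string[] = ["run", prependSystemPrompt(systemPrompt, prompt), "--format", "json"];
  if (model) {
    args.push("--model", model);
  }
  if (sessionId) {
    args.push("--session", sessionId, "--continue");
  }
  return args;
}
```
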
### BackendRegistry

```typescript
export type BackendName = "claude" | "codex" | "gemini" | "opencode";

export function createBackend(
  name: BackendName,
  config: BackendAdapterConfig,
): BackendAdapter;

export function resolveBackendName(raw: string | undefined): BackendName;
```

- `resolveBackendName` maps the `AGENT_BACKEND` env var to a valid `BackendName`, defaulting to `"claude"` when the variable is unset, and throws a descriptive error listing the valid options for any unrecognized value
- `createBackend` instantiates the correct implementation

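
One possible body for `resolveBackendName`, following the behavior described above and in Property 7. The `VALID_BACKENDS` constant, the `isBackendName` type guard, and the wording of the error message are assumptions made for this sketch.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

// Assumed constant listing every supported backend name.
const VALID_BACKENDS: readonly BackendName[] = ["claude", "codex", "gemini", "opencode"];

// Hypothetical type guard that narrows an arbitrary string to BackendName.
function isBackendName(value: string): value is BackendName {
  return (VALID_BACKENDS as readonly string[]).indexOf(value) !== -1;
}

function resolveBackendName(raw: string | undefined): BackendName {
  // Unset variable: fall back to the backward-compatible default.
  if (raw === undefined) {
    return "claude";
  }
  if (isBackendName(raw)) {
    return raw;
  }
  // Any other string (including the empty string) is rejected.
  throw new Error(
    `Invalid AGENT_BACKEND "${raw}". Valid options: ${VALID_BACKENDS.join(", ")}`,
  );
}
```

The match is exact in this sketch: Property 7 requires an error for any string outside the four names, so no trimming or case folding is applied.
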
### AgentRuntime Refactoring

The constructor changes from:
```typescript
constructor(config, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
```
to:
```typescript
constructor(config, backend, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
```

- `executeClaude()` and `runClaude()` are replaced by `this.backend.execute()`
- The `ClaudeJsonResponse` interface is removed from `AgentRuntime`
- `EventResult` mapping: the backend's `BackendEventResult` maps directly to the gateway's existing `EventResult` interface, with `targetChannelId` added in the runtime layer

### GatewayConfig Changes

```typescript
export interface GatewayConfig {
  // ... existing fields ...
  agentBackend: BackendName;   // NEW: replaces implicit claude-only
  backendCliPath: string;      // NEW: replaces claudeCliPath
  backendModel?: string;       // NEW: optional model override
  backendMaxTurns: number;     // NEW: configurable max turns
  // claudeCliPath removed
}
```

New environment variables:
- `AGENT_BACKEND` → `agentBackend` (default: `"claude"`)
- `BACKEND_CLI_PATH` → `backendCliPath` (default: backend-specific, e.g., `"claude"`, `"codex"`, `"gemini"`, `"opencode"`)
- `BACKEND_MODEL` → `backendModel`
- `BACKEND_MAX_TURNS` → `backendMaxTurns` (default: `25`)

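
The environment mapping above might be read along these lines. This is a sketch, not the gateway's `loadConfig()`: `loadBackendEnv` and `BackendEnvConfig` are hypothetical names, the environment is passed in as a plain record so the function stays testable, `AGENT_BACKEND` is assumed to have been resolved already, the default CLI path is assumed to equal the backend name, and "valid" for `BACKEND_MAX_TURNS` is assumed to mean a positive integer.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

// Hypothetical subset of GatewayConfig covering only the new fields.
interface BackendEnvConfig {
  agentBackend: BackendName;
  backendCliPath: string;
  backendModel?: string;
  backendMaxTurns: number;
}

const DEFAULT_MAX_TURNS = 25;

// Hypothetical loader for the new environment variables.
function loadBackendEnv(
  env: Record<string, string | undefined>,
  agentBackend: BackendName,
  warn: (message: string) => void = () => {},
): BackendEnvConfig {
  // Assumption: the backend-specific default path is the backend name itself.
  const backendCliPath = env["BACKEND_CLI_PATH"] ?? agentBackend;

  let backendMaxTurns = DEFAULT_MAX_TURNS;
  const rawTurns = env["BACKEND_MAX_TURNS"];
  if (rawTurns !== undefined) {
    const parsed = Number(rawTurns);
    if (Number.isInteger(parsed) && parsed > 0) {
      backendMaxTurns = parsed;
    } else {
      // Invalid values fall back to the default and log a warning.
      warn(`Invalid BACKEND_MAX_TURNS "${rawTurns}", falling back to ${DEFAULT_MAX_TURNS}`);
    }
  }

  const config: BackendEnvConfig = { agentBackend, backendCliPath, backendMaxTurns };
  const model = env["BACKEND_MODEL"];
  if (model) {
    config.backendModel = model;
  }
  return config;
}
```
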
## Data Models

### EventResult (Backend)

```typescript
export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}
```

This is the normalized output from any backend. The `AgentRuntime` maps it to the gateway's `EventResult`:

```typescript
// Gateway EventResult (existing, unchanged)
export interface EventResult {
  responseText?: string;
  targetChannelId?: string;
  sessionId?: string;
  error?: string;
}
```

Mapping logic:
```typescript
if (backendResult.isError) {
  return { error: backendResult.responseText, targetChannelId };
} else {
  return { responseText: backendResult.responseText, targetChannelId, sessionId: backendResult.sessionId };
}
```

### BackendAdapterConfig

```typescript
export interface BackendAdapterConfig {
  cliPath: string;          // Path to CLI binary
  workingDir: string;       // Working directory for CLI process
  queryTimeoutMs: number;   // Timeout before killing the process
  allowedTools: string[];   // Tools to whitelist (backend-specific support)
  maxTurns: number;         // Max agentic turns
  model?: string;           // Optional model override
}
```

### CLI Output Formats

| Backend   | Output Format                | Session ID Source                    | Result Source                     |
|-----------|------------------------------|--------------------------------------|-----------------------------------|
| Claude    | JSON array                   | `system/init` object `.session_id`   | `result` object `.result`         |
| Codex     | Newline-delimited JSON       | Session ID from exec metadata        | Final assistant message content   |
| Gemini    | JSON object                  | Session metadata in output           | Response text field               |
| OpenCode  | JSON events                  | Session field in response            | Final response text               |

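
To make the Claude row of the table concrete, a parser for the JSON array format could look like the sketch below. `parseClaudeOutput` is a hypothetical name, and the `type` and `subtype` field names used to recognize the `system/init` and `result` objects are assumptions about the array's shape; only `.session_id` and `.result` are named in the table. Treating valid JSON that is not an array as a parse failure is likewise an assumption.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

// Error shape taken from the "Invalid JSON output" row under Error Handling.
const PARSE_FAILURE: BackendEventResult = {
  isError: true,
  responseText: "Failed to parse CLI output",
};

// Hypothetical parser for the Claude JSON array output.
function parseClaudeOutput(stdout: string): BackendEventResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return { ...PARSE_FAILURE };
  }
  if (!Array.isArray(parsed)) {
    return { ...PARSE_FAILURE };
  }

  const result: BackendEventResult = { isError: false };
  for (const entry of parsed as unknown[]) {
    if (typeof entry !== "object" || entry === null) continue;
    const obj = entry as Record<string, unknown>;
    const sessionId = obj["session_id"];
    const text = obj["result"];
    // Assumed discriminators: { type: "system", subtype: "init" } and { type: "result" }.
    if (obj["type"] === "system" && obj["subtype"] === "init" && typeof sessionId === "string") {
      result.sessionId = sessionId;
    }
    if (obj["type"] === "result" && typeof text === "string") {
      result.responseText = text;
    }
  }
  return result;
}
```

How an array with no `result` object should be reported is not specified in this design; the sketch returns a non-error result with `responseText` left undefined.
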
## Correctness Properties

*A property is a characteristic or behavior that should hold true across all valid executions of a system; in essence, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.*

### Property 1: Claude backend required flags

*For any* prompt string, system prompt string, and allowed tools list, the Claude backend's generated argument list SHALL always contain `-p`, `--output-format json`, `--dangerously-skip-permissions`, `--append-system-prompt-file`, `--max-turns`, and one `--allowedTools` entry per configured tool.

**Validates: Requirements 2.2, 2.5, 2.6**

### Property 2: Codex backend required flags

*For any* prompt string and working directory, the Codex backend's generated argument list SHALL always contain the `exec` subcommand, `--json`, `--dangerously-bypass-approvals-and-sandbox`, and `--cd <workingDir>`.

**Validates: Requirements 3.2, 3.3, 3.4, 3.5**

### Property 3: Gemini backend required flags

*For any* prompt string, the Gemini backend's generated argument list SHALL always contain the prompt as a positional argument, `--output-format json`, and `--approval-mode yolo`.

**Validates: Requirements 4.2, 4.3, 4.4**

### Property 4: OpenCode backend required flags

*For any* prompt string and optional model string, the OpenCode backend's generated argument list SHALL always contain the `run` subcommand, `--format json`, and when a model is configured, `--model <model>`.

**Validates: Requirements 5.2, 5.3, 5.5**

### Property 5: Session resume args across backends

*For any* backend and any non-empty session ID string, the generated argument list SHALL include the backend-specific session resume flags: `--resume <id>` for Claude, `resume <id>` subcommand for Codex, `--resume <id>` for Gemini, and `--session <id> --continue` for OpenCode. When no session ID is provided, no session-related flags SHALL appear.

**Validates: Requirements 2.3, 3.7, 4.5, 5.4**

### Property 6: Output parsing extracts correct fields

*For any* valid backend-specific JSON output containing a response text and session ID, the backend's parser SHALL produce a `BackendEventResult` where `responseText` matches the expected response content and `sessionId` matches the expected session identifier.

**Validates: Requirements 2.4, 3.6, 4.6, 5.6, 8.1**

### Property 7: Backend name resolution

*For any* string, `resolveBackendName` SHALL return the corresponding `BackendName` if the string is one of `"claude"`, `"codex"`, `"gemini"`, or `"opencode"`, SHALL return `"claude"` when the input is `undefined`, and SHALL throw a descriptive error for any other string value.

**Validates: Requirements 6.1, 6.2, 6.3, 6.5**

### Property 8: Non-zero exit code produces error result

*For any* backend, any non-zero exit code, and any stderr string, the backend SHALL return a `BackendEventResult` with `isError` set to `true` and `responseText` containing the stderr content (truncated to 500 characters, as specified under Error Handling).

**Validates: Requirements 8.2**

### Property 9: EventResult mapping preserves semantics

*For any* `BackendEventResult` and target channel ID, the mapping to the gateway's `EventResult` SHALL set `error` to `responseText` when `isError` is true (with no `responseText` on the gateway result), and SHALL set `responseText` and `sessionId` when `isError` is false (with no `error` on the gateway result). `targetChannelId` SHALL always be set.

**Validates: Requirements 10.3**

### Property 10: Session ID storage after backend execution

*For any* channel ID and any `BackendEventResult` containing a non-undefined `sessionId`, after the `AgentRuntime` processes the result, the `SessionManager` SHALL contain that session ID for that channel. When `sessionId` is undefined, the session manager SHALL not be updated for that channel.

**Validates: Requirements 10.4**

## Error Handling

### CLI Process Errors

| Error Condition | Handling |
|---|---|
| CLI binary not found | `validate()` returns false at startup → gateway logs error with backend name and path, exits with code 1 |
| Non-zero exit code | Backend sets `isError: true`, includes stderr (truncated to 500 chars) in `responseText` |
| Query timeout | Backend kills process with SIGTERM after `queryTimeoutMs`, returns `{ isError: true, responseText: "Query timed out" }` |
| Invalid JSON output | Backend returns `{ isError: true, responseText: "Failed to parse CLI output" }` |
| Session corruption | `AgentRuntime` detects session-related error messages, removes session from `SessionManager`, allows retry without session |

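
The non-zero exit and timeout rows of the table can be expressed as one pure function over the outcome of a finished process. This is a sketch under assumptions: `ProcessOutcome` and `errorResultFor` are hypothetical names, and the fallback message used when stderr is empty is not specified by this design.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

// Hypothetical summary of how a CLI process ended.
type ProcessOutcome =
  | { kind: "exit"; code: number; stdout: string; stderr: string }
  | { kind: "timeout" };

const MAX_STDERR_CHARS = 500;

// Returns an error result for failed runs, or undefined when the caller
// should go on to parse stdout.
function errorResultFor(outcome: ProcessOutcome): BackendEventResult | undefined {
  if (outcome.kind === "timeout") {
    return { isError: true, responseText: "Query timed out" };
  }
  if (outcome.code !== 0) {
    const detail = outcome.stderr.slice(0, MAX_STDERR_CHARS);
    // Assumption: fall back to a generic message when stderr is empty.
    return {
      isError: true,
      responseText: detail !== "" ? detail : `CLI exited with code ${outcome.code}`,
    };
  }
  return undefined;
}
```
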
### Configuration Errors

| Error Condition | Handling |
|---|---|
| Invalid `AGENT_BACKEND` value | `resolveBackendName` throws with message listing valid options; gateway fails at startup |
| Invalid `BACKEND_MAX_TURNS` | Falls back to default (25), logs warning |
| Unsupported option for backend | Logs warning, ignores the option (e.g., `ALLOWED_TOOLS` for backends that don't support tool filtering) |

### Retry Strategy

The existing `withRetry` mechanism in `AgentRuntime` continues to wrap backend execution calls:
- Max 3 retries with exponential backoff (5s base)
- Transient errors (timeout, spawn failure, crash) trigger retry
- Session corruption errors are non-retryable; session is cleared and the next attempt starts fresh

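
For readers unfamiliar with the existing mechanism, the retry policy described above has roughly the following shape. This is not the gateway's `withRetry` implementation: `withRetrySketch` and `backoffDelayMs` are hypothetical names, the doubling factor is an assumption (the design only states "exponential backoff (5s base)"), and the retryable check and sleep function are injected so the sketch has no runtime dependencies.

```typescript
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 5000;

// Delay before retry number `retryIndex` (0-based): 5s, 10s, 20s under the
// assumed doubling factor.
function backoffDelayMs(retryIndex: number): number {
  return BASE_DELAY_MS * Math.pow(2, retryIndex);
}

// Hypothetical retry wrapper: one initial attempt plus up to MAX_RETRIES retries.
async function withRetrySketch<T>(
  operation: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await operation();
    } catch (err) {
      // Non-retryable errors (e.g. session corruption) and exhausted budgets
      // are rethrown to the caller.
      if (retry >= MAX_RETRIES || !isRetryable(err)) {
        throw err;
      }
      await sleep(backoffDelayMs(retry));
    }
  }
}
```
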
## Testing Strategy

### Property-Based Testing

Library: [fast-check](https://github.com/dubzzz/fast-check) for TypeScript property-based testing.

Each property test runs a minimum of 100 iterations. Each test is tagged with a comment referencing the design property:

```typescript
// Feature: multi-cli-backend, Property 1: Claude backend required flags
```

Properties to implement:
- **Properties 1–4**: Generate random prompt strings, system prompts, tool lists, and config values. Call each backend's arg-building function and assert required flags are present.
- **Property 5**: Generate random session ID strings (including empty/undefined). For each backend, verify session flags appear only when a session ID is provided.
- **Property 6**: Generate random valid JSON output structures per backend format. Parse and verify extracted fields match.
- **Property 7**: Generate random strings. Verify resolution behavior (valid → correct BackendName, undefined → "claude", invalid → throws).
- **Property 8**: Generate random exit codes (non-zero) and stderr strings. Verify error result shape.
- **Property 9**: Generate random `BackendEventResult` objects. Verify mapping to gateway `EventResult`.
- **Property 10**: Generate random channel IDs and `BackendEventResult` objects with/without session IDs. Verify session manager state.

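
The shape of such a check is illustrated below for Property 9. To keep the sketch dependency-free, a small seeded generator stands in for fast-check; in the real suite the inputs would come from fast-check's generators and runner instead. `toGatewayResult`, `makeRng`, `randomString`, and `checkProperty9` are hypothetical names, and `toGatewayResult` simply wraps the mapping logic from the Data Models section.

```typescript
interface BackendEventResult { responseText?: string; sessionId?: string; isError: boolean; }
interface EventResult { responseText?: string; targetChannelId?: string; sessionId?: string; error?: string; }

// Mapping logic from the Data Models section, wrapped in a hypothetical function.
function toGatewayResult(r: BackendEventResult, targetChannelId: string): EventResult {
  if (r.isError) {
    return { error: r.responseText, targetChannelId };
  }
  return { responseText: r.responseText, targetChannelId, sessionId: r.sessionId };
}

// Stand-in for fast-check: a seeded linear congruential generator in [0, 1).
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Random printable-ASCII string of length 0 to 11.
function randomString(rng: () => number): string {
  const length = Math.floor(rng() * 12);
  let s = "";
  for (let i = 0; i < length; i++) {
    s += String.fromCharCode(32 + Math.floor(rng() * 95));
  }
  return s;
}

// Feature: multi-cli-backend, Property 9: EventResult mapping preserves semantics
function checkProperty9(runs: number): number {
  const rng = makeRng(42);
  let checked = 0;
  for (let i = 0; i < runs; i++) {
    const backendResult: BackendEventResult = {
      isError: rng() < 0.5,
      responseText: rng() < 0.8 ? randomString(rng) : undefined,
      sessionId: rng() < 0.5 ? randomString(rng) : undefined,
    };
    const channelId = randomString(rng);
    const mapped = toGatewayResult(backendResult, channelId);

    if (mapped.targetChannelId !== channelId) throw new Error("targetChannelId must always be set");
    if (backendResult.isError) {
      if (mapped.error !== backendResult.responseText) throw new Error("error must carry responseText");
      if (mapped.responseText !== undefined) throw new Error("no responseText on error results");
    } else {
      if (mapped.responseText !== backendResult.responseText) throw new Error("responseText must be preserved");
      if (mapped.sessionId !== backendResult.sessionId) throw new Error("sessionId must be preserved");
      if (mapped.error !== undefined) throw new Error("no error on success results");
    }
    checked++;
  }
  return checked;
}
```

Calling `checkProperty9(100)` runs the 100-iteration minimum stated above and throws on the first counterexample.
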
### Unit Testing

Unit tests complement property tests for specific examples and edge cases:
- Each backend's `validate()` method with mocked filesystem
- Timeout behavior with a mock slow process
- Startup flow: valid config → backend created → validated → injected into runtime
- Startup flow: invalid backend name → descriptive error
- Default config values when env vars are unset
- Streaming callback invocation during output parsing
- Session corruption detection and cleanup

### Integration Testing

- End-to-end test with a mock CLI script that echoes JSON in each backend's format
- Verify the full flow: config → registry → backend → execute → parse → EventResult