Design Document: Multi-CLI Backend
Overview
This design introduces a pluggable CLI backend system for the Aetheel gateway. The current architecture hardcodes Claude Code CLI invocation directly inside AgentRuntime. We will extract a BackendAdapter interface and provide four implementations (Claude, Codex, Gemini, OpenCode), each encapsulating CLI spawning, argument construction, output parsing, and session management. A BackendRegistry resolves the active backend from environment configuration at startup, validates it, and injects it into AgentRuntime.
The key design goals are:
- Zero behavioral change for existing Claude deployments (backward compatible defaults)
- Each backend is a self-contained module with no cross-dependencies
- The rest of the gateway (event processing, Discord integration, session management) remains untouched
- Output is normalized into a single `EventResult` shape regardless of backend
Architecture
graph TD
A[Discord Bot] --> B[EventQueue]
B --> C[AgentRuntime]
C --> D[BackendAdapter Interface]
D --> E[ClaudeCodeBackend]
D --> F[CodexBackend]
D --> G[GeminiBackend]
D --> H[OpenCodeBackend]
I[BackendRegistry] -->|resolves active backend| D
J[GatewayConfig] -->|AGENT_BACKEND env| I
I -->|validates at startup| D
Startup Flow
sequenceDiagram
participant Main
participant Config as GatewayConfig
participant Registry as BackendRegistry
participant Backend as BackendAdapter
participant Runtime as AgentRuntime
Main->>Config: loadConfig()
Config-->>Main: config (includes agentBackend, backendCliPath)
Main->>Registry: createBackend(config)
Registry-->>Main: BackendAdapter instance
Main->>Backend: validate()
alt validation fails
Main->>Main: log error, exit(1)
end
Main->>Runtime: new AgentRuntime(config, backend, ...)
Execution Flow
sequenceDiagram
participant Runtime as AgentRuntime
participant Backend as BackendAdapter
participant CLI as CLI Process
Runtime->>Backend: execute(prompt, systemPrompt, sessionId?, onStream?)
Backend->>CLI: spawn with backend-specific args
CLI-->>Backend: stdout (JSON events)
Backend->>Backend: parse output into BackendEventResult
Backend-->>Runtime: BackendEventResult { responseText, sessionId, isError }
Components and Interfaces
BackendAdapter Interface
export interface BackendAdapterConfig {
  cliPath: string;
  workingDir: string;
  queryTimeoutMs: number;
  allowedTools: string[];
  maxTurns: number;
  model?: string;
}
// Named BackendEventResult (as in Data Models) so it does not clash with the gateway's EventResult
export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}
export type StreamCallback = (text: string) => Promise<void>;
export interface BackendAdapter {
  /** Unique identifier for this backend (e.g., "claude", "codex") */
  name(): string;
  /** Execute a prompt and return normalized results */
  execute(
    prompt: string,
    systemPrompt: string,
    sessionId?: string,
    onStream?: StreamCallback,
  ): Promise<BackendEventResult>;
  /** Validate that the CLI binary is reachable and executable */
  validate(): Promise<boolean>;
}
ClaudeCodeBackend
Preserves the existing behavior extracted from AgentRuntime.runClaude().
- Writes system prompt to a temp file, passes via `--append-system-prompt-file`
- Spawns: `claude -p <prompt> --output-format json --dangerously-skip-permissions --append-system-prompt-file <file>`
- Session resume: `--resume <sessionId>`
- Tool filtering: `--allowedTools <tool>` for each tool
- Max turns: `--max-turns <n>`
- Parses JSON array output for `system/init` (`session_id`) and `result` objects
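The flag list above can be sketched as a pure argument builder, which is also the shape that Property 1 would exercise. This is a minimal sketch, not the extracted `runClaude()` code: the helper name `buildClaudeArgs`, the `ClaudeArgInput` type, and the temp-file path are illustrative assumptions, and the flag order is arbitrary.

```typescript
// Hypothetical input shape; only the flags themselves come from the design.
interface ClaudeArgInput {
  prompt: string;
  systemPromptFile: string; // path to the temp file holding the system prompt
  allowedTools: string[];
  maxTurns: number;
  sessionId?: string;
}

// Hypothetical helper: assembles the Claude CLI argument list.
function buildClaudeArgs(input: ClaudeArgInput): string[] {
  const args = [
    "-p", input.prompt,
    "--output-format", "json",
    "--dangerously-skip-permissions",
    "--append-system-prompt-file", input.systemPromptFile,
    "--max-turns", String(input.maxTurns),
  ];
  // Resume flags appear only when a non-empty session ID is supplied.
  if (input.sessionId) {
    args.push("--resume", input.sessionId);
  }
  // One --allowedTools entry per configured tool.
  for (const tool of input.allowedTools) {
    args.push("--allowedTools", tool);
  }
  return args;
}

const claudeArgs = buildClaudeArgs({
  prompt: "Summarize the open issues",
  systemPromptFile: "/tmp/system-prompt.txt", // illustrative path
  allowedTools: ["Read", "Grep"],
  maxTurns: 25,
});
console.log(claudeArgs.join(" "));
```

Keeping argument construction separate from process spawning means the required-flag properties can be checked without launching a CLI.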
CodexBackend
- Spawns: `codex exec <prompt> --json --dangerously-bypass-approvals-and-sandbox`
- Working directory: `--cd <path>`
- Session resume: `codex exec resume <sessionId>` with follow-up prompt
- Parses newline-delimited JSON events for the final assistant message
- System prompt: passed via `--config system_prompt=<text>` or prepended to prompt
GeminiBackend
- Spawns: `gemini <prompt> --output-format json --approval-mode yolo`
- Session resume: `--resume <sessionId>`
- Parses JSON output for response text
- System prompt: prepended to prompt text (Gemini CLI has no system prompt file flag in non-interactive mode)
OpenCodeBackend
- Spawns: `opencode run <prompt> --format json`
- Session resume: `--session <sessionId> --continue`
- Model selection: `--model <provider/model>`
- Parses JSON events for final response text
- System prompt: prepended to prompt text
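For the backends that take the system prompt as part of the prompt text, the prepending step might look like the sketch below. The helper names and the blank-line separator are assumptions; the design only states that the system prompt is prepended. Session flags are omitted here.

```typescript
// Hypothetical helper: folds the system prompt into the user prompt.
// The "\n\n" separator is an assumption, not specified by the design.
function prependSystemPrompt(systemPrompt: string, prompt: string): string {
  return systemPrompt.trim() === "" ? prompt : `${systemPrompt}\n\n${prompt}`;
}

// Gemini: prompt is a positional argument, followed by the required flags.
function buildGeminiArgs(prompt: string, systemPrompt: string): string[] {
  return [
    prependSystemPrompt(systemPrompt, prompt),
    "--output-format", "json",
    "--approval-mode", "yolo",
  ];
}

// OpenCode: `run` subcommand, with --model only when a model is configured.
function buildOpenCodeArgs(prompt: string, systemPrompt: string, model?: string): string[] {
  const args = ["run", prependSystemPrompt(systemPrompt, prompt), "--format", "json"];
  if (model) {
    args.push("--model", model);
  }
  return args;
}

console.log(buildGeminiArgs("List the files", "You are a careful assistant."));
console.log(buildOpenCodeArgs("List the files", "", "provider/model"));
```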
BackendRegistry
export type BackendName = "claude" | "codex" | "gemini" | "opencode";
export function createBackend(
  name: BackendName,
  config: BackendAdapterConfig,
): BackendAdapter;
export function resolveBackendName(raw: string | undefined): BackendName;
- `resolveBackendName` maps the `AGENT_BACKEND` env var to a valid `BackendName`, defaulting to `"claude"`, or throws with a descriptive error listing valid options
- `createBackend` instantiates the correct implementation
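A minimal sketch of the resolution rule, assuming exact string matching (no trimming or case folding, since the design does not mention either) and an illustrative error message:

```typescript
const BACKEND_NAMES = ["claude", "codex", "gemini", "opencode"] as const;
type BackendName = (typeof BACKEND_NAMES)[number];

function resolveBackendName(raw: string | undefined): BackendName {
  // Unset env var falls back to the backward-compatible default.
  if (raw === undefined) {
    return "claude";
  }
  const match = BACKEND_NAMES.find((name) => name === raw);
  if (match === undefined) {
    // Error wording is an assumption; the design only requires listing valid options.
    throw new Error(
      `Invalid AGENT_BACKEND "${raw}". Valid options: ${BACKEND_NAMES.join(", ")}`,
    );
  }
  return match;
}

console.log(resolveBackendName(undefined));
console.log(resolveBackendName("gemini"));
```

Deriving the `BackendName` type from the array keeps the validation list and the type in sync when a backend is added.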
AgentRuntime Refactoring
The constructor changes from:
constructor(config, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
to:
constructor(config, backend, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)
- `executeClaude()` and `runClaude()` are replaced by `this.backend.execute()`
- The `ClaudeJsonResponse` interface is removed from `AgentRuntime`
- `EventResult` mapping: the backend's `BackendEventResult` maps to the gateway's existing `EventResult` interface (adding `targetChannelId` in the runtime layer)
GatewayConfig Changes
export interface GatewayConfig {
  // ... existing fields ...
  agentBackend: BackendName; // NEW: replaces implicit claude-only
  backendCliPath: string; // NEW: replaces claudeCliPath
  backendModel?: string; // NEW: optional model override
  backendMaxTurns: number; // NEW: configurable max turns
  // claudeCliPath removed
}
New environment variables:
- `AGENT_BACKEND` → `agentBackend` (default: `"claude"`)
- `BACKEND_CLI_PATH` → `backendCliPath` (default: backend-specific, e.g., `"claude"`, `"codex"`, `"gemini"`, `"opencode"`)
- `BACKEND_MODEL` → `backendModel`
- `BACKEND_MAX_TURNS` → `backendMaxTurns` (default: `25`)
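The mapping from environment variables to config fields could be sketched as follows. The function and type names are hypothetical, the environment is passed in as a plain record so the logic stays testable, and two details are assumptions: that the default CLI path equals the backend name, and that "invalid" `BACKEND_MAX_TURNS` means anything other than a positive integer. Validation of `AGENT_BACKEND` itself is left to `resolveBackendName`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical subset of GatewayConfig covering only the new fields.
interface BackendSettings {
  agentBackend: string;
  backendCliPath: string;
  backendModel?: string;
  backendMaxTurns: number;
}

const DEFAULT_MAX_TURNS = 25;

function parseMaxTurns(raw: string | undefined, warn: (msg: string) => void): number {
  if (raw === undefined) {
    return DEFAULT_MAX_TURNS;
  }
  const value = Number(raw);
  // Assumption: only positive integers count as valid.
  if (!Number.isInteger(value) || value <= 0) {
    warn(`Invalid BACKEND_MAX_TURNS "${raw}", falling back to ${DEFAULT_MAX_TURNS}`);
    return DEFAULT_MAX_TURNS;
  }
  return value;
}

function readBackendSettings(
  env: Env,
  warn: (msg: string) => void = console.warn,
): BackendSettings {
  const agentBackend = env["AGENT_BACKEND"] ?? "claude";
  return {
    agentBackend,
    // Assumption: the backend-specific default is the binary named after the backend.
    backendCliPath: env["BACKEND_CLI_PATH"] ?? agentBackend,
    backendModel: env["BACKEND_MODEL"],
    backendMaxTurns: parseMaxTurns(env["BACKEND_MAX_TURNS"], warn),
  };
}

console.log(readBackendSettings({}));
console.log(readBackendSettings({ AGENT_BACKEND: "codex", BACKEND_MAX_TURNS: "40" }));
```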
Data Models
EventResult (Backend)
export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}
This is the normalized output from any backend. The AgentRuntime maps it to the gateway's EventResult:
// Gateway EventResult (existing, unchanged)
export interface EventResult {
  responseText?: string;
  targetChannelId?: string;
  sessionId?: string;
  error?: string;
}
Mapping logic:
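// Maps a backend result onto the gateway's EventResult (function name is illustrative)
function mapBackendResult(backendResult: BackendEventResult, targetChannelId: string): EventResult {
  if (backendResult.isError) {
    // On error, the backend's text becomes the gateway error; no responseText or sessionId is set
    return { error: backendResult.responseText, targetChannelId };
  }
  return { responseText: backendResult.responseText, targetChannelId, sessionId: backendResult.sessionId };
}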
BackendAdapterConfig
export interface BackendAdapterConfig {
  cliPath: string; // Path to CLI binary
  workingDir: string; // Working directory for CLI process
  queryTimeoutMs: number; // Timeout before killing the process
  allowedTools: string[]; // Tools to whitelist (backend-specific support)
  maxTurns: number; // Max agentic turns
  model?: string; // Optional model override
}
CLI Output Formats
| Backend | Output Format | Session ID Source | Result Source |
|---|---|---|---|
| Claude | JSON array | `system/init` object `.session_id` | `result` object `.result` |
| Codex | Newline-delimited JSON | Session ID from exec metadata | Final assistant message content |
| Gemini | JSON object | Session metadata in output | Response text field |
| OpenCode | JSON events | Session field in response | Final response text |
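As one instance of the table above, a parser for the Claude row might look like this sketch. It assumes that "`system/init` object" means an element with `type: "system"` and `subtype: "init"`, and that the result element has `type: "result"`; those discriminator field names are assumptions rather than something this document specifies. The function name is hypothetical.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

const PARSE_FAILURE: BackendEventResult = {
  isError: true,
  responseText: "Failed to parse CLI output",
};

// Hypothetical parser for the Claude backend's JSON array output.
function parseClaudeOutput(stdout: string): BackendEventResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return PARSE_FAILURE;
  }
  if (!Array.isArray(parsed)) {
    return PARSE_FAILURE;
  }
  let sessionId: string | undefined;
  let responseText: string | undefined;
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) {
      continue;
    }
    const event = item as Record<string, unknown>;
    // Assumed discriminators: type/subtype for init, type for result.
    if (event["type"] === "system" && event["subtype"] === "init") {
      const id = event["session_id"];
      if (typeof id === "string") sessionId = id;
    } else if (event["type"] === "result") {
      const text = event["result"];
      if (typeof text === "string") responseText = text;
    }
  }
  return { responseText, sessionId, isError: false };
}

const sample = JSON.stringify([
  { type: "system", subtype: "init", session_id: "sess-123" },
  { type: "result", result: "Hello from the backend" },
]);
console.log(parseClaudeOutput(sample));
```

The other backends would follow the same pattern with their own event shapes, each returning the same `BackendEventResult`.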
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Claude backend required flags
For any prompt string, system prompt string, and allowed tools list, the Claude backend's generated argument list SHALL always contain -p, --output-format json, --dangerously-skip-permissions, --append-system-prompt-file, --max-turns, and one --allowedTools entry per configured tool.
Validates: Requirements 2.2, 2.5, 2.6
Property 2: Codex backend required flags
For any prompt string and working directory, the Codex backend's generated argument list SHALL always contain the exec subcommand, --json, --dangerously-bypass-approvals-and-sandbox, and --cd <workingDir>.
Validates: Requirements 3.2, 3.3, 3.4, 3.5
Property 3: Gemini backend required flags
For any prompt string, the Gemini backend's generated argument list SHALL always contain the prompt as a positional argument, --output-format json, and --approval-mode yolo.
Validates: Requirements 4.2, 4.3, 4.4
Property 4: OpenCode backend required flags
For any prompt string and optional model string, the OpenCode backend's generated argument list SHALL always contain the run subcommand, --format json, and when a model is configured, --model <model>.
Validates: Requirements 5.2, 5.3, 5.5
Property 5: Session resume args across backends
For any backend and any non-empty session ID string, the generated argument list SHALL include the backend-specific session resume flags: --resume <id> for Claude, resume <id> subcommand for Codex, --resume <id> for Gemini, and --session <id> --continue for OpenCode. When no session ID is provided, no session-related flags SHALL appear.
Validates: Requirements 2.3, 3.7, 4.5, 5.4
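Property 5 can be pictured as a small lookup from backend name to resume tokens. This is a sketch with hypothetical names; where the tokens are placed in the final command (for Codex, `resume <id>` follows the `exec` subcommand) is left to each backend's argument builder.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

// Resume tokens per backend, as listed in Property 5.
const RESUME_TOKENS: Record<BackendName, (id: string) => string[]> = {
  claude: (id) => ["--resume", id],
  codex: (id) => ["resume", id], // subcommand form, not a flag
  gemini: (id) => ["--resume", id],
  opencode: (id) => ["--session", id, "--continue"],
};

// Hypothetical helper: returns no tokens when the session ID is missing or empty.
function sessionResumeArgs(backend: BackendName, sessionId?: string): string[] {
  return sessionId ? RESUME_TOKENS[backend](sessionId) : [];
}

for (const backend of Object.keys(RESUME_TOKENS) as BackendName[]) {
  console.log(backend, sessionResumeArgs(backend, "sess-123"));
}
```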
Property 6: Output parsing extracts correct fields
For any valid backend-specific JSON output containing a response text and session ID, the backend's parser SHALL produce a BackendEventResult where responseText matches the expected response content and sessionId matches the expected session identifier.
Validates: Requirements 2.4, 3.6, 4.6, 5.6, 8.1
Property 7: Backend name resolution
For any string, resolveBackendName SHALL return the corresponding BackendName if the string is one of "claude", "codex", "gemini", or "opencode", SHALL return "claude" when the input is undefined, and SHALL throw a descriptive error for any other string value.
Validates: Requirements 6.1, 6.2, 6.3, 6.5
Property 8: Non-zero exit code produces error result
For any backend, any non-zero exit code, and any stderr string, the backend SHALL return a BackendEventResult with isError set to true and responseText containing the stderr content (truncated to 500 characters, as specified under Error Handling).
Validates: Requirements 8.2
Property 9: EventResult mapping preserves semantics
For any BackendEventResult and target channel ID, the mapping to the gateway's EventResult SHALL set error to responseText when isError is true (with no responseText on the gateway result), and SHALL set responseText and sessionId when isError is false (with no error on the gateway result). targetChannelId SHALL always be set.
Validates: Requirements 10.3
Property 10: Session ID storage after backend execution
For any channel ID and any BackendEventResult containing a non-undefined sessionId, after the AgentRuntime processes the result, the SessionManager SHALL contain that session ID for that channel. When sessionId is undefined, the session manager SHALL not be updated for that channel.
Validates: Requirements 10.4
Error Handling
CLI Process Errors
| Error Condition | Handling |
|---|---|
| CLI binary not found | validate() returns false at startup → gateway logs error with backend name and path, exits with code 1 |
| Non-zero exit code | Backend sets isError: true, includes stderr (truncated to 500 chars) in responseText |
| Query timeout | Backend kills process with SIGTERM after queryTimeoutMs, returns { isError: true, responseText: "Query timed out" } |
| Invalid JSON output | Backend returns { isError: true, responseText: "Failed to parse CLI output" } |
| Session corruption | AgentRuntime detects session-related error messages, removes session from SessionManager, allows retry without session |
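The first four rows of this table can be read as one decision function applied after the CLI process ends. The sketch below assumes a hypothetical `ProcessOutcome` record collected by the spawning code and a backend-specific `parse` callback that throws on malformed output; neither name comes from the design.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

// Hypothetical summary of a finished CLI process.
interface ProcessOutcome {
  exitCode: number | null; // null when the process was killed by a signal
  stdout: string;
  stderr: string;
  timedOut: boolean;
}

const MAX_STDERR_CHARS = 500;

function toBackendResult(
  outcome: ProcessOutcome,
  parse: (stdout: string) => BackendEventResult,
): BackendEventResult {
  // Timeout is checked first: a killed process also has a non-zero or null exit code.
  if (outcome.timedOut) {
    return { isError: true, responseText: "Query timed out" };
  }
  if (outcome.exitCode !== 0) {
    return { isError: true, responseText: outcome.stderr.slice(0, MAX_STDERR_CHARS) };
  }
  try {
    return parse(outcome.stdout);
  } catch {
    return { isError: true, responseText: "Failed to parse CLI output" };
  }
}

const failed = toBackendResult(
  { exitCode: 2, stdout: "", stderr: "binary crashed", timedOut: false },
  () => ({ isError: false }),
);
console.log(failed);
```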
Configuration Errors
| Error Condition | Handling |
|---|---|
| Invalid AGENT_BACKEND value | resolveBackendName throws with message listing valid options; gateway fails at startup |
| Invalid BACKEND_MAX_TURNS | Falls back to default (25), logs warning |
| Unsupported option for backend | Logs warning, ignores the option (e.g., ALLOWED_TOOLS for backends that don't support tool filtering) |
Retry Strategy
The existing withRetry mechanism in AgentRuntime continues to wrap backend execution calls:
- Max 3 retries with exponential backoff (5s base)
- Transient errors (timeout, spawn failure, crash) trigger retry
- Session corruption errors are non-retryable; session is cleared and the next attempt starts fresh
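To make the retry schedule concrete, here is a sketch of the delay calculation and the retryable/non-retryable split. It does not reproduce the existing `withRetry` code: the doubling factor, the `FailureKind` labels, and both function names are assumptions for illustration.

```typescript
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 5000;

// Hypothetical classification of failures described in the list above.
type FailureKind = "timeout" | "spawn_failure" | "crash" | "session_corruption";

// Session corruption is handled by clearing the session, not by retrying.
function isRetryable(kind: FailureKind): boolean {
  return kind !== "session_corruption";
}

// Delay before the given retry (1-based), or undefined once retries are exhausted.
// Assumes the delay doubles on each retry: 5s, 10s, 20s.
function retryDelayMs(retryNumber: number): number | undefined {
  if (retryNumber < 1 || retryNumber > MAX_RETRIES) {
    return undefined;
  }
  return BASE_DELAY_MS * 2 ** (retryNumber - 1);
}

for (let retry = 1; retry <= MAX_RETRIES + 1; retry++) {
  console.log(`retry ${retry}:`, retryDelayMs(retry));
}
```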
Testing Strategy
Property-Based Testing
Library: fast-check for TypeScript property-based testing.
Each property test runs a minimum of 100 iterations. Each test is tagged with a comment referencing the design property:
// Feature: multi-cli-backend, Property 1: Claude backend required flags
Properties to implement:
- Properties 1–4: Generate random prompt strings, system prompts, tool lists, and config values. Call each backend's arg-building function and assert required flags are present.
- Property 5: Generate random session ID strings (including empty/undefined). For each backend, verify session flags appear only when a session ID is provided.
- Property 6: Generate random valid JSON output structures per backend format. Parse and verify extracted fields match.
- Property 7: Generate random strings. Verify resolution behavior (valid → correct BackendName, undefined → "claude", invalid → throws).
- Property 8: Generate random exit codes (non-zero) and stderr strings. Verify error result shape.
- Property 9: Generate random `BackendEventResult` objects. Verify mapping to gateway `EventResult`.
- Property 10: Generate random channel IDs and `BackendEventResult` objects with/without session IDs. Verify session manager state.
Unit Testing
Unit tests complement property tests for specific examples and edge cases:
- Each backend's `validate()` method with mocked filesystem
- Timeout behavior with a mock slow process
- Startup flow: valid config → backend created → validated → injected into runtime
- Startup flow: invalid backend name → descriptive error
- Default config values when env vars are unset
- Streaming callback invocation during output parsing
- Session corruption detection and cleanup
Integration Testing
- End-to-end test with a mock CLI script that echoes JSON in each backend's format
- Verify the full flow: config → registry → backend → execute → parse → EventResult