
tanmay11k 453389f55c feat: add pluggable multi-CLI backend system

Implement BackendAdapter interface with four CLI backends:
- ClaudeCodeBackend (extracted from AgentRuntime)
- CodexBackend (OpenAI Codex CLI)
- GeminiBackend (Google Gemini CLI)
- OpenCodeBackend (OpenCode CLI)

Add BackendRegistry for resolution/creation via AGENT_BACKEND env var.
Refactor AgentRuntime to delegate to BackendAdapter instead of
hardcoding Claude CLI. Update GatewayConfig with new env vars
(AGENT_BACKEND, BACKEND_CLI_PATH, BACKEND_MODEL, BACKEND_MAX_TURNS).

Includes 10 property-based test files and unit tests for edge cases.

2026-02-22 23:41:30 -05:00


Design Document: Multi-CLI Backend

Overview

This design introduces a pluggable CLI backend system for the Aetheel gateway. The current architecture hardcodes Claude Code CLI invocation directly inside AgentRuntime. We will extract a BackendAdapter interface and provide four implementations (Claude, Codex, Gemini, OpenCode), each encapsulating CLI spawning, argument construction, output parsing, and session management. A BackendRegistry resolves the active backend from environment configuration at startup, validates it, and injects it into AgentRuntime.

The key design goals are:

Zero behavioral change for existing Claude deployments (backward compatible defaults)
Each backend is a self-contained module with no cross-dependencies
The rest of the gateway (event processing, Discord integration, session management) remains untouched
Output is normalized into a single BackendEventResult shape regardless of backend

Architecture

graph TD
    A[Discord Bot] --> B[EventQueue]
    B --> C[AgentRuntime]
    C --> D[BackendAdapter Interface]
    D --> E[ClaudeCodeBackend]
    D --> F[CodexBackend]
    D --> G[GeminiBackend]
    D --> H[OpenCodeBackend]
    I[BackendRegistry] -->|resolves active backend| D
    J[GatewayConfig] -->|AGENT_BACKEND env| I
    I -->|validates at startup| D

Startup Flow

sequenceDiagram
    participant Main
    participant Config as GatewayConfig
    participant Registry as BackendRegistry
    participant Backend as BackendAdapter
    participant Runtime as AgentRuntime

    Main->>Config: loadConfig()
    Config-->>Main: config (includes agentBackend, backendCliPath)
    Main->>Registry: createBackend(config)
    Registry-->>Main: BackendAdapter instance
    Main->>Backend: validate()
    alt validation fails
        Main->>Main: log error, exit(1)
    end
    Main->>Runtime: new AgentRuntime(config, backend, ...)

Execution Flow

sequenceDiagram
    participant Runtime as AgentRuntime
    participant Backend as BackendAdapter
    participant CLI as CLI Process

    Runtime->>Backend: execute(prompt, systemPrompt, sessionId?, onStream?)
    Backend->>CLI: spawn with backend-specific args
    CLI-->>Backend: stdout (JSON events)
    Backend->>Backend: parse output into BackendEventResult
    Backend-->>Runtime: BackendEventResult { responseText, sessionId, isError }

Components and Interfaces

BackendAdapter Interface

export interface BackendAdapterConfig {
  cliPath: string;
  workingDir: string;
  queryTimeoutMs: number;
  allowedTools: string[];
  maxTurns: number;
  model?: string;
}

export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

export type StreamCallback = (text: string) => Promise<void>;

export interface BackendAdapter {
  /** Unique identifier for this backend (e.g., "claude", "codex") */
  name(): string;

  /** Execute a prompt and return normalized results */
  execute(
    prompt: string,
    systemPrompt: string,
    sessionId?: string,
    onStream?: StreamCallback,
  ): Promise<BackendEventResult>;

  /** Validate that the CLI binary is reachable and executable */
  validate(): Promise<boolean>;
}

ClaudeCodeBackend

Preserves the existing behavior extracted from AgentRuntime.runClaude().

Writes system prompt to a temp file, passes via --append-system-prompt-file
Spawns: claude -p <prompt> --output-format json --dangerously-skip-permissions --append-system-prompt-file <file>
Session resume: --resume <sessionId>
Tool filtering: --allowedTools <tool> for each tool
Max turns: --max-turns <n>
Parses JSON array output for system/init (session_id) and result objects
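
As a rough illustration of the flag list above, the argument construction could be sketched as a pure function. The helper name `buildClaudeArgs` and the `ClaudeArgOptions` shape are hypothetical, and the system prompt temp file is assumed to have been written before the call; only the flags themselves come from this design.

```typescript
// Hypothetical options shape; field names are illustrative only.
interface ClaudeArgOptions {
  prompt: string;
  systemPromptFile: string; // temp file already written by the caller
  allowedTools: string[];
  maxTurns: number;
  sessionId?: string;
}

// Builds the argument list passed to the `claude` binary.
function buildClaudeArgs(opts: ClaudeArgOptions): string[] {
  const args: string[] = [
    "-p", opts.prompt,
    "--output-format", "json",
    "--dangerously-skip-permissions",
    "--append-system-prompt-file", opts.systemPromptFile,
    "--max-turns", String(opts.maxTurns),
  ];
  // One --allowedTools entry per configured tool.
  for (const tool of opts.allowedTools) {
    args.push("--allowedTools", tool);
  }
  // Resume flags appear only when a non-empty session ID is provided.
  if (opts.sessionId) {
    args.push("--resume", opts.sessionId);
  }
  return args;
}

const exampleClaudeArgs = buildClaudeArgs({
  prompt: "Summarize the open issues",
  systemPromptFile: "/tmp/system-prompt.txt",
  allowedTools: ["Read", "Bash"],
  maxTurns: 25,
});
```

Keeping argument construction separate from process spawning is what makes Property 1 checkable without launching a CLI.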

CodexBackend

Spawns: codex exec <prompt> --json --dangerously-bypass-approvals-and-sandbox
Working directory: --cd <path>
Session resume: codex exec resume <sessionId> with follow-up prompt
Parses newline-delimited JSON events for the final assistant message
System prompt: passed via --config system_prompt=<text> or prepended to prompt
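
A minimal sketch of the newline-delimited JSON parsing step follows. The event shape used here (`type: "assistant_message"` with a `text` field) is purely hypothetical and stands in for whatever the Codex CLI actually emits; the function names are illustrative as well.

```typescript
// Parse newline-delimited JSON, skipping blank or malformed lines.
function parseNdjson(stdout: string): unknown[] {
  const events: unknown[] = [];
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      events.push(JSON.parse(trimmed));
    } catch {
      // Ignore non-JSON noise interleaved with the event stream.
    }
  }
  return events;
}

// ASSUMPTION: hypothetical event shape { type: "assistant_message", text: string }.
// Returns the text of the last such event, or undefined if none is present.
function finalAssistantText(events: unknown[]): string | undefined {
  let last: string | undefined;
  for (const ev of events) {
    if (typeof ev !== "object" || ev === null) continue;
    const rec = ev as Record<string, unknown>;
    const text = rec["text"];
    if (rec["type"] === "assistant_message" && typeof text === "string") {
      last = text;
    }
  }
  return last;
}
```

Scanning for the last matching event, rather than the first, reflects the "final assistant message" wording above.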

GeminiBackend

Spawns: gemini <prompt> --output-format json --approval-mode yolo
Session resume: --resume <sessionId>
Parses JSON output for response text
System prompt: prepended to prompt text (Gemini CLI has no system prompt file flag in non-interactive mode)
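
The prepending approach might look like the sketch below. The function name `buildGeminiArgs` and the blank-line separator between system prompt and user prompt are assumptions; the flags are the ones listed above.

```typescript
// Builds the argument list for the `gemini` binary.
// ASSUMPTION: system prompt and user prompt are joined with a blank line.
function buildGeminiArgs(
  prompt: string,
  systemPrompt: string,
  sessionId?: string,
): string[] {
  // No system prompt file flag is used, so the system prompt is prepended.
  const fullPrompt =
    systemPrompt.trim() === "" ? prompt : `${systemPrompt}\n\n${prompt}`;
  const args: string[] = [
    fullPrompt,
    "--output-format", "json",
    "--approval-mode", "yolo",
  ];
  // Resume flags appear only when a non-empty session ID is provided.
  if (sessionId) {
    args.push("--resume", sessionId);
  }
  return args;
}
```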

OpenCodeBackend

Spawns: opencode run <prompt> --format json
Session resume: --session <sessionId> --continue
Model selection: --model <provider/model>
Parses JSON events for final response text
System prompt: prepended to prompt text
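
For OpenCode, the optional model and session flags could be assembled as in this sketch. The name `buildOpenCodeArgs`, the options object, and the ordering of flags after the positional prompt are assumptions for illustration.

```typescript
// Hypothetical options shape for the OpenCode argument builder.
interface OpenCodeArgOptions {
  sessionId?: string;
  model?: string; // expected in <provider/model> form
}

// Builds the argument list for the `opencode` binary.
function buildOpenCodeArgs(
  prompt: string,
  options: OpenCodeArgOptions = {},
): string[] {
  const args: string[] = ["run", prompt, "--format", "json"];
  // --model is added only when a model is configured.
  if (options.model) {
    args.push("--model", options.model);
  }
  // Session resume uses both --session <id> and --continue.
  if (options.sessionId) {
    args.push("--session", options.sessionId, "--continue");
  }
  return args;
}
```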

BackendRegistry

export type BackendName = "claude" | "codex" | "gemini" | "opencode";

export function createBackend(
  name: BackendName,
  config: BackendAdapterConfig,
): BackendAdapter;

export function resolveBackendName(raw: string | undefined): BackendName;

resolveBackendName maps the AGENT_BACKEND env var to a valid BackendName. It defaults to "claude" when the variable is unset and throws a descriptive error listing the valid options for any unrecognized value
createBackend instantiates the correct implementation
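
One possible shape for `resolveBackendName` is sketched below. It assumes strict, case-sensitive matching with no trimming, which is one reading of Property 7 later in this document; the exact error wording is illustrative.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

const VALID_BACKENDS: readonly BackendName[] = [
  "claude",
  "codex",
  "gemini",
  "opencode",
];

// Maps the raw AGENT_BACKEND value to a BackendName.
function resolveBackendName(raw: string | undefined): BackendName {
  // An unset variable keeps existing Claude deployments working.
  if (raw === undefined) return "claude";
  for (const candidate of VALID_BACKENDS) {
    if (candidate === raw) return candidate;
  }
  // ASSUMPTION: message format is illustrative, not taken from the codebase.
  throw new Error(
    `Invalid AGENT_BACKEND "${raw}". Valid options: ${VALID_BACKENDS.join(", ")}`,
  );
}
```

Failing loudly on unknown values, instead of silently falling back to "claude", surfaces typos in deployment configuration at startup.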

AgentRuntime Refactoring

The constructor changes from:

constructor(config, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)

to:

constructor(config, backend, sessionManager, markdownConfigLoader, systemPromptAssembler, hookManager)

executeClaude() and runClaude() are replaced by this.backend.execute()
The ClaudeJsonResponse interface is removed from AgentRuntime
EventResult mapping: the backend's result (BackendEventResult, see Data Models) is mapped onto the gateway's existing EventResult interface in the runtime layer, which also adds targetChannelId

GatewayConfig Changes

export interface GatewayConfig {
  // ... existing fields ...
  agentBackend: BackendName;      // NEW: replaces implicit claude-only
  backendCliPath: string;          // NEW: replaces claudeCliPath
  backendModel?: string;           // NEW: optional model override
  backendMaxTurns: number;         // NEW: configurable max turns
  // claudeCliPath removed
}

New environment variables:

AGENT_BACKEND → agentBackend (default: "claude")
BACKEND_CLI_PATH → backendCliPath (default: backend-specific, e.g., "claude", "codex", "gemini", "opencode")
BACKEND_MODEL → backendModel
BACKEND_MAX_TURNS → backendMaxTurns (default: 25)
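
A hedged sketch of how these variables might be read is shown below. The function and type names are hypothetical, the environment is passed in as a plain record so the logic stays testable, and treating anything other than a positive integer as an invalid `BACKEND_MAX_TURNS` is an assumption.

```typescript
type BackendName = "claude" | "codex" | "gemini" | "opencode";

// Hypothetical subset of GatewayConfig covering only the new fields.
interface BackendSettings {
  agentBackend: BackendName;
  backendCliPath: string;
  backendModel?: string;
  backendMaxTurns: number;
}

const DEFAULT_MAX_TURNS = 25;

// ASSUMPTION: "invalid" means anything that is not a positive integer.
function parseMaxTurns(
  raw: string | undefined,
  warn: (msg: string) => void,
): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_TURNS;
  const n = Number(raw);
  if (!isFinite(n) || Math.floor(n) !== n || n <= 0) {
    warn(`Invalid BACKEND_MAX_TURNS "${raw}", using ${DEFAULT_MAX_TURNS}`);
    return DEFAULT_MAX_TURNS;
  }
  return n;
}

function loadBackendSettings(
  env: Record<string, string | undefined>,
  agentBackend: BackendName,
  warn: (msg: string) => void,
): BackendSettings {
  const settings: BackendSettings = {
    agentBackend,
    // The default CLI path is the backend's own binary name.
    backendCliPath: env["BACKEND_CLI_PATH"] ?? agentBackend,
    backendMaxTurns: parseMaxTurns(env["BACKEND_MAX_TURNS"], warn),
  };
  const model = env["BACKEND_MODEL"];
  if (model !== undefined && model !== "") settings.backendModel = model;
  return settings;
}
```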

Data Models

EventResult (Backend)

export interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

This is the normalized output from any backend. The AgentRuntime maps it to the gateway's EventResult:

// Gateway EventResult (existing, unchanged)
export interface EventResult {
  responseText?: string;
  targetChannelId?: string;
  sessionId?: string;
  error?: string;
}

Mapping logic:

if (backendResult.isError) {
  return { error: backendResult.responseText, targetChannelId };
} else {
  return { responseText: backendResult.responseText, targetChannelId, sessionId: backendResult.sessionId };
}
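
Wrapped in a typed function, the fragment above could look like this sketch. The function name `toGatewayResult` is hypothetical; the two interfaces are repeated from this section so the example stands alone.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

interface EventResult {
  responseText?: string;
  targetChannelId?: string;
  sessionId?: string;
  error?: string;
}

// Maps a backend result onto the gateway result for a given channel.
function toGatewayResult(
  backendResult: BackendEventResult,
  targetChannelId: string,
): EventResult {
  if (backendResult.isError) {
    // On error the backend's text becomes `error`; no responseText is set.
    return { error: backendResult.responseText, targetChannelId };
  }
  return {
    responseText: backendResult.responseText,
    targetChannelId,
    sessionId: backendResult.sessionId,
  };
}

const mappedFailure = toGatewayResult(
  { isError: true, responseText: "Query timed out" },
  "channel-1",
);
```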

BackendAdapterConfig

export interface BackendAdapterConfig {
  cliPath: string;          // Path to CLI binary
  workingDir: string;       // Working directory for CLI process
  queryTimeoutMs: number;   // Timeout before killing the process
  allowedTools: string[];   // Tools to whitelist (backend-specific support)
  maxTurns: number;         // Max agentic turns
  model?: string;           // Optional model override
}

CLI Output Formats

| Backend | Output Format | Session ID Source | Result Source |
| --- | --- | --- | --- |
| Claude | JSON array | `system/init` object `.session_id` | `result` object `.result` |
| Codex | Newline-delimited JSON | Session ID from exec metadata | Final assistant message content |
| Gemini | JSON object | Session metadata in output | Response text field |
| OpenCode | JSON events | Session field in response | Final response text |
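
To make the Claude row concrete, here is a minimal parsing sketch. It assumes the `system/init` object is identified by `type: "system"` and `subtype: "init"`, and the result object by `type: "result"`; the function name is hypothetical, and the error text follows the Error Handling section.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

const PARSE_FAILURE: BackendEventResult = {
  isError: true,
  responseText: "Failed to parse CLI output",
};

// Extracts session ID and result text from Claude's JSON array output.
function parseClaudeOutput(stdout: string): BackendEventResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return PARSE_FAILURE;
  }
  if (!Array.isArray(parsed)) return PARSE_FAILURE;

  const out: BackendEventResult = { isError: false };
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const rec = item as Record<string, unknown>;
    const sessionId = rec["session_id"];
    const result = rec["result"];
    // ASSUMPTION: discriminator fields are `type` and `subtype`.
    if (rec["type"] === "system" && rec["subtype"] === "init" && typeof sessionId === "string") {
      out.sessionId = sessionId;
    }
    if (rec["type"] === "result" && typeof result === "string") {
      out.responseText = result;
    }
  }
  return out;
}
```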

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Claude backend required flags

For any prompt string, system prompt string, and allowed tools list, the Claude backend's generated argument list SHALL always contain -p, --output-format json, --dangerously-skip-permissions, --append-system-prompt-file, --max-turns, and one --allowedTools entry per configured tool.

Validates: Requirements 2.2, 2.5, 2.6

Property 2: Codex backend required flags

For any prompt string and working directory, the Codex backend's generated argument list SHALL always contain the exec subcommand, --json, --dangerously-bypass-approvals-and-sandbox, and --cd <workingDir>.

Validates: Requirements 3.2, 3.3, 3.4, 3.5

Property 3: Gemini backend required flags

For any prompt string, the Gemini backend's generated argument list SHALL always contain the prompt as a positional argument, --output-format json, and --approval-mode yolo.

Validates: Requirements 4.2, 4.3, 4.4

Property 4: OpenCode backend required flags

For any prompt string and optional model string, the OpenCode backend's generated argument list SHALL always contain the run subcommand, --format json, and when a model is configured, --model <model>.

Validates: Requirements 5.2, 5.3, 5.5

Property 5: Session resume args across backends

For any backend and any non-empty session ID string, the generated argument list SHALL include the backend-specific session resume flags: --resume <id> for Claude, resume <id> subcommand for Codex, --resume <id> for Gemini, and --session <id> --continue for OpenCode. When no session ID is provided, no session-related flags SHALL appear.

Validates: Requirements 2.3, 3.7, 4.5, 5.4

Property 6: Output parsing extracts correct fields

For any valid backend-specific JSON output containing a response text and session ID, the backend's parser SHALL produce a BackendEventResult where responseText matches the expected response content and sessionId matches the expected session identifier.

Validates: Requirements 2.4, 3.6, 4.6, 5.6, 8.1

Property 7: Backend name resolution

For any string, resolveBackendName SHALL return the corresponding BackendName if the string is one of "claude", "codex", "gemini", or "opencode", SHALL return "claude" when the input is undefined, and SHALL throw a descriptive error for any other string value.

Validates: Requirements 6.1, 6.2, 6.3, 6.5

Property 8: Non-zero exit code produces error result

For any backend, any non-zero exit code, and any stderr string, the backend SHALL return a BackendEventResult with isError set to true and responseText containing the stderr content.

Validates: Requirements 8.2

Property 9: EventResult mapping preserves semantics

For any BackendEventResult and target channel ID, the mapping to the gateway's EventResult SHALL set error to responseText when isError is true (with no responseText on the gateway result), and SHALL set responseText and sessionId when isError is false (with no error on the gateway result). targetChannelId SHALL always be set.

Validates: Requirements 10.3

Property 10: Session ID storage after backend execution

For any channel ID and any BackendEventResult containing a non-undefined sessionId, after the AgentRuntime processes the result, the SessionManager SHALL contain that session ID for that channel. When sessionId is undefined, the session manager SHALL not be updated for that channel.

Validates: Requirements 10.4
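
Property 10 can be illustrated with a small in-memory stand-in. The real SessionManager API is not described in this document, so `InMemorySessionStore` and `storeSessionIfPresent` are hypothetical names chosen only for this sketch.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

// Hypothetical stand-in for SessionManager: one session ID per channel.
class InMemorySessionStore {
  private sessions = new Map<string, string>();

  set(channelId: string, sessionId: string): void {
    this.sessions.set(channelId, sessionId);
  }

  get(channelId: string): string | undefined {
    return this.sessions.get(channelId);
  }
}

// Stores the session ID only when the backend actually returned one.
function storeSessionIfPresent(
  store: InMemorySessionStore,
  channelId: string,
  result: BackendEventResult,
): void {
  if (result.sessionId !== undefined) {
    store.set(channelId, result.sessionId);
  }
}
```

Leaving the store untouched when `sessionId` is undefined means a backend that omits the ID on a follow-up turn does not erase an existing session.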

Error Handling

CLI Process Errors

| Error Condition | Handling |
| --- | --- |
| CLI binary not found | `validate()` returns false at startup → gateway logs error with backend name and path, exits with code 1 |
| Non-zero exit code | Backend sets `isError: true`, includes stderr (truncated to 500 chars) in `responseText` |
| Query timeout | Backend kills process with SIGTERM after `queryTimeoutMs`, returns `{ isError: true, responseText: "Query timed out" }` |
| Invalid JSON output | Backend returns `{ isError: true, responseText: "Failed to parse CLI output" }` |
| Session corruption | `AgentRuntime` detects session-related error messages, removes session from `SessionManager`, allows retry without session |
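
The non-zero exit code and timeout rows might translate into helpers like the following. The function names are hypothetical, and returning `undefined` for a zero exit code (meaning "continue to output parsing") is an assumption of this sketch.

```typescript
interface BackendEventResult {
  responseText?: string;
  sessionId?: string;
  isError: boolean;
}

const MAX_STDERR_CHARS = 500;

// Returns an error result for a non-zero exit code, or undefined when the
// process exited cleanly and normal output parsing should proceed.
function errorResultFromExit(
  exitCode: number,
  stderr: string,
): BackendEventResult | undefined {
  if (exitCode === 0) return undefined;
  return {
    isError: true,
    // Truncate so a noisy CLI cannot flood the response channel.
    responseText: stderr.slice(0, MAX_STDERR_CHARS),
  };
}

// Result returned after the process is killed for exceeding queryTimeoutMs.
function timeoutResult(): BackendEventResult {
  return { isError: true, responseText: "Query timed out" };
}
```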

Configuration Errors

| Error Condition | Handling |
| --- | --- |
| Invalid `AGENT_BACKEND` value | `resolveBackendName` throws with message listing valid options; gateway fails at startup |
| Invalid `BACKEND_MAX_TURNS` | Falls back to default (25), logs warning |
| Unsupported option for backend | Logs warning, ignores the option (e.g., `ALLOWED_TOOLS` for backends that don't support tool filtering) |

Retry Strategy

The existing withRetry mechanism in AgentRuntime continues to wrap backend execution calls:

Max 3 retries with exponential backoff (5s base)
Transient errors (timeout, spawn failure, crash) trigger retry
Session corruption errors are not retried with the same session: the session is cleared, and the next attempt starts fresh without a session ID
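
The existing withRetry implementation is not reproduced in this document, so the following is only a sketch of the policy described above. The signature, the injected `sleep` function, and the doubling factor (5s, 10s, 20s) are assumptions.

```typescript
// Delay before retry number `attempt` (0-based), doubling each time.
function backoffDelayMs(attempt: number, baseMs: number = 5000): number {
  return baseMs * Math.pow(2, attempt);
}

// Hypothetical retry wrapper: up to `maxRetries` retries after the first try.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void>,
  maxRetries: number = 3,
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await operation();
    } catch (err) {
      // Give up when retries are exhausted or the error is non-retryable
      // (for example, a session corruption error).
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
      attempt += 1;
    }
  }
}
```

Passing `sleep` in as a parameter lets tests observe the delays without waiting for real time to elapse.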

Testing Strategy

Property-Based Testing

Library: fast-check for TypeScript property-based testing.

Each property test runs a minimum of 100 iterations. Each test is tagged with a comment referencing the design property:

// Feature: multi-cli-backend, Property 1: Claude backend required flags

Properties to implement:

Properties 1–4: Generate random prompt strings, system prompts, tool lists, and config values. Call each backend's arg-building function and assert required flags are present.
Property 5: Generate random session ID strings (including empty/undefined). For each backend, verify session flags appear only when a session ID is provided.
Property 6: Generate random valid JSON output structures per backend format. Parse and verify extracted fields match.
Property 7: Generate random strings. Verify resolution behavior (valid → correct BackendName, undefined → "claude", invalid → throws).
Property 8: Generate random exit codes (non-zero) and stderr strings. Verify error result shape.
Property 9: Generate random BackendEventResult objects. Verify mapping to gateway EventResult.
Property 10: Generate random channel IDs and BackendEventResult objects with/without session IDs. Verify session manager state.

Unit Testing

Unit tests complement property tests for specific examples and edge cases:

Each backend's validate() method with a mocked filesystem
Timeout behavior with a mock slow process
Startup flow: valid config → backend created → validated → injected into runtime
Startup flow: invalid backend name → descriptive error
Default config values when env vars are unset
Streaming callback invocation during output parsing
Session corruption detection and cleanup

Integration Testing

End-to-end test with a mock CLI script that echoes JSON in each backend's format
Verify the full flow: config → registry → backend → execute → parse → EventResult
