# Aetheel Memory System

> **Date:** 2026-02-13
> **Inspired by:** OpenClaw's `src/memory/` (49 files, 2,300+ LOC manager)
> **Implementation:** ~600 lines of Python across 6 modules

---

## Table of Contents

1. [Overview](#overview)
2. [Architecture](#architecture)
3. [File Structure](#file-structure)
4. [Identity Files](#identity-files)
5. [How It Works](#how-it-works)
6. [Configuration](#configuration)
7. [API Reference](#api-reference)
8. [Dependencies](#dependencies)
9. [Testing](#testing)
10. [OpenClaw Mapping](#openclaw-mapping)

---

## 1. Overview

The memory system gives Aetheel **persistent, searchable memory** using a combination of markdown files and SQLite. It follows the same design as OpenClaw's memory architecture:

- **Markdown IS the database** — identity files (`SOUL.md`, `USER.md`, `MEMORY.md`) are human-readable and editable in any text editor or Obsidian
- **Hybrid search** — combines vector similarity (cosine, 0.7 weight) with BM25 keyword search (0.3 weight) for accurate retrieval
- **Fully local** — uses fastembed ONNX embeddings (384-dim), zero API calls
- **Incremental sync** — only re-indexes files that have changed (SHA-256 hash comparison)
- **Session logging** — conversation transcripts stored in `daily/` and indexed for search
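The hash-based change detection behind incremental sync can be sketched in a few lines. `hash_text` mirrors the utility in `memory/internal.py`; the `stored_hashes` dict and `needs_reindex` helper are illustrative stand-ins for the real `files` table lookup, not the actual API:

```python
import hashlib

def hash_text(text: str) -> str:
    """SHA-256 over file content; returns a 64-char hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hashes previously stored in the `files` table (illustrative)
stored_hashes = {"USER.md": hash_text("# USER\n")}

def needs_reindex(path: str, content: str) -> bool:
    # Re-index only when the content hash differs from the stored one;
    # unknown paths have no stored hash and always need indexing
    return stored_hashes.get(path) != hash_text(content)
```

`needs_reindex("USER.md", "# USER\n")` is `False`, so an unchanged file is skipped on the next sync.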
---

## 2. Architecture

```
            ┌──────────────────────────┐
            │      MemoryManager       │
            │   (memory/manager.py)    │
            ├──────────────────────────┤
            │ • sync()                 │
            │ • search()               │
            │ • log_session()          │
            │ • read/update identity   │
            │ • file watching          │
            └────────┬─────────────────┘
                     │
     ┌───────────────┼───────────────┐
     ▼               ▼               ▼
┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│  Workspace   │ │   SQLite    │ │  fastembed   │
│  (.md files) │ │  Database   │ │   (ONNX)     │
├──────────────┤ ├─────────────┤ ├──────────────┤
│ SOUL.md      │ │ files       │ │ bge-small    │
│ USER.md      │ │ chunks      │ │ 384-dim      │
│ MEMORY.md    │ │ chunks_fts  │ │ L2-normalized│
│ memory/      │ │ emb_cache   │ │ local only   │
│ daily/       │ │ session_logs│ │              │
└──────────────┘ └─────────────┘ └──────────────┘
```

### Search Flow

```
Query: "what are my preferences?"
            │
            ▼
┌──────────────────┐   ┌──────────────────┐
│  Vector Search   │   │  Keyword Search  │
│  (cosine sim)    │   │  (FTS5 / BM25)   │
│  weight: 0.7     │   │  weight: 0.3     │
└────────┬─────────┘   └────────┬─────────┘
         │                      │
         └──────────┬───────────┘
                    ▼
            ┌───────────────┐
            │  Hybrid Merge │
            │  dedupe by ID │
            │  sort by score│
            └───────┬───────┘
                    ▼
  Top-N results with score ≥ min_score
```

---

## 3. File Structure

### Source Code

```
memory/
├── __init__.py      # Package exports (MemoryManager, MemorySearchResult, MemorySource)
├── types.py         # Data classes: MemoryConfig, MemorySearchResult, MemoryChunk, etc.
├── internal.py      # Utilities: hashing, chunking, file discovery, cosine similarity
├── hybrid.py        # Hybrid search merging (0.7 vector + 0.3 BM25)
├── schema.py        # SQLite schema (files, chunks, FTS5, embedding cache)
├── embeddings.py    # Local fastembed ONNX embeddings (384-dim)
└── manager.py       # Main MemoryManager orchestrator (~400 LOC)
```

### Workspace (Created Automatically)

```
~/.aetheel/workspace/
├── SOUL.md          # Personality & values — "who you are"
├── USER.md          # User profile — "who I am"
├── MEMORY.md        # Long-term memory — decisions, lessons, context
├── memory/          # Additional markdown memory files (optional)
│   └── *.md
└── daily/           # Session logs by date
    ├── 2026-02-13.md
    ├── 2026-02-14.md
    └── ...
```

---

## 4. Identity Files

Inspired by OpenClaw's template system (`docs/reference/templates/SOUL.md`).

### SOUL.md — Who You Are

The agent's personality, values, and behavioral guidelines. Created with sensible defaults:

- Core truths (be helpful, have opinions, be resourceful)
- Boundaries (privacy, external actions)
- Continuity rules (files ARE the memory)

### USER.md — Who I Am

The user's profile — name, role, timezone, preferences, current focus, tools. Fill this in to personalize the agent.

### MEMORY.md — Long-Term Memory

Persistent decisions, lessons learned, and context that carries across sessions. The agent appends entries with timestamps:

```markdown
### [2026-02-13 12:48]
Learned that the user prefers concise responses with code examples.
```

---

## 5. How It Works

### Sync (`await manager.sync()`)

1. **Discover files** — scans `SOUL.md`, `USER.md`, `MEMORY.md`, `memory/*.md`
2. **Check hashes** — compares SHA-256 content hash against stored hash in `files` table
3. **Skip unchanged** — files with matching hashes are skipped (incremental sync)
4. **Chunk** — splits changed files into overlapping text chunks (~512 tokens, 50-token overlap)
5. **Embed** — generates 384-dim vectors via fastembed (checks embedding cache first)
6. **Store** — inserts chunks + embeddings into SQLite, updates FTS5 index
7. **Clean** — removes stale entries for deleted files
8. **Sessions** — repeats for `daily/*.md` session log files

### Search (`await manager.search("query")`)

1. **Auto-sync** — triggers sync if workspace is dirty (configurable)
2. **Keyword search** — runs FTS5 `MATCH` query with BM25 ranking
3. **Vector search** — embeds query, computes cosine similarity against all chunk embeddings
4. **Hybrid merge** — combines results: `score = 0.7 × vector + 0.3 × keyword`
5. **Deduplicate** — merges chunks found by both methods (by chunk ID)
6. **Filter & rank** — removes results below `min_score`, returns top-N sorted by score

### Session Logging (`manager.log_session(content)`)

1. Creates/appends to `daily/YYYY-MM-DD.md`
2. Adds timestamped entry with channel label
3. Marks index as dirty for next sync

---

## 6. Configuration

```python
from memory.types import MemoryConfig

config = MemoryConfig(
    # Workspace directory containing identity files
    workspace_dir="~/.aetheel/workspace",
    # SQLite database path
    db_path="~/.aetheel/memory.db",

    # Chunking parameters
    chunk_tokens=512,        # ~2048 characters per chunk
    chunk_overlap=50,        # ~200 character overlap between chunks

    # Search parameters
    max_results=10,          # maximum results per search
    min_score=0.1,           # minimum hybrid score threshold
    vector_weight=0.7,       # weight for vector similarity
    text_weight=0.3,         # weight for BM25 keyword score

    # Embedding model (local ONNX)
    embedding_model="BAAI/bge-small-en-v1.5",
    embedding_dims=384,

    # Sync behavior
    watch=True,              # enable file watching via watchdog
    watch_debounce_ms=2000,  # debounce file change events
    sync_on_search=True,     # auto-sync before search if dirty

    # Session logs directory (defaults to workspace_dir/daily/)
    sessions_dir=None,

    # Sources to index
    sources=["memory", "sessions"],
)
```
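The `vector_weight`, `text_weight`, `min_score`, and `max_results` parameters drive the hybrid merge and dedupe steps of the search flow. A minimal sketch of how they combine; the function name and dict-based inputs are illustrative, not the exact `memory/hybrid.py` signature:

```python
def merge_hybrid(
    vector_hits: dict[str, float],   # chunk ID -> cosine similarity (0..1)
    keyword_hits: dict[str, float],  # chunk ID -> normalized BM25 score (0..1)
    vector_weight: float = 0.7,
    text_weight: float = 0.3,
    min_score: float = 0.1,
    max_results: int = 10,
) -> list[tuple[str, float]]:
    # Union of chunk IDs dedupes chunks found by both methods;
    # a chunk missing from one side contributes 0.0 for that component
    ids = set(vector_hits) | set(keyword_hits)
    scored = sorted(
        ((cid, vector_weight * vector_hits.get(cid, 0.0)
               + text_weight * keyword_hits.get(cid, 0.0)) for cid in ids),
        key=lambda item: item[1],
        reverse=True,
    )
    # Drop results below min_score, keep the top-N
    return [(cid, s) for cid, s in scored if s >= min_score][:max_results]
```

One consequence of these defaults: a chunk found only by keyword search needs a normalized BM25 score of at least ~0.33 to clear `min_score=0.1`.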
---

## 7. API Reference

### `MemoryManager`

```python
from memory import MemoryManager
from memory.types import MemoryConfig

# Create with custom config (or defaults)
mgr = MemoryManager(config=MemoryConfig(...))

# Sync workspace → index
stats = await mgr.sync(force=False)
# Returns: {"files_found": 4, "files_indexed": 4, "chunks_created": 5, ...}

# Hybrid search
results = await mgr.search("what are my preferences?", max_results=5, min_score=0.1)
# Returns: list[MemorySearchResult]
#   .path       — relative file path (e.g., "USER.md")
#   .start_line — chunk start line
#   .end_line   — chunk end line
#   .score      — hybrid score (0.0 - 1.0)
#   .snippet    — text snippet (max 700 chars)
#   .source     — MemorySource.MEMORY or MemorySource.SESSIONS

# Identity files
soul = mgr.read_soul()                # Read SOUL.md
user = mgr.read_user()                # Read USER.md
memory = mgr.read_long_term_memory()  # Read MEMORY.md
mgr.append_to_memory("learned X")     # Append timestamped entry to MEMORY.md
mgr.update_identity_file("USER.md", new_content)  # Overwrite a file

# Session logging
path = mgr.log_session("User: hi\nAssistant: hello", channel="slack")

# File reading
data = mgr.read_file("SOUL.md", from_line=1, num_lines=10)

# Status
status = mgr.status()
# Returns: {"files": 5, "chunks": 5, "cached_embeddings": 4, ...}

# File watching
mgr.start_watching()  # auto-mark dirty on workspace changes
mgr.stop_watching()

# Cleanup
mgr.close()
```

### `MemorySearchResult`

```python
@dataclass
class MemorySearchResult:
    path: str             # Relative path to the markdown file
    start_line: int       # First line of the matching chunk
    end_line: int         # Last line of the matching chunk
    score: float          # Hybrid score (0.0 - 1.0)
    snippet: str          # Text snippet (max 700 characters)
    source: MemorySource  # "memory" or "sessions"
    citation: str | None = None
```
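Under the hood, the keyword half of `search()` needs no third-party code. A minimal sketch using Python's stdlib `sqlite3`; the single-column `chunks_fts` layout here is a simplification, not the real schema from `schema.py`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 virtual table, analogous to the chunks_fts index
con.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(text)")
con.executemany(
    "INSERT INTO chunks_fts(text) VALUES (?)",
    [("the user prefers concise responses",),
     ("agent personality, values, and boundaries",)],
)
# bm25() returns a rank where lower (more negative) means more relevant,
# so ordering by it ascending puts the best match first
rows = con.execute(
    "SELECT text, bm25(chunks_fts) FROM chunks_fts "
    "WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts)",
    ("concise",),
).fetchall()
```

Converting that raw rank into a 0-to-1 score (the `bm25RankToScore` step in `hybrid.py`) happens before the 0.3-weighted merge.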
---

## 8. Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| `fastembed` | 0.7.4 | Local ONNX embeddings (BAAI/bge-small-en-v1.5, 384-dim) |
| `watchdog` | 6.0.0 | File system watching for auto re-indexing |
| `sqlite3` | (stdlib) | Database engine with FTS5 full-text search |

Added to `pyproject.toml`:

```toml
dependencies = [
    "fastembed>=0.7.4",
    "watchdog>=6.0.0",
    # ... existing deps
]
```

---

## 9. Testing

Run the smoke test:

```bash
uv run python test_memory.py
```

### Test Results (2026-02-13)

| Test | Result |
|------|--------|
| `hash_text()` | ✅ SHA-256 produces 64-char hex string |
| `chunk_markdown()` | ✅ Splits text into overlapping chunks with correct line numbers |
| Identity file creation | ✅ SOUL.md (793 chars), USER.md (417 chars), MEMORY.md (324 chars) |
| Append to MEMORY.md | ✅ Content grows with timestamped entry |
| Session logging | ✅ Creates `daily/2026-02-13.md` with channel + timestamp |
| Sync (first run) | ✅ 4 files found, 4 indexed, 5 chunks, 1 session |
| Search "personality values" | ✅ 5 results — top: SOUL.md (score 0.595) |
| Search "preferences" | ✅ 5 results — top: USER.md (score 0.583) |
| FTS5 keyword search | ✅ Available |
| Embedding cache | ✅ 4 entries cached (skip re-computation on next sync) |
| Status report | ✅ All fields populated correctly |

---

## 10. OpenClaw Mapping

How our Python implementation maps to OpenClaw's TypeScript source:

| OpenClaw File | Aetheel File | Description |
|---------------|--------------|-------------|
| `src/memory/types.ts` | `memory/types.py` | Core types (MemorySearchResult, MemorySource, etc.) |
| `src/memory/internal.ts` | `memory/internal.py` | hashText, chunkMarkdown, listMemoryFiles, cosineSimilarity |
| `src/memory/hybrid.ts` | `memory/hybrid.py` | buildFtsQuery, bm25RankToScore, mergeHybridResults |
| `src/memory/memory-schema.ts` | `memory/schema.py` | ensureMemoryIndexSchema → ensure_schema |
| `src/memory/embeddings.ts` | `memory/embeddings.py` | createEmbeddingProvider → embed_query/embed_batch (fastembed) |
| `src/memory/manager.ts` (2,300 LOC) | `memory/manager.py` (~400 LOC) | MemoryIndexManager → MemoryManager |
| `src/memory/sync-memory-files.ts` | Inlined in `manager.py` | syncMemoryFiles → _run_sync |
| `src/memory/session-files.ts` | Inlined in `manager.py` | buildSessionEntry → _sync_session_files |
| `docs/reference/templates/SOUL.md` | Auto-created by manager | Default identity file templates |

### Key Simplifications vs. OpenClaw

| Feature | OpenClaw | Aetheel |
|---------|----------|---------|
| **Embedding providers** | OpenAI, Voyage, Gemini, local ONNX (4 providers) | fastembed only (local ONNX, zero API calls) |
| **Vector storage** | sqlite-vec extension (C library) | JSON-serialized in chunks table (pure Python) |
| **File watching** | chokidar (Node.js) | watchdog (Python) |
| **Batch embedding** | OpenAI/Voyage batch APIs, concurrency pools | fastembed batch (single-threaded, local) |
| **Config system** | JSON5 + TypeBox + Zod schemas (100k+ LOC) | Simple Python dataclass |
| **Codebase** | 49 files, 2,300+ LOC manager alone | 6 files, ~600 LOC total |

### What We Kept

- ✅ Same identity file pattern (SOUL.md, USER.md, MEMORY.md)
- ✅ Same hybrid search algorithm (0.7 vector + 0.3 BM25)
- ✅ Same chunking approach (token-based with overlap)
- ✅ Same incremental sync (hash-based change detection)
- ✅ Same FTS5 full-text search with BM25 ranking
- ✅ Same embedding cache (avoids re-computing unchanged chunks)
- ✅ Same session log pattern (daily/ directory)

---

*This memory system is Phase 1 of the Aetheel build process as
outlined in `openclaw-analysis.md`.*