update documentation for sdk, vault, gitea integration

2026-03-22 15:00:51 +00:00
parent 7bf9e25361
commit cccdb7b502
4 changed files with 868 additions and 49 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,71 @@
+# Sol — Developer Context
+
+## Build & Test
+
+```sh
+cargo build --release                              # debug: cargo build
+cargo test                                         # 102 unit tests, no external services needed
+cargo build --release --target x86_64-unknown-linux-gnu  # cross-compile for production
+```
+
+Docker (multi-stage, vendored deps):
+
+```sh
+docker build -t sol .
+```
+
+Production build + deploy via sunbeam CLI:
+
+```sh
+sunbeam build sol --push --deploy
+```
+
+## Private Cargo Registry
+
+`mistralai-client` is published to Sunbeam's private Gitea cargo registry at `src.sunbeam.pt`. The crate is vendored into `vendor/` for Docker builds. Registry config lives in `.cargo/config.toml` (not checked in; it is generated by the Dockerfile, and locally via `cargo vendor vendor/`).
+
+## Vendor Workflow
+
+After adding/updating deps:
+
+```sh
+cargo vendor vendor/
+```
+
+This updates the vendored sources. Commit `vendor/` changes alongside `Cargo.lock`.
+
+## Key Architecture Notes
+
+- **`chat_blocking()` workaround**: The Mistral client's `chat_async` holds a `std::sync::MutexGuard` across `.await`, making the future `!Send`. All chat calls go through `chat_blocking()`, which runs `client.chat()` via `tokio::task::spawn_blocking`.
+- **Two response paths**: Controlled by `agents.use_conversations_api` config toggle.
+  - Legacy: manual `Vec<ChatMessage>` assembly, chat completions, tool iteration loop.
+  - Conversations API: `ConversationRegistry` with persistent state (SQLite-backed), agents, function call loop.
+- **deno_core sandbox**: `run_script` tool spins up a fresh V8 isolate per invocation with `sol.*` host API bindings. Timeout via V8 isolate termination. Output truncated to 4096 chars.
+
+## K8s Context
+
+- Namespace: `matrix`
+- Deployment: `sol` (Recreate strategy — single replica, because the SQLite database can't be shared between pods)
+- PVC: `sol-data` (1Gi RWO) mounted at `/data` — holds `sol.db` + `matrix-state/`
+- ConfigMap: `sol-config` — `sol.toml` + `system_prompt.md` (subPath mounts)
+- Secrets: `sol-secrets` via VaultStaticSecret from OpenBao `secret/sol` (5 keys: `matrix-access-token`, `matrix-device-id`, `mistral-api-key`, `gitea-admin-username`, `gitea-admin-password`)
+- Vault auth: Sol authenticates to OpenBao via K8s auth (role `sol-agent`, policy `sol-agent`) for storing user impersonation tokens at `secret/sol-tokens/{localpart}/{service}`
+- Build target: `x86_64-unknown-linux-gnu` (Scaleway amd64 server)
+- Base image: `gcr.io/distroless/cc-debian12:nonroot`
+
+## Source Layout
+
+- `src/main.rs` — startup, component wiring, backfill, agent recreation + sneeze
+- `src/sync.rs` — Matrix event handlers, context hint injection for new conversations
+- `src/config.rs` — TOML config with serde defaults (7 sections: matrix, opensearch, mistral, behavior, agents, services, vault)
+- `src/context.rs` — `ResponseContext`, `derive_user_id`, `localpart`
+- `src/conversations.rs` — `ConversationRegistry` (room→conversation mapping, SQLite-backed, reset_all)
+- `src/persistence.rs` — SQLite store (WAL mode, 3 tables: `conversations`, `agents`, `service_users`)
+- `src/agent_ux.rs` — `AgentProgress` (reaction lifecycle + thread posting)
+- `src/matrix_utils.rs` — message extraction, image download, reactions
+- `src/brain/` — evaluator (full system prompt context), responder (per-message context headers + memory), personality, conversation manager
+- `src/agents/` — registry (instructions hash + automatic recreation), definitions (dynamic delegation)
+- `src/sdk/` — vault client (K8s auth), token store (Vault-backed), gitea client (PAT auto-provisioning)
+- `src/memory/` — schema, store, extractor
+- `src/tools/` — registry (12 tools), search, room_history, room_info, script, devtools (gitea), bridge
+- `src/archive/` — schema, indexer
--- a/README.md
+++ b/README.md
@@ -1,68 +1,403 @@
 # sol

-a virtual librarian for Matrix. sol lives in your chat rooms, archives conversations in OpenSearch, and responds with the help of Mistral AI — with end-to-end encryption, tool use, and per-user memory.
+a virtual librarian for Matrix. sol lives in your chat rooms, archives conversations in OpenSearch, and responds with the help of Mistral AI — with end-to-end encryption, tool use, per-user memory, and a multi-agent architecture.

-sol is built by [sunbeam studios](https://sunbeam.pt) as part of our self-hosted collaboration stack.
+sol is built by [sunbeam studios](https://sunbeam.pt) as part of our self-hosted collaboration stack for a three-person game studio.

 ## what sol does

 - **Matrix presence** — joins rooms, reads the vibe, decides when to speak. direct messages always get a response; in group rooms, sol evaluates relevance before jumping in.
 - **message archive** — every message is indexed in OpenSearch with full-text and semantic search. sol can search its own archive via tools.
- **tool use** — mistral calls tools mid-conversation: archive search, room context retrieval, and a sandboxed TypeScript/JavaScript runtime (deno_core) for computation.
- **per-user memory** — sol remembers things about the people it talks to. memories are extracted automatically after conversations (via ministral-3b), injected into the system prompt before responding, and accessible from scripts via `sol.memory.get/set`. user isolation is enforced at the rust level.
+- **tool use** — Mistral calls tools mid-conversation: archive search, room context retrieval, room info, and a sandboxed TypeScript/JavaScript runtime (deno_core) for computation.
+- **per-user memory** — sol remembers things about the people it talks to. memories are extracted automatically after conversations, injected into the system prompt before responding, and accessible from scripts via `sol.memory.get/set`.
+- **user impersonation** — sol acts on behalf of users when calling external services. PATs are auto-provisioned via admin APIs and stored securely in OpenBao (Vault). OIDC-to-service username mappings handle identity mismatches.
+- **gitea integration** — first domain agent (sol-devtools): list repos, search issues, create issues, list PRs, get file contents — all as the requesting user.
+- **multi-agent architecture** — an orchestrator agent with personality + tools + web search. domain agent delegation is dynamic — only active agents appear in instructions. agent state persisted in SQLite with instructions hash for automatic recreation on prompt changes.
+- **conversations API** — persistent conversation state per room via Mistral's Conversations API, with automatic compaction at token thresholds. per-message context headers inject timestamps, room info, and memory notes.
+- **multimodal** — m.image messages are downloaded from Matrix via mxc://, converted to base64 data URIs, and sent as `ContentPart::ImageUrl` to Mistral vision models.
 - **reactions** — sol can react to messages with emoji when it has something to express but not enough to say.
 - **E2EE** — full end-to-end encryption via matrix-sdk with sqlite state store.

 ## architecture

+```mermaid
+flowchart TD
+    subgraph Matrix
+        sync[Matrix Sync Loop]
+    end
+
+    subgraph Engagement
+        eval[Evaluator]
+        rules[Rule Checks]
+        llm_eval[LLM Evaluation<br/>ministral-3b]
+    end
+
+    subgraph Response
+        legacy[Legacy Path<br/>manual messages + chat completions]
+        convapi[Conversations API Path<br/>ConversationRegistry + agents]
+        tools[Tool Execution]
+    end
+
+    subgraph Persistence
+        sqlite[(SQLite<br/>conversations + agents)]
+        opensearch[(OpenSearch<br/>archive + memory)]
+    end
+
+    sync --> |message event| eval
+    eval --> rules
+    rules --> |MustRespond| Response
+    rules --> |no rule match| llm_eval
+    llm_eval --> |MaybeRespond| Response
+    llm_eval --> |React| sync
+    llm_eval --> |Ignore| sync
+    legacy --> tools
+    convapi --> tools
+    tools --> opensearch
+    legacy --> |response text| sync
+    convapi --> |response text| sync
+    sync --> |archive| opensearch
+    convapi --> sqlite
+    sync --> |memory extraction| opensearch
+```
+
+## source tree
+
 ```
 src/
-├── main.rs              entrypoint, Matrix client setup, backfill
-├── sync.rs              event loop — messages, reactions, redactions, invites
-├── config.rs            TOML config with serde defaults
-├── context.rs           ResponseContext — per-message sender identity
-├── matrix_utils.rs      message extraction, reply detection, room info
+├── main.rs                 entrypoint, Matrix client setup, backfill, orchestrator init
+├── sync.rs                 event loop — messages, reactions, redactions, invites
+├── config.rs               TOML config (5 sections) with serde defaults
+├── context.rs              ResponseContext — per-message sender identity threading
+├── conversations.rs        ConversationRegistry — room→conversation mapping, SQLite-backed
+├── persistence.rs          SQLite store (WAL mode, 3 tables: conversations, agents, service_users)
+├── agent_ux.rs             AgentProgress — reaction lifecycle (🔍→⚙️→✅) + thread posting
+├── matrix_utils.rs         message extraction, reply/edit/thread detection, image download
 ├── archive/
-│   ├── schema.rs        ArchiveDocument, OpenSearch index mapping
-│   └── indexer.rs       batched indexing, reactions, edits, redactions
+│   ├── schema.rs           ArchiveDocument, OpenSearch index mapping
+│   └── indexer.rs          batched indexing, reactions, edits, redactions
 ├── brain/
-│   ├── conversation.rs  sliding-window context per room
-│   ├── evaluator.rs     engagement decision (must/maybe/react/ignore)
-│   ├── personality.rs   system prompt templating
-│   └── responder.rs     Mistral chat loop with tool iterations + memory
+│   ├── conversation.rs     sliding-window context per room (configurable group/DM windows)
+│   ├── evaluator.rs        engagement decision (MustRespond/MaybeRespond/React/Ignore)
+│   ├── personality.rs      system prompt templating ({date}, {room_name}, {members}, etc.)
+│   └── responder.rs        both response paths, tool iteration loops, memory loading
 ├── memory/
-│   ├── schema.rs        MemoryDocument, index mapping
-│   ├── store.rs         query, get_recent, set — OpenSearch operations
-│   └── extractor.rs     post-response fact extraction via ministral-3b
+│   ├── schema.rs           MemoryDocument, index mapping
+│   ├── store.rs            query (topical), get_recent, set — OpenSearch operations
+│   └── extractor.rs        post-response fact extraction via ministral-3b
+├── agents/
+│   ├── definitions.rs      orchestrator config + 8 domain agent definitions (dynamic delegation)
+│   └── registry.rs         agent lifecycle with instructions hash staleness detection
+├── sdk/
+│   ├── mod.rs              SDK module root
+│   ├── vault.rs            OpenBao/Vault client (K8s auth, KV v2 read/write/delete)
+│   ├── tokens.rs           TokenStore — Vault-backed secrets + SQLite username mappings
+│   └── gitea.rs            GiteaClient — typed Gitea API v1 with PAT auto-provisioning
 └── tools/
-    ├── mod.rs           ToolRegistry, tool definitions, dispatch
-    ├── search.rs        archive search (keyword + semantic)
-    ├── room_history.rs  context around a timestamp or event
-    ├── room_info.rs     room listing, member queries
-    └── script.rs        deno_core sandbox with sol.* API
+    ├── mod.rs              ToolRegistry — 12 tool definitions + dispatch (5 core + 7 gitea)
+    ├── search.rs           archive search (keyword + semantic via embedding pipeline)
+    ├── room_history.rs     context around a timestamp or event
+    ├── room_info.rs        room listing, member queries
+    ├── script.rs           deno_core sandbox with sol.* host API, TS transpilation
+    ├── devtools.rs         Gitea tool handlers (repos, issues, PRs, files)
+    └── bridge.rs           ToolBridge — generic async handler map for future SDK integration
 ```

+
+## engagement pipeline
+
+```mermaid
+sequenceDiagram
+    participant M as Matrix
+    participant S as Sync Handler
+    participant E as Evaluator
+    participant LLM as ministral-3b
+    participant R as Responder
+
+    M->>S: m.room.message
+    S->>S: archive message
+    S->>S: update conversation context
+    S->>E: evaluate(sender, body, is_dm, recent)
+
+    alt own message
+        E-->>S: Ignore
+    else @mention or matrix.to link
+        E-->>S: MustRespond (DirectMention)
+    else DM
+        E-->>S: MustRespond (DirectMessage)
+    else "sol" or "hey sol"
+        E-->>S: MustRespond (NameInvocation)
+    else no rule match
+        E->>LLM: relevance evaluation (JSON)
+        LLM-->>E: {relevance, hook, emoji}
+        alt relevance >= spontaneous_threshold (0.85)
+            E-->>S: MaybeRespond
+        else relevance >= reaction_threshold (0.6) + emoji
+            E-->>S: React (emoji)
+        else below thresholds
+            E-->>S: Ignore
+        end
+    end
+
+    alt MustRespond or MaybeRespond
+        S->>S: check in-flight guard
+        S->>S: check cooldown (15s default)
+        S->>R: generate response
+    end
+```
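
The decision order in the diagram above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the evaluator's actual code: the `Engagement`, `Signals`, and `Thresholds` types and both function names are hypothetical, and the `MustRespond` reasons (DirectMention, DirectMessage, NameInvocation) are left out for brevity.

```rust
/// Outcome of the engagement evaluation (hypothetical type; the real
/// `MustRespond` variant also carries a reason such as DirectMention).
#[derive(Debug, PartialEq)]
enum Engagement {
    MustRespond,
    MaybeRespond,
    React(String),
    Ignore,
}

/// Cheap signals checked before any LLM call (field names are assumptions).
struct Signals {
    own_message: bool,
    mentioned: bool,
    is_dm: bool,
    name_invoked: bool,
}

/// Documented defaults: spontaneous_threshold 0.85, reaction_threshold 0.6.
struct Thresholds {
    spontaneous: f32,
    reaction: f32,
}

/// Rule checks, in the order shown in the diagram. `None` means no rule matched.
fn apply_rules(s: &Signals) -> Option<Engagement> {
    if s.own_message {
        Some(Engagement::Ignore)
    } else if s.mentioned || s.is_dm || s.name_invoked {
        Some(Engagement::MustRespond)
    } else {
        None
    }
}

/// Map the evaluation model's relevance score and optional emoji to a decision.
fn decide_from_score(relevance: f32, emoji: Option<&str>, t: &Thresholds) -> Engagement {
    if relevance >= t.spontaneous {
        return Engagement::MaybeRespond;
    }
    match emoji {
        // A reaction needs both a high enough score and an emoji to send.
        Some(e) if relevance >= t.reaction => Engagement::React(e.to_string()),
        _ => Engagement::Ignore,
    }
}
```

Keeping the rule checks separate from the scoring step reflects the diagram: the evaluation model is only consulted when no rule matches.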
+
+## response generation
+
+Sol has two response paths, controlled by `agents.use_conversations_api`:
+
+### legacy path (`generate_response`)
+
+1. Apply response delay (random within configured range)
+2. Send typing indicator
+3. Load memory notes (topical query + recent backfill, max 5)
+4. Build system prompt via `Personality` (template substitution: `{date}`, `{room_name}`, `{members}`, `{memory_notes}`, `{room_context_rules}`, `{epoch_ms}`)
+5. Assemble message array: system → context messages (with timestamps) → trigger (multimodal if image)
+6. Tool iteration loop (up to `max_tool_iterations`, default 5):
+   - If `finish_reason == ToolCalls`: execute tools, append results, continue
+   - If text response: strip "sol:" prefix, return
+7. Fire-and-forget memory extraction
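
Step 6 above, the bounded tool loop, can be sketched roughly like this. It is an illustration only: `ModelTurn`, `run_tool_loop`, and `strip_sol_prefix` are hypothetical names, the model and the tools are passed in as closures instead of real Mistral or `ToolRegistry` calls, and returning `None` when the iteration cap is reached is an assumption about behavior the text does not specify.

```rust
/// One model turn: either a request to run tools or a final text answer.
/// (Hypothetical type standing in for the Mistral response.)
enum ModelTurn {
    ToolCalls(Vec<String>),
    Text(String),
}

/// Strip a leading "sol:" from the reply.
/// Assumption: the match is exact and case-sensitive.
fn strip_sol_prefix(text: &str) -> String {
    text.strip_prefix("sol:").unwrap_or(text).trim_start().to_string()
}

/// Run the bounded tool loop. `next_turn` receives the tool results gathered
/// so far and returns the model's next turn; `execute` runs one tool by name.
/// Returns `None` if the cap is hit without a text answer (an assumption).
fn run_tool_loop(
    mut next_turn: impl FnMut(&[String]) -> ModelTurn,
    mut execute: impl FnMut(&str) -> String,
    max_tool_iterations: usize,
) -> Option<String> {
    let mut tool_results: Vec<String> = Vec::new();
    for _ in 0..max_tool_iterations {
        match next_turn(&tool_results) {
            // Tool calls: execute each one, keep the results, ask again.
            ModelTurn::ToolCalls(calls) => {
                for name in &calls {
                    tool_results.push(execute(name.as_str()));
                }
            }
            // Text: this is the final answer.
            ModelTurn::Text(text) => return Some(strip_sol_prefix(&text)),
        }
    }
    None
}
```

The cap matters because a model that keeps requesting tools would otherwise loop indefinitely; `max_tool_iterations` (default 5) bounds the number of rounds per response.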
+
+### conversations API path (`generate_response_conversations`)
+
+1. Apply response delay
+2. Send typing indicator
+3. Format input: raw text for DMs, `<@user:server> text` for groups
+4. Send through `ConversationRegistry.send_message()` (creates or appends to Mistral conversation)
+5. Function call loop (up to `max_tool_iterations`):
+   - Execute tool calls locally via `ToolRegistry`
+   - Send `FunctionResultEntry` back to conversation
+6. Extract assistant text, strip prefix, return
+
+## tool system
+
+| Tool | Parameters | Description |
+|------|-----------|-------------|
+| `search_archive` | `query` (required), `room`, `sender`, `after`, `before`, `limit`, `semantic` | Search the message archive (keyword or semantic) |
+| `get_room_context` | `room_id` (required), `around_timestamp`, `around_event_id`, `before_count`, `after_count` | Get messages around a point in time or event |
+| `list_rooms` | *(none)* | List all rooms Sol is in with names and member counts |
+| `get_room_members` | `room_id` (required) | Get members of a specific room |
+| `run_script` | `code` (required) | Execute TypeScript/JavaScript in a sandboxed deno_core runtime |
+
+### run_script sandbox
+
+The script runtime is a fresh V8 isolate per invocation with:
+
+- **TypeScript support** — code is transpiled via `deno_ast` before execution
+- **Timeout** — configurable via `behavior.script_timeout_secs` (default 5s), enforced by V8 isolate termination
+- **Heap limit** — configurable via `behavior.script_max_heap_mb` (default 64MB)
+- **Output** — `console.log()` + last expression value, truncated to 4096 characters
+- **Temp filesystem** — sandboxed `sol.fs.read/write/list` with path traversal protection
+- **Network** — `sol.fetch(url)` restricted to `behavior.script_fetch_allowlist` domains
+
+Host API (`sol.*`):
+
+```typescript
+sol.search(query, opts?)         // search message archive
+sol.rooms()                      // list joined rooms → [{name, id, members}]
+sol.members(roomName)            // get room members → [{name, id}]
+sol.fetch(url)                   // HTTP GET (allowlisted domains only)
+sol.memory.get(query?)           // retrieve memories relevant to query
+sol.memory.set(content, category?) // save a memory note
+sol.fs.read(path)                // read file from sandbox
+sol.fs.write(path, content)      // write file to sandbox
+sol.fs.list(path?)               // list sandbox directory
+```
+
+All `sol.*` methods are async — use `await`.
+
+## memory system
+
+### extraction (post-response, fire-and-forget)
+
+After each response, a background task sends the exchange to `ministral-3b` with a structured extraction prompt. The model returns `{"memories": [{"content": "...", "category": "preference|fact|context"}]}`. Categories are normalized via `normalize_category()` — valid categories are `preference`, `fact`, `context`; anything else falls back to `general`.
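
The fallback rule can be sketched as below. This is a hedged illustration, not the actual `normalize_category()`: the signature is assumed, and trimming and lowercasing the input before matching is an extra assumption the text does not state.

```rust
/// Hypothetical sketch of category normalization; the real signature may differ.
fn normalize_category(raw: &str) -> &'static str {
    // Assumption: input is trimmed and lowercased before matching.
    match raw.trim().to_lowercase().as_str() {
        "preference" => "preference",
        "fact" => "fact",
        "context" => "context",
        // Anything else falls back to the catch-all category.
        _ => "general",
    }
}
```

With this sketch, `normalize_category("fact")` stays `"fact"`, while an unexpected value such as `"mood"` becomes `"general"`.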
+
+### storage (OpenSearch)
+
+Each memory is a `MemoryDocument` with: `id`, `user_id`, `content`, `category`, `created_at`, `updated_at`, `source` (`"auto"` or `"script"`). The index name defaults to `sol_user_memory`. User isolation is enforced at the Rust level via `user_id` filtering on all queries.
+
+### pre-response loading
+
+Before generating a response, the responder loads up to 5 memories:
+
+1. **Topical query** — semantic search against the trigger message
+2. **Recent backfill** — if fewer than 3 topical results, fill remaining slots with most recent memories
+
+Memory notes are injected into the system prompt as a `## notes about {display_name}` block with instructions to use them naturally without mentioning their existence.
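
The two-step loading order can be sketched as a pure function over already-fetched results. The limits 5 and 3 come from the text above; the function name, the reading of "remaining slots" as "up to the limit of 5", and the duplicate check are assumptions made for illustration.

```rust
const MAX_NOTES: usize = 5; // "loads up to 5 memories"
const MIN_TOPICAL: usize = 3; // backfill only if fewer topical hits than this

/// Combine topical search hits with recent memories (hypothetical helper).
fn select_memories(topical: Vec<String>, recent: Vec<String>) -> Vec<String> {
    // Step 1: topical results come first, capped at the overall limit.
    let mut notes: Vec<String> = topical.into_iter().take(MAX_NOTES).collect();

    // Step 2: backfill with recent memories when topical results are sparse.
    if notes.len() < MIN_TOPICAL {
        for memory in recent {
            if notes.len() >= MAX_NOTES {
                break;
            }
            // Assumption: a memory already picked topically is not repeated.
            if !notes.contains(&memory) {
                notes.push(memory);
            }
        }
    }
    notes
}
```

In Sol the two inputs would come from OpenSearch queries filtered by `user_id`; here they are plain vectors so the selection logic can be read on its own.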
+
+## archive
+
+Every message event is archived as an `ArchiveDocument` in OpenSearch:
+
+- **Batch indexing** — messages are buffered and flushed periodically (`opensearch.batch_size` default 50, `opensearch.flush_interval_ms` default 2000)
+- **Embedding pipeline** — configurable via `opensearch.embedding_pipeline` for semantic search
+- **Edit tracking** — `m.replace` events update the original document's content
+- **Redaction** — `m.room.redaction` sets `redacted: true` on the original
+- **Reactions** — `m.reaction` events append `{sender, emoji, timestamp}` to the document's reactions array
+- **Backfill** — on startup, conversation context is backfilled from the archive; reactions are backfilled from Matrix room timelines (last 500 events per room)
+
+## agent architecture
+
+```mermaid
+stateDiagram-v2
+    [*] --> CheckMemory: startup
+    CheckMemory --> CheckServer: agent_id in SQLite?
+    CheckMemory --> SearchByName: not in SQLite
+
+    CheckServer --> Ready: exists on Mistral server
+    CheckServer --> SearchByName: gone from server
+
+    SearchByName --> Ready: found by name
+    SearchByName --> Create: not found
+
+    Create --> Ready: agent created
+    Ready --> [*]
+```
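
The startup state machine above reduces to three fallbacks tried in order. The sketch below is one way to express it, with the server and database lookups passed in as closures; `Resolution` and `resolve_agent` are hypothetical names, and recreation on an instructions-hash change (mentioned elsewhere in this README) is deliberately not modeled.

```rust
/// How the agent ID was obtained (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum Resolution {
    ReusedStored,
    FoundByName,
    Created,
}

/// Resolve an agent ID at startup: stored ID, then lookup by name, then create.
fn resolve_agent(
    stored_id: Option<&str>,
    exists_on_server: impl Fn(&str) -> bool,
    find_by_name: impl Fn() -> Option<String>,
    create: impl Fn() -> String,
) -> (String, Resolution) {
    // CheckMemory -> CheckServer: reuse the stored ID only if the server still has it.
    if let Some(id) = stored_id {
        if exists_on_server(id) {
            return (id.to_string(), Resolution::ReusedStored);
        }
    }
    // SearchByName: the agent may exist on the server without a local record.
    if let Some(id) = find_by_name() {
        return (id, Resolution::FoundByName);
    }
    // Create: nothing usable was found.
    (create(), Resolution::Created)
}
```

Searching by name before creating avoids piling up duplicate agents on the server when the local database has been lost or reset.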
+
+### orchestrator
+
+The orchestrator agent carries Sol's full personality (system prompt) plus all 5 tool definitions converted to `AgentTool` format. It's created on startup if `agents.use_conversations_api` is enabled. Temperature: 0.5.
+
+### domain agents (8 definitions)
+
+| Agent | Domain |
+|-------|--------|
+| `sol-observability` | Metrics, logs, dashboards, alerts (Prometheus, Loki, Grafana) |
+| `sol-data` | Full-text search, object storage (OpenSearch, SeaweedFS) |
+| `sol-devtools` | Git repos, issues, PRs, kanban boards (Gitea, Planka) |
+| `sol-infrastructure` | Kubernetes, deployments, certificates, builds |
+| `sol-identity` | User accounts, sessions, OAuth2 (Kratos, Hydra) |
+| `sol-collaboration` | Contacts, documents, meetings, files, email, calendars (La Suite) |
+| `sol-communication` | Chat rooms, messages, members (Matrix) |
+| `sol-media` | Video/audio rooms, recordings, streams (LiveKit) |
+
+Domain agents are defined in `agents/definitions.rs` as `DOMAIN_AGENTS` (name, description, instructions). Temperature: 0.3.
+
+### ToolBridge
+
+`tools/bridge.rs` provides a generic async handler map (`ToolBridge`) for mapping Mistral tool call names to handler functions. This is scaffolding for future SDK-based tool integration where domain agents will have their own tool sets.
+
+## persistence
+
+SQLite database at `/data/sol.db` (configurable via `matrix.db_path`), WAL mode.
+
+### tables
+
+**conversations** — room_id (PK), conversation_id, estimated_tokens, created_at
+
+**agents** — name (PK), agent_id, model, created_at
+
+### recovery behavior
+
+On startup, if the database fails to open:
+
+1. Log error
+2. Fall back to in-memory SQLite (conversations won't survive restarts)
+3. After sync loop starts, send `*sneezes*` to all joined rooms to signal the hiccup
+
+## multimodal
+
+When an `m.image` message arrives:
+
+1. Extract media source from event (`MessageType::Image`)
+2. Download bytes from Matrix media API via `matrix_sdk::media::get_media_content`
+3. Base64-encode as `data:{mime};base64,{data}` URI
+4. Pass to Mistral as `ContentPart::ImageUrl` alongside any text caption
+
+Encrypted images are not supported (the `MediaSource::Encrypted` variant is skipped).
+
+## configuration reference
+
+Config is loaded from `SOL_CONFIG` (default: `/etc/sol/sol.toml`).
+
+### `[matrix]`
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `homeserver_url` | string | *required* | Matrix homeserver URL |
+| `user_id` | string | *required* | Bot's Matrix user ID |
+| `state_store_path` | string | *required* | Path for Matrix SDK sqlite state |
+| `db_path` | string | `/data/sol.db` | SQLite database for persistent state |
+
+### `[opensearch]`
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `url` | string | *required* | OpenSearch cluster URL |
+| `index` | string | *required* | Archive index name |
+| `batch_size` | usize | `50` | Messages per flush batch |
+| `flush_interval_ms` | u64 | `2000` | Flush interval in milliseconds |
+| `embedding_pipeline` | string | `tuwunel_embedding_pipeline` | Ingest pipeline for semantic embeddings |
+| `memory_index` | string | `sol_user_memory` | Memory index name |
+
+### `[mistral]`
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `default_model` | string | `mistral-medium-latest` | Model for response generation |
+| `evaluation_model` | string | `ministral-3b-latest` | Model for engagement evaluation + memory extraction |
+| `research_model` | string | `mistral-large-latest` | Model for research tasks |
+| `max_tool_iterations` | usize | `5` | Max tool call rounds per response |
+
+### `[behavior]`
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `response_delay_min_ms` | u64 | `100` | Min delay before direct response |
+| `response_delay_max_ms` | u64 | `2300` | Max delay before direct response |
+| `spontaneous_delay_min_ms` | u64 | `15000` | Min delay before spontaneous response |
+| `spontaneous_delay_max_ms` | u64 | `60000` | Max delay before spontaneous response |
+| `spontaneous_threshold` | f32 | `0.85` | LLM relevance score to trigger spontaneous response |
+| `reaction_threshold` | f32 | `0.6` | LLM relevance score to trigger emoji reaction |
+| `reaction_enabled` | bool | `true` | Enable emoji reactions |
+| `room_context_window` | usize | `30` | Messages to keep in group room context |
+| `dm_context_window` | usize | `100` | Messages to keep in DM context |
+| `backfill_on_join` | bool | `true` | Backfill context from archive on startup |
+| `backfill_limit` | usize | `10000` | Max messages to backfill |
+| `instant_responses` | bool | `false` | Skip response delays (for testing) |
+| `cooldown_after_response_ms` | u64 | `15000` | Cooldown before another spontaneous response |
+| `evaluation_context_window` | usize | `25` | Recent messages sent to evaluation LLM |
+| `detect_sol_in_conversation` | bool | `true` | Use active/passive evaluation prompts based on Sol's participation |
+| `evaluation_prompt_active` | string? | *(built-in)* | Custom prompt when Sol is in conversation |
+| `evaluation_prompt_passive` | string? | *(built-in)* | Custom prompt when Sol hasn't spoken |
+| `script_timeout_secs` | u64 | `5` | Script execution timeout |
+| `script_max_heap_mb` | usize | `64` | V8 heap limit for scripts |
+| `script_fetch_allowlist` | string[] | `[]` | Domains allowed for `sol.fetch()` |
+| `memory_extraction_enabled` | bool | `true` | Enable post-response memory extraction |
+
+### `[agents]`
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `orchestrator_model` | string | `mistral-medium-latest` | Model for orchestrator agent |
+| `domain_model` | string | `mistral-medium-latest` | Model for domain agents |
+| `compaction_threshold` | u32 | `118000` | Token estimate before conversation reset (~90% of 131K context) |
+| `use_conversations_api` | bool | `false` | Enable Conversations API path (vs legacy chat completions) |
+
+## environment variables
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `SOL_MATRIX_ACCESS_TOKEN` | yes | Matrix access token |
+| `SOL_MATRIX_DEVICE_ID` | yes | Matrix device ID (for E2EE state) |
+| `SOL_MISTRAL_API_KEY` | yes | Mistral API key |
+| `SOL_CONFIG` | no | Config file path (default: `/etc/sol/sol.toml`) |
+| `SOL_SYSTEM_PROMPT` | no | System prompt file path (default: `/etc/sol/system_prompt.md`) |
+
 ## dependencies

-sol talks to three external services:
+sol talks to five external services:

 - **Matrix homeserver** — [tuwunel](https://github.com/tulir/tuwunel) (or any Matrix server)
 - **OpenSearch** — message archive + user memory indices
- **Mistral AI** — response generation, engagement evaluation, memory extraction
+- **Mistral AI** — response generation, engagement evaluation, memory extraction, agents + web search
+- **OpenBao** — secure token storage for user impersonation PATs (K8s auth, KV v2)
+- **Gitea** — git hosting API for devtools agent (repos, issues, PRs)

-## configuration
-
-sol reads config from `SOL_CONFIG` (default: `/etc/sol/sol.toml`) and the system prompt from `SOL_SYSTEM_PROMPT` (default: `/etc/sol/system_prompt.md`).
-
-secrets via environment:
-
-| Variable | Description |
-|----------|-------------|
-| `SOL_MATRIX_ACCESS_TOKEN` | Matrix access token |
-| `SOL_MATRIX_DEVICE_ID` | Matrix device ID (for E2EE) |
-| `SOL_MISTRAL_API_KEY` | Mistral API key |
-
-see `config/sol.toml` for the full config reference with defaults.
+key crates: `matrix-sdk` 0.9 (E2EE + sqlite), `mistralai-client` 1.1.0 (private registry), `opensearch` 2, `deno_core` 0.393, `rusqlite` 0.32 (bundled), `ruma` 0.12.

 ## building

@@ -70,31 +405,47 @@ see `config/sol.toml` for the full config reference with defaults.
 cargo build --release
 ```

-docker (cross-compile to x86_64 linux):
+docker (cross-compile to x86_64 linux, vendored deps):

 ```sh
 docker build -t sol .
 ```

-## running
+production build + deploy:

 ```sh
-export SOL_MATRIX_ACCESS_TOKEN="..."
-export SOL_MATRIX_DEVICE_ID="..."
-export SOL_MISTRAL_API_KEY="..."
-export SOL_CONFIG="config/sol.toml"
-export SOL_SYSTEM_PROMPT="config/system_prompt.md"
-
-cargo run --release
+sunbeam build sol --push --deploy
 ```

-## tests
+the Dockerfile uses a two-stage build: deps layer (cached until Cargo.toml/vendor change) → source layer (only sol code recompiles). final image is `gcr.io/distroless/cc-debian12:nonroot`.
+
+## testing

 ```sh
 cargo test
 ```

-80 unit tests covering config parsing, conversation windowing, engagement rules, personality templating, memory schema/store/extraction, search query building, TypeScript transpilation, and sandbox path isolation.
+unit tests covering:
+
+- config parsing (minimal, full, missing sections/fields, services, vault)
+- conversation windowing, context management, reset_all, delete_all
+- engagement rules (mention, DM, name invocation, case sensitivity, false positives)
+- personality template substitution (date, room, members, memory notes, timestamps, room context rules)
+- memory document serialization, extraction parsing, category normalization
+- archive search query building (filters, date ranges, wildcards, room_name keyword field)
+- TypeScript transpilation (basic, arrow, interface, invalid)
+- sandbox path isolation (traversal, symlink escape, nested dirs)
+- deno_core script execution (basic math, output capture)
+- SQLite CRUD (conversations, agents, service_users, load_all, bulk delete)
+- conversation message merging (DM, group, empty, single)
+- context derivation (`@user:server` → `user@server`, localpart extraction)
+- tool bridge registration and execution
+- agent UX formatting (tool calls, result truncation)
+- agent definitions (orchestrator instructions, dynamic delegation, deterministic hash)
+- token expiry validation (PAT, future, past, malformed, null)
+- Gitea API type deserialization (repos, issues, PRs, files)
+- PAT conflict status codes and scope constants
+- username mapping (OIDC → service identity)

 ## license

--- a/docs/conversations.md
+++ b/docs/conversations.md
@@ -0,0 +1,169 @@
+# Sol — Conversations API
+
+The Conversations API path provides persistent, server-side conversation state per Matrix room via Mistral's Conversations API. Enable it with `agents.use_conversations_api = true` in `sol.toml`.
+
+## lifecycle
+
+```mermaid
+sequenceDiagram
+    participant M as Matrix Sync
+    participant E as Evaluator
+    participant R as Responder
+    participant CR as ConversationRegistry
+    participant API as Mistral Conversations API
+    participant T as ToolRegistry
+    participant DB as SQLite
+
+    M->>E: message event
+    E-->>M: MustRespond/MaybeRespond
+
+    M->>R: generate_response_conversations()
+    R->>CR: send_message(room_id, input, is_dm)
+
+    alt new room (no conversation)
+        CR->>API: create_conversation(agent_id?, model, input)
+        API-->>CR: ConversationResponse + conversation_id
+        CR->>DB: upsert_conversation(room_id, conv_id, tokens)
+    else existing room
+        CR->>API: append_conversation(conv_id, input)
+        API-->>CR: ConversationResponse
+        CR->>DB: update_tokens(room_id, new_total)
+    end
+
+    alt response contains function_calls
+        loop up to max_tool_iterations (5)
+            R->>T: execute(name, args)
+            T-->>R: result string
+            R->>CR: send_function_result(room_id, entries)
+            CR->>API: append_conversation(conv_id, FunctionResult entries)
+            API-->>CR: ConversationResponse
+            alt more function_calls
+                Note over R: continue loop
+            else text response
+                Note over R: break
+            end
+        end
+    end
+
+    R-->>M: response text (or None)
+    M->>M: send to Matrix room
+    M->>M: fire-and-forget memory extraction
+```
+
+## room-to-conversation mapping
+
+Each Matrix room maps to exactly one Mistral conversation:
+
+- **Group rooms**: one shared conversation per room (all participants' messages go to the same conversation)
+- **DMs**: one conversation per DM room (already unique per user pair in Matrix)
+
+The mapping is stored in `ConversationRegistry.mapping` (an in-memory HashMap, backed by the SQLite `conversations` table).
+
+## ConversationState
+
+```rust
+struct ConversationState {
+    conversation_id: String,   // Mistral conversation ID
+    estimated_tokens: u32,     // running total from response.usage.total_tokens
+}
+```
+
+Token estimates are incremented on each API response and persisted to SQLite.
+
+## message formatting
+
+Messages are formatted differently based on room type:
+
+- **DMs**: raw text — `trigger_body` is sent directly
+- **Group rooms**: prefixed with Matrix user ID — `<@sienna:sunbeam.pt> what's for lunch?`
+
+This is handled by the responder before calling `ConversationRegistry.send_message()`.
+
+The `merge_user_messages()` helper (for buffered messages) follows the same pattern:
+
+```rust
+// DM: "hello\nhow are you?"
+// Group: "<@sienna:sunbeam.pt> hello\n<@lonni:sunbeam.pt> how are you?"
+```
+
+## compaction
+
+When a conversation's `estimated_tokens` reaches the `compaction_threshold` (default 118,000 — ~90% of the 131K Mistral context window):
+
+1. `ConversationRegistry.reset(room_id)` is called
+2. The mapping is removed from the in-memory HashMap
+3. The row is deleted from SQLite
+4. The next message creates a fresh conversation
+
+Compaction is therefore a hard reset rather than a summarization: the new conversation starts with none of the previous history. The archive still has the full history, and memory notes persist independently.
+
+## persistence
+
+### startup recovery
+
+On initialization, `ConversationRegistry::new()` calls `store.load_all_conversations()` to restore all room→conversation mappings from SQLite. This means conversations survive pod restarts.
+
+### SQLite schema
+
+```sql
+CREATE TABLE conversations (
+    room_id          TEXT PRIMARY KEY,
+    conversation_id  TEXT NOT NULL,
+    estimated_tokens INTEGER NOT NULL DEFAULT 0,
+    created_at       TEXT NOT NULL DEFAULT (datetime('now'))
+);
+```
+
+### write operations
+
+| Operation | When |
+|-----------|------|
+| `upsert_conversation` | New conversation created |
+| `update_tokens` | After each append (token count from API response) |
+| `delete_conversation` | On compaction reset |
+
+## agent integration
+
+Conversations can optionally use a Mistral agent (the orchestrator) instead of a bare model:
+
+- If `agent_id` is set (via `set_agent_id()` at startup): new conversations are created with the agent
+- If `agent_id` is `None`: conversations use the `model` directly (fallback)
+
+The agent provides Sol's personality, tool definitions, and delegation instructions. Without it, conversations still work but lack agent-specific behavior.
+
+```rust
+let req = CreateConversationRequest {
+    inputs: message,
+    model: if agent_id.is_none() { Some(self.model.clone()) } else { None },
+    agent_id,
+    // ...
+};
+```
+
+## error handling
+
+- **API failure on create/append**: returns `Err(String)`, responder logs and returns `None` (no response sent to Matrix)
+- **Function result send failure**: logs error, returns `None`
+- **SQLite write failure**: logged as warning, in-memory state is still updated (will be lost on restart)
+
+Sol never crashes on a conversation error — it simply doesn't respond.
+
+## configuration
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `agents.use_conversations_api` | `false` | Enable this path |
+| `agents.orchestrator_model` | `mistral-medium-latest` | Model for orchestrator agent |
+| `agents.compaction_threshold` | `118000` | Token limit before conversation reset |
+| `mistral.max_tool_iterations` | `5` | Max function call rounds per response |
+
+## comparison with legacy path
+
+| Aspect | Legacy | Conversations API |
+|--------|--------|------------------|
+| State management | Manual `Vec<ChatMessage>` per request | Server-side, persistent |
+| Memory injection | System prompt template | Agent instructions |
+| Tool calling | Chat completions `tool_choice` | Function calls via conversation entries |
+| Context window | Sliding window (configurable) | Full conversation history until compaction |
+| Multimodal | `ContentPart::ImageUrl` | Not yet supported (TODO in responder) |
+| Persistence | None (context rebuilt from archive) | SQLite + Mistral server |
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -0,0 +1,228 @@
+# Sol — Kubernetes Deployment
+
+Sol runs as a single-replica Deployment in the `matrix` namespace. SQLite is the persistence backend, so only one pod can run at a time (Recreate strategy).
+
+## resource relationships
+
+```mermaid
+flowchart TD
+    subgraph OpenBao
+        vault[("secret/sol<br/>matrix-access-token<br/>matrix-device-id<br/>mistral-api-key")]
+    end
+
+    subgraph "matrix namespace"
+        vss[VaultStaticSecret<br/>sol-secrets]
+        secret[Secret<br/>sol-secrets]
+        cm[ConfigMap<br/>sol-config<br/>sol.toml + system_prompt.md]
+        pvc[PVC<br/>sol-data<br/>1Gi RWO]
+        deploy[Deployment<br/>sol]
+        init[initContainer<br/>fix-permissions]
+        pod[Container<br/>sol]
+    end
+
+    vault --> |VSO sync| vss
+    vss --> |creates| secret
+    vss --> |rolloutRestartTargets| deploy
+    deploy --> init
+    init --> pod
+    secret --> |env vars| pod
+    cm --> |subPath mounts| pod
+    pvc --> |/data| init
+    pvc --> |/data| pod
+```
+
+## manifests
+
+All manifests are in `infrastructure/base/matrix/`.
+
+### Deployment (`sol-deployment.yaml`)
+
+```yaml
+strategy:
+  type: Recreate         # SQLite requires single-writer
+replicas: 1
+```
+
+**initContainer** — `busybox` runs `chmod -R 777 /data && mkdir -p /data/matrix-state` to ensure the nonroot distroless container can write to the Longhorn PVC.
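+
+As a rough sketch, the initContainer described above might look like this in the pod spec. The volume name `data` and the unpinned `busybox` tag are assumptions for illustration; the actual values live in `sol-deployment.yaml`.
+
+```yaml
+initContainers:
+  - name: fix-permissions
+    image: busybox                 # tag is not specified in this doc
+    command:
+      - sh
+      - -c
+      - chmod -R 777 /data && mkdir -p /data/matrix-state
+    volumeMounts:
+      - name: data                 # assumed volume name
+        mountPath: /data
+```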
+
+**Container** — `sol` image (distroless/cc-debian12:nonroot)
+
+- Resources: memory 256Mi request / 512Mi limit; CPU 100m request
+- `enableServiceLinks: false` — avoids injecting service env vars that could conflict
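+
+The two settings above could be expressed roughly as follows. This is a fragment under assumptions, not the full manifest: the nesting follows the standard Deployment layout, and the container name `sol` is taken from the resource diagram.
+
+```yaml
+spec:
+  template:
+    spec:
+      enableServiceLinks: false    # pod-level field, not per-container
+      containers:
+        - name: sol
+          resources:
+            requests:
+              memory: 256Mi
+              cpu: 100m
+            limits:
+              memory: 512Mi
+```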
+
+**Environment variables** (from Secret `sol-secrets`):
+
+| Env Var | Secret Key |
+|---------|-----------|
+| `SOL_MATRIX_ACCESS_TOKEN` | `matrix-access-token` |
+| `SOL_MATRIX_DEVICE_ID` | `matrix-device-id` |
+| `SOL_MISTRAL_API_KEY` | `mistral-api-key` |
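+
+One common way to map Secret keys to differently named env vars is `secretKeyRef`. The fragment below is a sketch that assumes the Deployment uses that mechanism; the env var names and keys come from the table above.
+
+```yaml
+env:
+  - name: SOL_MATRIX_ACCESS_TOKEN
+    valueFrom:
+      secretKeyRef:
+        name: sol-secrets
+        key: matrix-access-token
+  - name: SOL_MATRIX_DEVICE_ID
+    valueFrom:
+      secretKeyRef:
+        name: sol-secrets
+        key: matrix-device-id
+  - name: SOL_MISTRAL_API_KEY
+    valueFrom:
+      secretKeyRef:
+        name: sol-secrets
+        key: mistral-api-key
+```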
+
+Fixed env vars:
+
+| Env Var | Value |
+|---------|-------|
+| `SOL_CONFIG` | `/etc/sol/sol.toml` |
+| `SOL_SYSTEM_PROMPT` | `/etc/sol/system_prompt.md` |
+
+**Volume mounts:**
+
+| Mount | Source | Details |
+|-------|--------|---------|
+| `/etc/sol/sol.toml` | ConfigMap `sol-config` | subPath: `sol.toml`, readOnly |
+| `/etc/sol/system_prompt.md` | ConfigMap `sol-config` | subPath: `system_prompt.md`, readOnly |
+| `/data` | PVC `sol-data` | read-write |
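+
+The mounts in the table might translate to a fragment like the one below. The volume names `config` and `data` are assumptions; the paths, ConfigMap name, and PVC name come from the table. Note that ConfigMap files mounted via `subPath` are not refreshed in a running pod when the ConfigMap changes, so a restart is needed to pick up edits.
+
+```yaml
+containers:
+  - name: sol
+    volumeMounts:
+      - name: config               # assumed volume name
+        mountPath: /etc/sol/sol.toml
+        subPath: sol.toml
+        readOnly: true
+      - name: config
+        mountPath: /etc/sol/system_prompt.md
+        subPath: system_prompt.md
+        readOnly: true
+      - name: data                 # assumed volume name
+        mountPath: /data
+volumes:
+  - name: config
+    configMap:
+      name: sol-config
+  - name: data
+    persistentVolumeClaim:
+      claimName: sol-data
+```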
+
+### PVC (`sol-deployment.yaml`, second document)
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: sol-data
+  namespace: matrix
+spec:
+  accessModes: [ReadWriteOnce]
+  resources:
+    requests:
+      storage: 1Gi
+```
+
+Uses the default StorageClass (Longhorn).
+
+### VaultStaticSecret (`vault-secrets.yaml`)
+
+```yaml
+apiVersion: secrets.hashicorp.com/v1beta1
+kind: VaultStaticSecret
+metadata:
+  name: sol-secrets
+  namespace: matrix
+spec:
+  vaultAuthRef: vso-auth
+  mount: secret
+  type: kv-v2
+  path: sol
+  refreshAfter: 60s
+  rolloutRestartTargets:
+    - kind: Deployment
+      name: sol
+  destination:
+    name: sol-secrets
+    create: true
+    overwrite: true
+```
+
+The `rolloutRestartTargets` field means VSO will automatically restart the Sol deployment when secrets change in OpenBao.
+
+Three keys are synced from OpenBao `secret/sol`:
+
+- `matrix-access-token`
+- `matrix-device-id`
+- `mistral-api-key`
+
+## `/data` mount layout
+
+```
+/data/
+├── sol.db           SQLite database (conversations + agents tables, WAL mode)
+└── matrix-state/    Matrix SDK sqlite state store (E2EE keys, sync tokens)
+```
+
+Both are created automatically. The initContainer ensures directory permissions are correct for the nonroot container.
+
+## secrets in OpenBao
+
+Store secrets at `secret/sol` in OpenBao KV v2:
+
+```sh
+# Via sunbeam seed (automated), or manually:
+bao kv put secret/sol \
+  matrix-access-token="syt_..." \
+  matrix-device-id="DEVICE_ID" \
+  mistral-api-key="..."
+```
+
+These are synced to K8s Secret `sol-secrets` by the Vault Secrets Operator.
+
+## build and deploy
+
+```sh
+# Build only (local Docker image)
+sunbeam build sol
+
+# Build + push to registry
+sunbeam build sol --push
+
+# Build + push + deploy (apply manifests + rollout restart)
+sunbeam build sol --push --deploy
+```
+
+The Docker build cross-compiles to `x86_64-unknown-linux-gnu` when run on macOS. The final image is based on `gcr.io/distroless/cc-debian12:nonroot` (~30MB).
+
+## startup sequence
+
+1. Initialize `tracing_subscriber` with `RUST_LOG` env filter (default: `sol=info`)
+2. Load config from `SOL_CONFIG` path
+3. Load system prompt from `SOL_SYSTEM_PROMPT` path
+4. Read 3 secret env vars (`SOL_MATRIX_ACCESS_TOKEN`, `SOL_MATRIX_DEVICE_ID`, `SOL_MISTRAL_API_KEY`)
+5. Build Matrix client with E2EE sqlite store, restore session
+6. Connect to OpenSearch, ensure archive + memory indices exist
+7. Initialize Mistral client
+8. Build components: Personality, ConversationManager, ToolRegistry, Indexer, Evaluator, Responder
+9. Backfill conversation context from archive (if `backfill_on_join` enabled)
+10. Open SQLite database (fallback to in-memory on failure)
+11. Initialize AgentRegistry + ConversationRegistry (load persisted state from SQLite)
+12. If `use_conversations_api` enabled: ensure orchestrator agent exists on Mistral server
+13. Backfill reactions from Matrix room timelines
+14. Start background index flush task
+15. Start Matrix sync loop
+16. If SQLite failed: send `*sneezes*` to all joined rooms
+17. Log "Sol is running", wait for SIGINT
+
+## monitoring
+
+Sol uses `tracing` with structured fields. Default log level: `sol=info`.
+
+Key log events:
+
+| Event | Level | Fields |
+|-------|-------|--------|
+| Response sent | info | `room`, `len`, `is_dm` |
+| Tool execution | info | `tool`, `id`, `args` |
+| Engagement evaluation | info | `sender`, `rule`, `relevance`, `threshold` |
+| Memory extraction | debug | `count`, `user` |
+| Conversation created | info | `room`, `conversation_id` |
+| Agent restored/created | info | `agent_id`, `name` |
+| Backfill complete | info | `rooms`, `messages` / `reactions` |
+
+Set `RUST_LOG=sol=debug` for verbose output including tool results, evaluation prompts, and memory details.
+
+## troubleshooting
+
+**Pod won't start / CrashLoopBackOff:**
+
+```sh
+sunbeam logs matrix/sol
+```
+
+Common causes:
+- Missing secrets (env vars not set) — check `sunbeam k8s get secret sol-secrets -n matrix -o yaml`
+- ConfigMap not applied — check `sunbeam k8s get cm sol-config -n matrix`
+- PVC not bound — check `sunbeam k8s get pvc -n matrix`
+
+**SQLite recovery failure (*sneezes*):**
+
+If Sol sends `*sneezes*` on startup, the SQLite database at `/data/sol.db` couldn't be opened and Sol has fallen back to in-memory state. Check the PVC mount and file permissions:
+
+```sh
+sunbeam k8s exec -n matrix deployment/sol -- ls -la /data/
+```
+
+**Matrix sync errors:**
+
+Sol auto-joins rooms on invite (3 retries with exponential backoff). If it can't join, check homeserver connectivity and access token validity.
+
+**Agent creation failure:**
+
+If the orchestrator agent can't be created, Sol falls back to model-only conversations (no agent). Check Mistral API key and quota.