update documentation for sdk, vault, gitea integration
449
README.md
@@ -1,68 +1,403 @@
# sol

a virtual librarian for Matrix. sol lives in your chat rooms, archives conversations in OpenSearch, and responds with the help of Mistral AI — with end-to-end encryption, tool use, per-user memory, and a multi-agent architecture.

sol is built by [sunbeam studios](https://sunbeam.pt) as part of our self-hosted collaboration stack for a three-person game studio.

## what sol does

- **Matrix presence** — joins rooms, reads the vibe, decides when to speak. direct messages always get a response; in group rooms, sol evaluates relevance before jumping in.
- **message archive** — every message is indexed in OpenSearch with full-text and semantic search. sol can search its own archive via tools.
- **tool use** — Mistral calls tools mid-conversation: archive search, room context retrieval, room info, and a sandboxed TypeScript/JavaScript runtime (deno_core) for computation.
- **per-user memory** — sol remembers things about the people it talks to. memories are extracted automatically after conversations, injected into the system prompt before responding, and accessible from scripts via `sol.memory.get/set`.
- **user impersonation** — sol acts on behalf of users when calling external services. personal access tokens (PATs) are auto-provisioned via admin APIs and stored securely in OpenBao (Vault). OIDC-to-service username mappings handle identity mismatches.
- **gitea integration** — first domain agent (sol-devtools): list repos, search issues, create issues, list PRs, get file contents — all as the requesting user.
- **multi-agent architecture** — an orchestrator agent with personality + tools + web search. domain agent delegation is dynamic — only active agents appear in instructions. agent state is persisted in SQLite with an instructions hash for automatic recreation on prompt changes.
- **conversations API** — persistent conversation state per room via Mistral's Conversations API, with automatic compaction at token thresholds. per-message context headers inject timestamps, room info, and memory notes.
- **multimodal** — m.image messages are downloaded from Matrix via mxc://, converted to base64 data URIs, and sent as `ContentPart::ImageUrl` to Mistral vision models.
- **reactions** — sol can react to messages with emoji when it has something to express but not enough to say.
- **E2EE** — full end-to-end encryption via matrix-sdk with sqlite state store.

## architecture

```mermaid
flowchart TD
    subgraph Matrix
        sync[Matrix Sync Loop]
    end

    subgraph Engagement
        eval[Evaluator]
        rules[Rule Checks]
        llm_eval[LLM Evaluation<br/>ministral-3b]
    end

    subgraph Response
        legacy[Legacy Path<br/>manual messages + chat completions]
        convapi[Conversations API Path<br/>ConversationRegistry + agents]
        tools[Tool Execution]
    end

    subgraph Persistence
        sqlite[(SQLite<br/>conversations + agents)]
        opensearch[(OpenSearch<br/>archive + memory)]
    end

    sync --> |message event| eval
    eval --> rules
    rules --> |MustRespond| Response
    rules --> |no rule match| llm_eval
    llm_eval --> |MaybeRespond| Response
    llm_eval --> |React| sync
    llm_eval --> |Ignore| sync
    legacy --> tools
    convapi --> tools
    tools --> opensearch
    legacy --> |response text| sync
    convapi --> |response text| sync
    sync --> |archive| opensearch
    convapi --> sqlite
    sync --> |memory extraction| opensearch
```

## source tree

```
src/
├── main.rs              entrypoint, Matrix client setup, backfill, orchestrator init
├── sync.rs              event loop — messages, reactions, redactions, invites
├── config.rs            TOML config (5 sections) with serde defaults
├── context.rs           ResponseContext — per-message sender identity threading
├── conversations.rs     ConversationRegistry — room→conversation mapping, SQLite-backed
├── persistence.rs       SQLite store (WAL mode, 2 tables: conversations, agents)
├── agent_ux.rs          AgentProgress — reaction lifecycle (🔍→⚙️→✅) + thread posting
├── matrix_utils.rs      message extraction, reply/edit/thread detection, image download
├── archive/
│   ├── schema.rs        ArchiveDocument, OpenSearch index mapping
│   └── indexer.rs       batched indexing, reactions, edits, redactions
├── brain/
│   ├── conversation.rs  sliding-window context per room (configurable group/DM windows)
│   ├── evaluator.rs     engagement decision (MustRespond/MaybeRespond/React/Ignore)
│   ├── personality.rs   system prompt templating ({date}, {room_name}, {members}, etc.)
│   └── responder.rs     both response paths, tool iteration loops, memory loading
├── memory/
│   ├── schema.rs        MemoryDocument, index mapping
│   ├── store.rs         query (topical), get_recent, set — OpenSearch operations
│   └── extractor.rs     post-response fact extraction via ministral-3b
├── agents/
│   ├── definitions.rs   orchestrator config + 8 domain agent definitions (dynamic delegation)
│   └── registry.rs      agent lifecycle with instructions hash staleness detection
├── sdk/
│   ├── mod.rs           SDK module root
│   ├── vault.rs         OpenBao/Vault client (K8s auth, KV v2 read/write/delete)
│   ├── tokens.rs        TokenStore — Vault-backed secrets + SQLite username mappings
│   └── gitea.rs         GiteaClient — typed Gitea API v1 with PAT auto-provisioning
└── tools/
    ├── mod.rs           ToolRegistry — 12 tool definitions + dispatch (5 core + 7 gitea)
    ├── search.rs        archive search (keyword + semantic via embedding pipeline)
    ├── room_history.rs  context around a timestamp or event
    ├── room_info.rs     room listing, member queries
    ├── script.rs        deno_core sandbox with sol.* host API, TS transpilation
    ├── devtools.rs      Gitea tool handlers (repos, issues, PRs, files)
    └── bridge.rs        ToolBridge — generic async handler map for future SDK integration
```

## engagement pipeline

```mermaid
sequenceDiagram
    participant M as Matrix
    participant S as Sync Handler
    participant E as Evaluator
    participant LLM as ministral-3b
    participant R as Responder

    M->>S: m.room.message
    S->>S: archive message
    S->>S: update conversation context
    S->>E: evaluate(sender, body, is_dm, recent)

    alt own message
        E-->>S: Ignore
    else @mention or matrix.to link
        E-->>S: MustRespond (DirectMention)
    else DM
        E-->>S: MustRespond (DirectMessage)
    else "sol" or "hey sol"
        E-->>S: MustRespond (NameInvocation)
    else no rule match
        E->>LLM: relevance evaluation (JSON)
        LLM-->>E: {relevance, hook, emoji}
        alt relevance >= spontaneous_threshold (0.85)
            E-->>S: MaybeRespond
        else relevance >= reaction_threshold (0.6) + emoji
            E-->>S: React (emoji)
        else below thresholds
            E-->>S: Ignore
        end
    end

    alt MustRespond or MaybeRespond
        S->>S: check in-flight guard
        S->>S: check cooldown (15s default)
        S->>R: generate response
    end
```
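
The decision order in the diagram above can be sketched in plain Rust. This is a minimal illustration, not the code in `brain/evaluator.rs`: the type names, the `check_rules` / `from_relevance` split, and the whole-word match on "sol" are assumptions made for the example.

```rust
/// Why a response is mandatory (variant names follow the diagram).
#[derive(Debug, PartialEq)]
enum Reason {
    DirectMention,
    DirectMessage,
    NameInvocation,
}

#[derive(Debug, PartialEq)]
enum Engagement {
    MustRespond(Reason),
    MaybeRespond,
    React(String),
    Ignore,
}

/// Subset of the evaluation model's JSON reply used for the decision.
struct Relevance {
    score: f32,
    emoji: Option<String>,
}

struct Thresholds {
    spontaneous: f32,
    reaction: f32,
}

/// Rule checks run first; `None` means no rule matched and the LLM is consulted.
fn check_rules(is_own: bool, mentioned: bool, is_dm: bool, body: &str) -> Option<Engagement> {
    if is_own {
        return Some(Engagement::Ignore);
    }
    if mentioned {
        return Some(Engagement::MustRespond(Reason::DirectMention));
    }
    if is_dm {
        return Some(Engagement::MustRespond(Reason::DirectMessage));
    }
    // Hypothetical name check: case-insensitive whole-word match on "sol".
    let named = body
        .split(|c: char| !c.is_alphanumeric())
        .any(|word| word.eq_ignore_ascii_case("sol"));
    if named {
        return Some(Engagement::MustRespond(Reason::NameInvocation));
    }
    None
}

/// Threshold checks applied to the LLM's relevance score.
fn from_relevance(r: &Relevance, t: &Thresholds) -> Engagement {
    if r.score >= t.spontaneous {
        Engagement::MaybeRespond
    } else if r.score >= t.reaction {
        // A reaction needs an emoji; without one the message is ignored.
        match &r.emoji {
            Some(emoji) => Engagement::React(emoji.clone()),
            None => Engagement::Ignore,
        }
    } else {
        Engagement::Ignore
    }
}
```

Keeping the rule checks separate from the threshold checks means the cheap, deterministic cases never need a model call.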

## response generation

Sol has two response paths, controlled by `agents.use_conversations_api`:

### legacy path (`generate_response`)

1. Apply response delay (random within configured range)
2. Send typing indicator
3. Load memory notes (topical query + recent backfill, max 5)
4. Build system prompt via `Personality` (template substitution: `{date}`, `{room_name}`, `{members}`, `{memory_notes}`, `{room_context_rules}`, `{epoch_ms}`)
5. Assemble message array: system → context messages (with timestamps) → trigger (multimodal if image)
6. Tool iteration loop (up to `max_tool_iterations`, default 5):
   - If `finish_reason == ToolCalls`: execute tools, append results, continue
   - If text response: strip "sol:" prefix, return
7. Fire-and-forget memory extraction
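
Step 6 is the core of the legacy path. The sketch below shows the shape of such a loop with the model and the tools replaced by closures. `Turn`, `run_tool_loop`, and `strip_name_prefix` are hypothetical names, and returning `None` when the iteration budget runs out is an assumption; the README does not say what the real responder does in that case.

```rust
/// One model turn: either tool calls to execute or a final text answer.
enum Turn {
    ToolCalls(Vec<String>),
    Text(String),
}

/// Drop a leading "sol:" that the model may echo (case-insensitivity is an assumption).
fn strip_name_prefix(text: &str) -> &str {
    let trimmed = text.trim_start();
    match trimmed.get(..4) {
        Some(prefix) if prefix.eq_ignore_ascii_case("sol:") => trimmed[4..].trim_start(),
        _ => trimmed,
    }
}

/// Call the model, execute any requested tools, feed the results back, repeat.
/// Returns `None` if no text answer arrives within `max_iterations` rounds.
fn run_tool_loop(
    mut model: impl FnMut(&[String]) -> Turn,
    mut run_tool: impl FnMut(&str) -> String,
    max_iterations: usize,
) -> Option<String> {
    // Tool results accumulated so far; the real message array carries much more.
    let mut transcript: Vec<String> = Vec::new();
    for _ in 0..max_iterations {
        match model(&transcript) {
            Turn::ToolCalls(calls) => {
                for call in calls {
                    let result = run_tool(&call);
                    transcript.push(result);
                }
            }
            Turn::Text(text) => return Some(strip_name_prefix(&text).to_string()),
        }
    }
    None
}
```

The iteration cap is what keeps a model that requests tools on every turn from looping forever.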

### conversations API path (`generate_response_conversations`)

1. Apply response delay
2. Send typing indicator
3. Format input: raw text for DMs, `<@user:server> text` for groups
4. Send through `ConversationRegistry.send_message()` (creates or appends to Mistral conversation)
5. Function call loop (up to `max_tool_iterations`):
   - Execute tool calls locally via `ToolRegistry`
   - Send `FunctionResultEntry` back to conversation
6. Extract assistant text, strip prefix, return
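
Two small pieces of this path are easy to show in isolation: the input formatting from step 3 and the token check that triggers compaction. Both function names are hypothetical, and using `>=` for the threshold comparison is an assumption.

```rust
/// Step 3: DMs pass through untouched; group messages are prefixed with the
/// sender's Matrix ID so the model can tell speakers apart.
fn format_input(is_dm: bool, sender: &str, text: &str) -> String {
    if is_dm {
        text.to_string()
    } else {
        format!("<{}> {}", sender, text)
    }
}

/// Whether a conversation has grown past `agents.compaction_threshold`
/// and should be reset before the next message is sent.
fn needs_compaction(estimated_tokens: u32, threshold: u32) -> bool {
    estimated_tokens >= threshold
}
```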

## tool system

| Tool | Parameters | Description |
|------|-----------|-------------|
| `search_archive` | `query` (required), `room`, `sender`, `after`, `before`, `limit`, `semantic` | Search the message archive (keyword or semantic) |
| `get_room_context` | `room_id` (required), `around_timestamp`, `around_event_id`, `before_count`, `after_count` | Get messages around a point in time or event |
| `list_rooms` | *(none)* | List all rooms Sol is in with names and member counts |
| `get_room_members` | `room_id` (required) | Get members of a specific room |
| `run_script` | `code` (required) | Execute TypeScript/JavaScript in a sandboxed deno_core runtime |

### run_script sandbox

The script runtime is a fresh V8 isolate per invocation with:

- **TypeScript support** — code is transpiled via `deno_ast` before execution
- **Timeout** — configurable via `behavior.script_timeout_secs` (default 5s), enforced by V8 isolate termination
- **Heap limit** — configurable via `behavior.script_max_heap_mb` (default 64MB)
- **Output** — `console.log()` + last expression value, truncated to 4096 characters
- **Temp filesystem** — sandboxed `sol.fs.read/write/list` with path traversal protection
- **Network** — `sol.fetch(url)` restricted to `behavior.script_fetch_allowlist` domains
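
Two of the guards listed above, path traversal protection and output truncation, can be illustrated with the standard library alone. This is a sketch under assumptions: `resolve_in_sandbox` and `truncate_output` are hypothetical names, and the path check here is purely lexical, so it does not cover symlink escapes.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `user_path` inside `root`, rejecting anything that would escape it.
/// Lexical check only: symlinks are neither followed nor inspected.
fn resolve_in_sandbox(root: &Path, user_path: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    let mut depth = 0usize;
    for component in Path::new(user_path).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                // ".." is only allowed while we are still below the root.
                if depth == 0 {
                    return None;
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths and Windows prefixes are refused outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}

/// Truncate script output to at most `max_chars` characters without
/// splitting a multi-byte character.
fn truncate_output(output: &str, max_chars: usize) -> &str {
    match output.char_indices().nth(max_chars) {
        Some((byte_index, _)) => &output[..byte_index],
        None => output,
    }
}
```

A production sandbox would also canonicalize the result and confirm it still lives under the root, which is how symlink escapes are usually caught.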

Host API (`sol.*`):

```typescript
sol.search(query, opts?)            // search message archive
sol.rooms()                         // list joined rooms → [{name, id, members}]
sol.members(roomName)               // get room members → [{name, id}]
sol.fetch(url)                      // HTTP GET (allowlisted domains only)
sol.memory.get(query?)              // retrieve memories relevant to query
sol.memory.set(content, category?)  // save a memory note
sol.fs.read(path)                   // read file from sandbox
sol.fs.write(path, content)         // write file to sandbox
sol.fs.list(path?)                  // list sandbox directory
```

All `sol.*` methods are async — use `await`.

## memory system

### extraction (post-response, fire-and-forget)

After each response, a background task sends the exchange to `ministral-3b` with a structured extraction prompt. The model returns `{"memories": [{"content": "...", "category": "preference|fact|context"}]}`. Categories are normalized via `normalize_category()` — valid categories are `preference`, `fact`, `context`; anything else falls back to `general`.
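
The fallback rule is small enough to show in full. The function name comes from the text above; the body is a guess at its behaviour, and the trimming and lowercasing in particular are assumptions about how lenient the real function is.

```rust
/// Map a model-supplied category onto the allowed set, defaulting to "general".
fn normalize_category(raw: &str) -> &'static str {
    // Assumption: surrounding whitespace and letter case are not significant.
    match raw.trim().to_ascii_lowercase().as_str() {
        "preference" => "preference",
        "fact" => "fact",
        "context" => "context",
        _ => "general",
    }
}
```

Normalizing at write time keeps the stored category field to a closed set even when the model invents a label.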

### storage (OpenSearch)

Each memory is a `MemoryDocument` with: `id`, `user_id`, `content`, `category`, `created_at`, `updated_at`, `source` (`"auto"` or `"script"`). The index name defaults to `sol_user_memory`. User isolation is enforced at the Rust level via `user_id` filtering on all queries.

### pre-response loading

Before generating a response, the responder loads up to 5 memories:

1. **Topical query** — semantic search against the trigger message
2. **Recent backfill** — if fewer than 3 topical results, fill remaining slots with most recent memories

Memory notes are injected into the system prompt as a `## notes about {display_name}` block with instructions to use them naturally without mentioning their existence.
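
The two-step selection can be sketched as a pure function over already-fetched results. `Memory` and `select_memories` are hypothetical names, and de-duplicating the backfill by `id` is an assumption; the limits 5 and 3 come from the description above.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Memory {
    id: String,
    content: String,
}

/// Pick up to `max_total` memories: topical hits first, then the most recent
/// ones as backfill when there are fewer than `min_topical` topical hits.
fn select_memories(
    topical: Vec<Memory>,
    recent: Vec<Memory>,
    max_total: usize,
    min_topical: usize,
) -> Vec<Memory> {
    let mut selected: Vec<Memory> = topical.into_iter().take(max_total).collect();
    if selected.len() < min_topical {
        for memory in recent {
            if selected.len() >= max_total {
                break;
            }
            // Skip anything the topical query already returned.
            if !selected.iter().any(|s| s.id == memory.id) {
                selected.push(memory);
            }
        }
    }
    selected
}
```

Called as `select_memories(topical, recent, 5, 3)`, this returns topical results untouched when there are at least three of them and otherwise tops the list up from the recent set.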

## archive

Every message event is archived as an `ArchiveDocument` in OpenSearch:

- **Batch indexing** — messages are buffered and flushed periodically (`opensearch.batch_size` default 50, `opensearch.flush_interval_ms` default 2000)
- **Embedding pipeline** — configurable via `opensearch.embedding_pipeline` for semantic search
- **Edit tracking** — `m.replace` events update the original document's content
- **Redaction** — `m.room.redaction` sets `redacted: true` on the original
- **Reactions** — `m.reaction` events append `{sender, emoji, timestamp}` to the document's reactions array
- **Backfill** — on startup, conversation context is backfilled from the archive; reactions are backfilled from Matrix room timelines (last 500 events per room)
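
Batch indexing with a size limit and a time limit is a common pattern; a minimal, synchronous version is sketched here. `BatchBuffer` and its methods are hypothetical and say nothing about how `archive/indexer.rs` is actually structured. Time is passed in explicitly so the logic can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Buffers documents and hands back a batch when either limit is reached.
struct BatchBuffer<T> {
    items: Vec<T>,
    batch_size: usize,
    flush_interval: Duration,
    last_flush: Instant,
}

impl<T> BatchBuffer<T> {
    fn new(batch_size: usize, flush_interval: Duration, now: Instant) -> Self {
        Self { items: Vec::new(), batch_size, flush_interval, last_flush: now }
    }

    /// Add one document; returns a full batch if the size limit was hit.
    fn push(&mut self, item: T, now: Instant) -> Option<Vec<T>> {
        self.items.push(item);
        if self.items.len() >= self.batch_size {
            Some(self.take(now))
        } else {
            None
        }
    }

    /// Called on a timer; returns a partial batch once the interval has elapsed.
    fn tick(&mut self, now: Instant) -> Option<Vec<T>> {
        let due = now.duration_since(self.last_flush) >= self.flush_interval;
        if due && !self.items.is_empty() {
            Some(self.take(now))
        } else {
            None
        }
    }

    /// Empty the buffer and restart the flush timer.
    fn take(&mut self, now: Instant) -> Vec<T> {
        self.last_flush = now;
        std::mem::take(&mut self.items)
    }
}
```

With the defaults from the text, the buffer would be created as `BatchBuffer::new(50, Duration::from_millis(2000), Instant::now())`, and each returned batch would become one bulk request.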

## agent architecture

```mermaid
stateDiagram-v2
    [*] --> CheckMemory: startup
    CheckMemory --> CheckServer: agent_id in SQLite?
    CheckMemory --> SearchByName: not in SQLite

    CheckServer --> Ready: exists on Mistral server
    CheckServer --> SearchByName: gone from server

    SearchByName --> Ready: found by name
    SearchByName --> Create: not found

    Create --> Ready: agent created
    Ready --> [*]
```
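
The state diagram above reduces to a short lookup chain. In this sketch the two remote lookups are passed in as closures so the logic stays testable; `Resolution` and `resolve_agent` are hypothetical names, not the API of `agents/registry.rs`, and the instructions-hash staleness check is left out.

```rust
#[derive(Debug, PartialEq)]
enum Resolution {
    /// The ID stored in SQLite still exists on the server.
    Stored(String),
    /// Found on the server by name (stored ID missing or stale).
    FoundByName(String),
    /// Nothing usable exists; a new agent has to be created.
    MustCreate,
}

fn resolve_agent(
    stored_id: Option<&str>,
    exists_on_server: impl Fn(&str) -> bool,
    find_by_name: impl Fn() -> Option<String>,
) -> Resolution {
    // CheckMemory -> CheckServer
    if let Some(id) = stored_id {
        if exists_on_server(id) {
            return Resolution::Stored(id.to_string());
        }
    }
    // SearchByName -> Ready, or fall through to Create
    match find_by_name() {
        Some(id) => Resolution::FoundByName(id),
        None => Resolution::MustCreate,
    }
}
```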

### orchestrator

The orchestrator agent carries Sol's full personality (system prompt) plus all 5 tool definitions converted to `AgentTool` format. It's created on startup if `agents.use_conversations_api` is enabled. Temperature: 0.5.

### domain agents (8 definitions)

| Agent | Domain |
|-------|--------|
| `sol-observability` | Metrics, logs, dashboards, alerts (Prometheus, Loki, Grafana) |
| `sol-data` | Full-text search, object storage (OpenSearch, SeaweedFS) |
| `sol-devtools` | Git repos, issues, PRs, kanban boards (Gitea, Planka) |
| `sol-infrastructure` | Kubernetes, deployments, certificates, builds |
| `sol-identity` | User accounts, sessions, OAuth2 (Kratos, Hydra) |
| `sol-collaboration` | Contacts, documents, meetings, files, email, calendars (La Suite) |
| `sol-communication` | Chat rooms, messages, members (Matrix) |
| `sol-media` | Video/audio rooms, recordings, streams (LiveKit) |

Domain agents are defined in `agents/definitions.rs` as `DOMAIN_AGENTS` (name, description, instructions). Temperature: 0.3.

### ToolBridge

`tools/bridge.rs` provides a generic async handler map (`ToolBridge`) for mapping Mistral tool call names to handler functions. This is scaffolding for future SDK-based tool integration where domain agents will have their own tool sets.

## persistence

SQLite database at `/data/sol.db` (configurable via `matrix.db_path`), WAL mode.

### tables

**conversations** — room_id (PK), conversation_id, estimated_tokens, created_at

**agents** — name (PK), agent_id, model, created_at

### recovery behavior

On startup, if the database fails to open:

1. Log error
2. Fall back to in-memory SQLite (conversations won't survive restarts)
3. After sync loop starts, send `*sneezes*` to all joined rooms to signal the hiccup

## multimodal

When an `m.image` message arrives:

1. Extract media source from event (`MessageType::Image`)
2. Download bytes from Matrix media API via `matrix_sdk::media::get_media_content`
3. Base64-encode as `data:{mime};base64,{data}` URI
4. Pass to Mistral as `ContentPart::ImageUrl` alongside any text caption

Encrypted images are not supported (the `MediaSource::Encrypted` variant is skipped).
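
Step 3 is the only part that does not depend on an SDK, so it can be shown standalone. The encoder below is a hand-written standard base64 (RFC 4648, with padding) to keep the example free of external crates; a real implementation would more likely use an existing base64 crate. `base64_encode` and `data_uri` are hypothetical names.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with `=` padding.
fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Positions that carry no input bits are emitted as padding.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Build the `data:{mime};base64,{data}` URI described in step 3.
fn data_uri(mime: &str, bytes: &[u8]) -> String {
    format!("data:{};base64,{}", mime, base64_encode(bytes))
}
```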

## configuration reference

Config is loaded from `SOL_CONFIG` (default: `/etc/sol/sol.toml`).

### `[matrix]`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `homeserver_url` | string | *required* | Matrix homeserver URL |
| `user_id` | string | *required* | Bot's Matrix user ID |
| `state_store_path` | string | *required* | Path for Matrix SDK sqlite state |
| `db_path` | string | `/data/sol.db` | SQLite database for persistent state |

### `[opensearch]`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | *required* | OpenSearch cluster URL |
| `index` | string | *required* | Archive index name |
| `batch_size` | usize | `50` | Messages per flush batch |
| `flush_interval_ms` | u64 | `2000` | Flush interval in milliseconds |
| `embedding_pipeline` | string | `tuwunel_embedding_pipeline` | Ingest pipeline for semantic embeddings |
| `memory_index` | string | `sol_user_memory` | Memory index name |

### `[mistral]`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `default_model` | string | `mistral-medium-latest` | Model for response generation |
| `evaluation_model` | string | `ministral-3b-latest` | Model for engagement evaluation + memory extraction |
| `research_model` | string | `mistral-large-latest` | Model for research tasks |
| `max_tool_iterations` | usize | `5` | Max tool call rounds per response |

### `[behavior]`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `response_delay_min_ms` | u64 | `100` | Min delay before direct response |
| `response_delay_max_ms` | u64 | `2300` | Max delay before direct response |
| `spontaneous_delay_min_ms` | u64 | `15000` | Min delay before spontaneous response |
| `spontaneous_delay_max_ms` | u64 | `60000` | Max delay before spontaneous response |
| `spontaneous_threshold` | f32 | `0.85` | LLM relevance score to trigger spontaneous response |
| `reaction_threshold` | f32 | `0.6` | LLM relevance score to trigger emoji reaction |
| `reaction_enabled` | bool | `true` | Enable emoji reactions |
| `room_context_window` | usize | `30` | Messages to keep in group room context |
| `dm_context_window` | usize | `100` | Messages to keep in DM context |
| `backfill_on_join` | bool | `true` | Backfill context from archive on startup |
| `backfill_limit` | usize | `10000` | Max messages to backfill |
| `instant_responses` | bool | `false` | Skip response delays (for testing) |
| `cooldown_after_response_ms` | u64 | `15000` | Cooldown before another spontaneous response |
| `evaluation_context_window` | usize | `25` | Recent messages sent to evaluation LLM |
| `detect_sol_in_conversation` | bool | `true` | Use active/passive evaluation prompts based on Sol's participation |
| `evaluation_prompt_active` | string? | *(built-in)* | Custom prompt when Sol is in conversation |
| `evaluation_prompt_passive` | string? | *(built-in)* | Custom prompt when Sol hasn't spoken |
| `script_timeout_secs` | u64 | `5` | Script execution timeout |
| `script_max_heap_mb` | usize | `64` | V8 heap limit for scripts |
| `script_fetch_allowlist` | string[] | `[]` | Domains allowed for `sol.fetch()` |
| `memory_extraction_enabled` | bool | `true` | Enable post-response memory extraction |

### `[agents]`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `orchestrator_model` | string | `mistral-medium-latest` | Model for orchestrator agent |
| `domain_model` | string | `mistral-medium-latest` | Model for domain agents |
| `compaction_threshold` | u32 | `118000` | Token estimate before conversation reset (~90% of 131K context) |
| `use_conversations_api` | bool | `false` | Enable Conversations API path (vs legacy chat completions) |

## environment variables

| Variable | Required | Description |
|----------|----------|-------------|
| `SOL_MATRIX_ACCESS_TOKEN` | yes | Matrix access token |
| `SOL_MATRIX_DEVICE_ID` | yes | Matrix device ID (for E2EE state) |
| `SOL_MISTRAL_API_KEY` | yes | Mistral API key |
| `SOL_CONFIG` | no | Config file path (default: `/etc/sol/sol.toml`) |
| `SOL_SYSTEM_PROMPT` | no | System prompt file path (default: `/etc/sol/system_prompt.md`) |

## dependencies

sol talks to five external services:

- **Matrix homeserver** — [tuwunel](https://github.com/tulir/tuwunel) (or any Matrix server)
- **OpenSearch** — message archive + user memory indices
- **Mistral AI** — response generation, engagement evaluation, memory extraction, agents + web search
- **OpenBao** — secure token storage for user impersonation PATs (K8s auth, KV v2)
- **Gitea** — git hosting API for devtools agent (repos, issues, PRs)

key crates: `matrix-sdk` 0.9 (E2EE + sqlite), `mistralai-client` 1.1.0 (private registry), `opensearch` 2, `deno_core` 0.393, `rusqlite` 0.32 (bundled), `ruma` 0.12.

## configuration

sol reads config from `SOL_CONFIG` (default: `/etc/sol/sol.toml`) and the system prompt from `SOL_SYSTEM_PROMPT` (default: `/etc/sol/system_prompt.md`).

secrets via environment:

| Variable | Description |
|----------|-------------|
| `SOL_MATRIX_ACCESS_TOKEN` | Matrix access token |
| `SOL_MATRIX_DEVICE_ID` | Matrix device ID (for E2EE) |
| `SOL_MISTRAL_API_KEY` | Mistral API key |

see `config/sol.toml` for the full config reference with defaults.

## building

```sh
cargo build --release
```

docker (cross-compile to x86_64 linux, vendored deps):

```sh
docker build -t sol .
```

## running

```sh
export SOL_MATRIX_ACCESS_TOKEN="..."
export SOL_MATRIX_DEVICE_ID="..."
export SOL_MISTRAL_API_KEY="..."
export SOL_CONFIG="config/sol.toml"
export SOL_SYSTEM_PROMPT="config/system_prompt.md"

cargo run --release
```

production build + deploy:

```sh
sunbeam build sol --push --deploy
```

the Dockerfile uses a two-stage build: deps layer (cached until Cargo.toml/vendor change) → source layer (only sol code recompiles). final image is `gcr.io/distroless/cc-debian12:nonroot`.

## testing

```sh
cargo test
```

unit tests covering:

- config parsing (minimal, full, missing sections/fields, services, vault)
- conversation windowing, context management, reset_all, delete_all
- engagement rules (mention, DM, name invocation, case sensitivity, false positives)
- personality template substitution (date, room, members, memory notes, timestamps, room context rules)
- memory document serialization, extraction parsing, category normalization
- archive search query building (filters, date ranges, wildcards, room_name keyword field)
- TypeScript transpilation (basic, arrow, interface, invalid)
- sandbox path isolation (traversal, symlink escape, nested dirs)
- deno_core script execution (basic math, output capture)
- SQLite CRUD (conversations, agents, service_users, load_all, bulk delete)
- conversation message merging (DM, group, empty, single)
- context derivation (`@user:server` → `user@server`, localpart extraction)
- tool bridge registration and execution
- agent UX formatting (tool calls, result truncation)
- agent definitions (orchestrator instructions, dynamic delegation, deterministic hash)
- token expiry validation (PAT, future, past, malformed, null)
- Gitea API type deserialization (repos, issues, PRs, files)
- PAT conflict status codes and scope constants
- username mapping (OIDC → service identity)
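
As an illustration of the "context derivation" item in the list above, the transformation `@user:server` → `user@server` fits in a few lines. The function names are hypothetical and the handling of malformed IDs (returning `None`) is an assumption; the real logic lives in `context.rs`.

```rust
/// Split a Matrix user ID such as `@alice:example.org` into (localpart, server).
fn split_user_id(user_id: &str) -> Option<(&str, &str)> {
    let rest = user_id.strip_prefix('@')?;
    // Split at the first colon so a server name with a port stays intact.
    let (localpart, server) = rest.split_once(':')?;
    if localpart.is_empty() || server.is_empty() {
        return None;
    }
    Some((localpart, server))
}

/// Derive the `user@server` form from `@user:server`.
fn to_email_style(user_id: &str) -> Option<String> {
    let (localpart, server) = split_user_id(user_id)?;
    Some(format!("{}@{}", localpart, server))
}
```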

## license