sol 1.0.0 — relicense to AGPL-3.0, dual-license model
relicensed from MIT to AGPL-3.0-or-later. commercial license available for organizations that need private modifications (does not permit redistribution).

sol 1.0.0 ships with:
- multi-agent architecture with mistral conversations API
- user impersonation via vault-backed PAT provisioning
- gitea integration (first domain agent: devtools)
- per-user memory system with automatic extraction
- full-context evaluator with system prompt awareness
- agent recreation on prompt changes with conversation reset
- web search, sandboxed deno runtime, archive search
326
README.md
@@ -1,24 +1,24 @@
|
||||
# sol
|
||||
# Sol
|
||||
|
||||
a virtual librarian for Matrix. sol lives in your chat rooms, archives conversations in OpenSearch, and responds with the help of Mistral AI — with end-to-end encryption, tool use, per-user memory, and a multi-agent architecture.
|
||||
A virtual librarian for Matrix. Sol lives in your chat rooms, archives conversations in OpenSearch, and responds with the help of Mistral AI — with end-to-end encryption, tool use, per-user memory, and a multi-agent architecture.
|
||||
|
||||
sol is built by [sunbeam studios](https://sunbeam.pt) as part of our self-hosted collaboration stack for a three-person game studio.
|
||||
Sol is built by [Sunbeam Studios](https://sunbeam.pt) as part of our self-hosted collaboration stack.
|
||||
|
||||
## what sol does
|
||||
## What Sol Does
|
||||
|
||||
- **Matrix presence** — joins rooms, reads the vibe, decides when to speak. direct messages always get a response; in group rooms, sol evaluates relevance before jumping in.
|
||||
- **message archive** — every message is indexed in OpenSearch with full-text and semantic search. sol can search its own archive via tools.
|
||||
- **tool use** — Mistral calls tools mid-conversation: archive search, room context retrieval, room info, and a sandboxed TypeScript/JavaScript runtime (deno_core) for computation.
|
||||
- **per-user memory** — sol remembers things about the people it talks to. memories are extracted automatically after conversations, injected into the system prompt before responding, and accessible from scripts via `sol.memory.get/set`.
|
||||
- **user impersonation** — sol acts on behalf of users when calling external services. PATs are auto-provisioned via admin APIs and stored securely in OpenBao (Vault). OIDC-to-service username mappings handle identity mismatches.
|
||||
- **gitea integration** — first domain agent (sol-devtools): list repos, search issues, create issues, list PRs, get file contents — all as the requesting user.
|
||||
- **multi-agent architecture** — an orchestrator agent with personality + tools + web search. domain agent delegation is dynamic — only active agents appear in instructions. agent state persisted in SQLite with instructions hash for automatic recreation on prompt changes.
|
||||
- **conversations API** — persistent conversation state per room via Mistral's Conversations API, with automatic compaction at token thresholds. per-message context headers inject timestamps, room info, and memory notes.
|
||||
- **multimodal** — m.image messages are downloaded from Matrix via mxc://, converted to base64 data URIs, and sent as `ContentPart::ImageUrl` to Mistral vision models.
|
||||
- **reactions** — sol can react to messages with emoji when it has something to express but not enough to say.
|
||||
- **E2EE** — full end-to-end encryption via matrix-sdk with sqlite state store.
|
||||
- **Matrix Presence** — Joins rooms, reads the vibe, decides when to speak. Direct messages always get a response; in group rooms, Sol evaluates relevance before jumping in.
|
||||
- **Message Archive** — Every message is indexed in OpenSearch with full-text and semantic search. Sol can search its own archive via tools.
|
||||
- **Tool Use** — Mistral calls tools mid-conversation: archive search, room context retrieval, room info, and a sandboxed TypeScript/JavaScript runtime (deno_core) for computation.
|
||||
- **Per-User Memory** — Sol remembers things about the people it talks to. Memories are extracted automatically after conversations, injected into the system prompt before responding, and accessible from scripts via `sol.memory.get/set`.
|
||||
- **User Impersonation** — Sol acts on behalf of users when calling external services. PATs are auto-provisioned via admin APIs and stored securely in OpenBao (Vault). OIDC-to-service username mappings handle identity mismatches.
|
||||
- **Gitea Integration** — First domain agent (sol-devtools): list repos, search issues, create issues, list PRs, get file contents — all as the requesting user.
|
||||
- **Multi-Agent Architecture** — An orchestrator agent with personality + tools + web search. Domain agent delegation is dynamic — only active agents appear in instructions. Agent state is persisted in SQLite with instructions hash for automatic recreation on prompt changes.
|
||||
- **Conversations API** — Persistent conversation state per room via Mistral's Conversations API, with automatic compaction at token thresholds. Per-message context headers inject timestamps, room info, and memory notes.
|
||||
- **Multimodal** — `m.image` messages are downloaded from Matrix via `mxc://`, converted to base64 data URIs, and sent as `ContentPart::ImageUrl` to Mistral vision models.
|
||||
- **Reactions** — Sol can react to messages with emoji when it has something to express but not enough to say.
|
||||
- **E2EE** — Full end-to-end encryption via matrix-sdk with SQLite state store.
|
||||
|
||||
## architecture
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
@@ -33,7 +33,7 @@ flowchart TD
|
||||
end
|
||||
|
||||
subgraph Response
|
||||
legacy[Legacy Path<br/>manual messages + chat completions]
|
||||
legacy[Legacy Path<br/>Manual messages + chat completions]
|
||||
convapi[Conversations API Path<br/>ConversationRegistry + agents]
|
||||
tools[Tool Execution]
|
||||
end
|
||||
@@ -60,50 +60,49 @@ flowchart TD
|
||||
sync --> |memory extraction| opensearch
|
||||
```
|
||||
|
||||
## source tree
|
||||
## Source Tree
|
||||
|
||||
```
|
||||
src/
|
||||
├── main.rs entrypoint, Matrix client setup, backfill, orchestrator init
|
||||
├── sync.rs event loop — messages, reactions, redactions, invites
|
||||
├── config.rs TOML config (5 sections) with serde defaults
|
||||
├── main.rs Entrypoint, Matrix client setup, backfill, orchestrator init
|
||||
├── sync.rs Event loop — messages, reactions, redactions, invites
|
||||
├── config.rs TOML config with serde defaults
|
||||
├── context.rs ResponseContext — per-message sender identity threading
|
||||
├── conversations.rs ConversationRegistry — room→conversation mapping, SQLite-backed
|
||||
├── persistence.rs SQLite store (WAL mode, 2 tables: conversations, agents)
|
||||
├── persistence.rs SQLite store (WAL mode, tables: conversations, agents, service_users)
|
||||
├── agent_ux.rs AgentProgress — reaction lifecycle (🔍→⚙️→✅) + thread posting
|
||||
├── matrix_utils.rs message extraction, reply/edit/thread detection, image download
|
||||
├── matrix_utils.rs Message extraction, reply/edit/thread detection, image download
|
||||
├── archive/
|
||||
│ ├── schema.rs ArchiveDocument, OpenSearch index mapping
|
||||
│ └── indexer.rs batched indexing, reactions, edits, redactions
|
||||
│ └── indexer.rs Batched indexing, reactions, edits, redactions
|
||||
├── brain/
|
||||
│ ├── conversation.rs sliding-window context per room (configurable group/DM windows)
|
||||
│ ├── evaluator.rs engagement decision (MustRespond/MaybeRespond/React/Ignore)
|
||||
│ ├── personality.rs system prompt templating ({date}, {room_name}, {members}, etc.)
|
||||
│ └── responder.rs both response paths, tool iteration loops, memory loading
|
||||
│ ├── conversation.rs Sliding-window context per room (configurable group/DM windows)
|
||||
│ ├── evaluator.rs Engagement decision (MustRespond/MaybeRespond/React/Ignore)
|
||||
│ ├── personality.rs System prompt templating ({date}, {room_name}, {members}, etc.)
|
||||
│ └── responder.rs Both response paths, tool iteration loops, memory loading
|
||||
├── memory/
|
||||
│ ├── schema.rs MemoryDocument, index mapping
|
||||
│ ├── store.rs query (topical), get_recent, set — OpenSearch operations
|
||||
│ └── extractor.rs post-response fact extraction via ministral-3b
|
||||
│ ├── store.rs Query (topical), get_recent, set — OpenSearch operations
|
||||
│ └── extractor.rs Post-response fact extraction via ministral-3b
|
||||
├── agents/
|
||||
│ ├── definitions.rs orchestrator config + 8 domain agent definitions (dynamic delegation)
|
||||
│ └── registry.rs agent lifecycle with instructions hash staleness detection
|
||||
│ ├── definitions.rs Orchestrator config + domain agent definitions (dynamic delegation)
|
||||
│ └── registry.rs Agent lifecycle with instructions hash staleness detection
|
||||
├── sdk/
|
||||
│ ├── mod.rs SDK module root
|
||||
│ ├── vault.rs OpenBao/Vault client (K8s auth, KV v2 read/write/delete)
|
||||
│ ├── tokens.rs TokenStore — Vault-backed secrets + SQLite username mappings
|
||||
│ └── gitea.rs GiteaClient — typed Gitea API v1 with PAT auto-provisioning
|
||||
└── tools/
|
||||
├── mod.rs ToolRegistry — 12 tool definitions + dispatch (5 core + 7 gitea)
|
||||
├── search.rs archive search (keyword + semantic via embedding pipeline)
|
||||
├── room_history.rs context around a timestamp or event
|
||||
├── room_info.rs room listing, member queries
|
||||
├── mod.rs ToolRegistry — tool definitions + dispatch (core + gitea)
|
||||
├── search.rs Archive search (keyword + semantic via embedding pipeline)
|
||||
├── room_history.rs Context around a timestamp or event
|
||||
├── room_info.rs Room listing, member queries
|
||||
├── script.rs deno_core sandbox with sol.* host API, TS transpilation
|
||||
├── devtools.rs Gitea tool handlers (repos, issues, PRs, files)
|
||||
└── bridge.rs ToolBridge — generic async handler map for future SDK integration
|
||||
```
|
||||
|
||||
|
||||
## engagement pipeline
|
||||
## Engagement Pipeline
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
@@ -114,11 +113,11 @@ sequenceDiagram
|
||||
participant R as Responder
|
||||
|
||||
M->>S: m.room.message
|
||||
S->>S: archive message
|
||||
S->>S: update conversation context
|
||||
S->>S: Archive message
|
||||
S->>S: Update conversation context
|
||||
S->>E: evaluate(sender, body, is_dm, recent)
|
||||
|
||||
alt own message
|
||||
alt Own message
|
||||
E-->>S: Ignore
|
||||
else @mention or matrix.to link
|
||||
E-->>S: MustRespond (DirectMention)
|
||||
@@ -126,30 +125,30 @@ sequenceDiagram
|
||||
E-->>S: MustRespond (DirectMessage)
|
||||
else "sol" or "hey sol"
|
||||
E-->>S: MustRespond (NameInvocation)
|
||||
else no rule match
|
||||
E->>LLM: relevance evaluation (JSON)
|
||||
else No rule match
|
||||
E->>LLM: Relevance evaluation (JSON)
|
||||
LLM-->>E: {relevance, hook, emoji}
|
||||
alt relevance >= spontaneous_threshold (0.85)
|
||||
E-->>S: MaybeRespond
|
||||
else relevance >= reaction_threshold (0.6) + emoji
|
||||
E-->>S: React (emoji)
|
||||
else below thresholds
|
||||
else Below thresholds
|
||||
E-->>S: Ignore
|
||||
end
|
||||
end
|
||||
|
||||
alt MustRespond or MaybeRespond
|
||||
S->>S: check in-flight guard
|
||||
S->>S: check cooldown (15s default)
|
||||
S->>R: generate response
|
||||
S->>S: Check in-flight guard
|
||||
S->>S: Check cooldown (15s default)
|
||||
S->>R: Generate response
|
||||
end
|
||||
```
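The threshold branch at the end of the diagram can be sketched in Rust. This is a minimal illustration, not Sol's actual evaluator: the `Engagement` enum and the `decide` function are hypothetical names, and only the two default thresholds (0.85 and 0.6) and the rule that a reaction needs an emoji come from the README.

```rust
/// Hypothetical outcome type for the LLM-scored branch of the evaluator.
#[derive(Debug, PartialEq)]
enum Engagement {
    MaybeRespond,
    React(String),
    Ignore,
}

/// Maps a relevance score to an outcome using the two configured thresholds.
/// `spontaneous` and `reaction` stand in for `spontaneous_threshold` and
/// `reaction_threshold`; the signature itself is an assumption.
fn decide(relevance: f32, emoji: Option<&str>, spontaneous: f32, reaction: f32) -> Engagement {
    if relevance >= spontaneous {
        return Engagement::MaybeRespond;
    }
    match emoji {
        // A reaction needs both a high enough score and a non-empty emoji.
        Some(e) if relevance >= reaction && !e.is_empty() => Engagement::React(e.to_string()),
        _ => Engagement::Ignore,
    }
}
```

With the defaults, a score of 0.7 plus an emoji yields a reaction, while the same score without an emoji is ignored.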
|
||||
|
||||
## response generation
|
||||
## Response Generation
|
||||
|
||||
Sol has two response paths, controlled by `agents.use_conversations_api`:
|
||||
|
||||
### legacy path (`generate_response`)
|
||||
### Legacy Path (`generate_response`)
|
||||
|
||||
1. Apply response delay (random within configured range)
|
||||
2. Send typing indicator
|
||||
@@ -161,18 +160,20 @@ Sol has two response paths, controlled by `agents.use_conversations_api`:
|
||||
- If text response: strip "sol:" prefix, return
|
||||
7. Fire-and-forget memory extraction
|
||||
|
||||
### conversations API path (`generate_response_conversations`)
|
||||
### Conversations API Path (`generate_response_conversations`)
|
||||
|
||||
1. Apply response delay
|
||||
2. Send typing indicator
|
||||
3. Format input: raw text for DMs, `<@user:server> text` for groups
|
||||
4. Send through `ConversationRegistry.send_message()` (creates or appends to Mistral conversation)
|
||||
5. Function call loop (up to `max_tool_iterations`):
|
||||
3. Load memory notes for the user
|
||||
4. Build per-message context header (timestamps, room name, memory notes)
|
||||
5. Format input: raw text for DMs, `<@user:server> text` for groups
|
||||
6. Send through `ConversationRegistry.send_message()` (creates or appends to Mistral conversation)
|
||||
7. Function call loop (up to `max_tool_iterations`):
|
||||
- Execute tool calls locally via `ToolRegistry`
|
||||
- Send `FunctionResultEntry` back to conversation
|
||||
6. Extract assistant text, strip prefix, return
|
||||
8. Extract assistant text, strip prefix, return
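The input-formatting step in the list above can be sketched as follows. The function name `format_input` and its signature are assumptions made for illustration; only the rule itself (raw text for DMs, `<@user:server> text` for groups) is taken from the README.

```rust
/// Formats a message before it is sent to the conversation.
/// DMs pass the body through unchanged; group messages are prefixed with
/// the sender's Matrix ID in angle brackets so the model can tell speakers apart.
fn format_input(sender: &str, body: &str, is_dm: bool) -> String {
    if is_dm {
        body.to_string()
    } else {
        format!("<{sender}> {body}")
    }
}
```

For example, a group message from `@alice:example.org` with body `hi` would be sent as `<@alice:example.org> hi`.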
|
||||
|
||||
## tool system
|
||||
## Tool System
|
||||
|
||||
| Tool | Parameters | Description |
|
||||
|------|-----------|-------------|
|
||||
@@ -181,116 +182,118 @@ Sol has two response paths, controlled by `agents.use_conversations_api`:
|
||||
| `list_rooms` | *(none)* | List all rooms Sol is in with names and member counts |
|
||||
| `get_room_members` | `room_id` (required) | Get members of a specific room |
|
||||
| `run_script` | `code` (required) | Execute TypeScript/JavaScript in a sandboxed deno_core runtime |
|
||||
| `gitea_list_repos` | `query`, `org`, `limit` | List or search repositories on Gitea |
|
||||
| `gitea_get_repo` | `owner`, `repo` (required) | Get details about a specific repository |
|
||||
| `gitea_list_issues` | `owner`, `repo` (required), `state`, `labels`, `limit` | List issues in a repository |
|
||||
| `gitea_get_issue` | `owner`, `repo`, `number` (required) | Get full details of a specific issue |
|
||||
| `gitea_create_issue` | `owner`, `repo`, `title` (required), `body`, `labels` | Create a new issue as the requesting user |
|
||||
| `gitea_list_pulls` | `owner`, `repo` (required), `state`, `limit` | List pull requests in a repository |
|
||||
| `gitea_get_file` | `owner`, `repo`, `path` (required), `ref` | Get file contents from a repository |
|
||||
|
||||
### run_script sandbox
|
||||
### `run_script` Sandbox
|
||||
|
||||
The script runtime is a fresh V8 isolate per invocation with:
|
||||
|
||||
- **TypeScript support** — code is transpiled via `deno_ast` before execution
|
||||
- **Timeout** — configurable via `behavior.script_timeout_secs` (default 5s), enforced by V8 isolate termination
|
||||
- **Heap limit** — configurable via `behavior.script_max_heap_mb` (default 64MB)
|
||||
- **TypeScript support** — Code is transpiled via `deno_ast` before execution
|
||||
- **Timeout** — Configurable via `behavior.script_timeout_secs` (default 5s), enforced by V8 isolate termination
|
||||
- **Heap limit** — Configurable via `behavior.script_max_heap_mb` (default 64MB)
|
||||
- **Output** — `console.log()` + last expression value, truncated to 4096 characters
|
||||
- **Temp filesystem** — sandboxed `sol.fs.read/write/list` with path traversal protection
|
||||
- **Temp filesystem** — Sandboxed `sol.fs.read/write/list` with path traversal protection
|
||||
- **Network** — `sol.fetch(url)` restricted to `behavior.script_fetch_allowlist` domains
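As an illustration of the output limit, here is one way to truncate script output to 4096 characters without splitting a multi-byte character. The constant and function names are hypothetical, and the real implementation may differ (for instance, it may append a truncation marker).

```rust
/// Limit taken from the README; the constant name is an assumption.
const MAX_OUTPUT_CHARS: usize = 4096;

/// Keeps at most `MAX_OUTPUT_CHARS` Unicode scalar values.
/// Iterating over `chars()` rather than slicing bytes avoids cutting a
/// multi-byte UTF-8 sequence in half.
fn truncate_output(output: &str) -> String {
    output.chars().take(MAX_OUTPUT_CHARS).collect()
}
```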
|
||||
|
||||
Host API (`sol.*`):
|
||||
|
||||
```typescript
|
||||
sol.search(query, opts?) // search message archive
|
||||
sol.rooms() // list joined rooms → [{name, id, members}]
|
||||
sol.members(roomName) // get room members → [{name, id}]
|
||||
sol.search(query, opts?) // Search message archive
|
||||
sol.rooms() // List joined rooms → [{name, id, members}]
|
||||
sol.members(roomName) // Get room members → [{name, id}]
|
||||
sol.fetch(url) // HTTP GET (allowlisted domains only)
|
||||
sol.memory.get(query?) // retrieve memories relevant to query
|
||||
sol.memory.set(content, category?) // save a memory note
|
||||
sol.fs.read(path) // read file from sandbox
|
||||
sol.fs.write(path, content) // write file to sandbox
|
||||
sol.fs.list(path?) // list sandbox directory
|
||||
sol.memory.get(query?) // Retrieve memories relevant to query
|
||||
sol.memory.set(content, category?) // Save a memory note
|
||||
sol.fs.read(path) // Read file from sandbox
|
||||
sol.fs.write(path, content) // Write file to sandbox
|
||||
sol.fs.list(path?) // List sandbox directory
|
||||
```
|
||||
|
||||
All `sol.*` methods are async — use `await`.
|
||||
|
||||
## memory system
|
||||
## Memory System
|
||||
|
||||
### extraction (post-response, fire-and-forget)
|
||||
### Extraction (Post-Response, Fire-and-Forget)
|
||||
|
||||
After each response, a background task sends the exchange to `ministral-3b` with a structured extraction prompt. The model returns `{"memories": [{"content": "...", "category": "preference|fact|context"}]}`. Categories are normalized via `normalize_category()` — valid categories are `preference`, `fact`, `context`; anything else falls back to `general`.
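The fallback rule can be sketched like this. The body below is not copied from Sol's source; trimming and lowercasing the input are assumptions, and only the three valid categories and the `general` fallback come from the text above.

```rust
/// Returns one of the three known categories, or "general" for anything else.
/// Trimming and case-folding are assumptions added for robustness.
fn normalize_category(raw: &str) -> &'static str {
    match raw.trim().to_lowercase().as_str() {
        "preference" => "preference",
        "fact" => "fact",
        "context" => "context",
        _ => "general",
    }
}
```

Returning a `&'static str` keeps the set of stored categories closed, so an unexpected label from the model cannot leak into the index.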
|
||||
|
||||
### storage (OpenSearch)
|
||||
### Storage (OpenSearch)
|
||||
|
||||
Each memory is a `MemoryDocument` with: `id`, `user_id`, `content`, `category`, `created_at`, `updated_at`, `source` (`"auto"` or `"script"`). The index name defaults to `sol_user_memory`. User isolation is enforced at the Rust level via `user_id` filtering on all queries.
|
||||
|
||||
### pre-response loading
|
||||
### Pre-Response Loading
|
||||
|
||||
Before generating a response, the responder loads up to 5 memories:
|
||||
|
||||
1. **Topical query** — semantic search against the trigger message
|
||||
2. **Recent backfill** — if fewer than 3 topical results, fill remaining slots with most recent memories
|
||||
1. **Topical query** — Semantic search against the trigger message
|
||||
2. **Recent backfill** — If fewer than 3 topical results, fill remaining slots with most recent memories
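The two-step selection above could look roughly like this. The function name, the use of plain strings for memories, and the de-duplication step are assumptions; the limits (up to 5 memories, backfill when fewer than 3 topical results) come from the README, and "remaining slots" is read here as filling up to the limit of 5.

```rust
/// Picks up to five memory notes: topical hits first, then recent notes
/// as backfill when the topical query returned fewer than three results.
fn select_memories(topical: Vec<String>, recent: Vec<String>) -> Vec<String> {
    const MAX_MEMORIES: usize = 5;
    const MIN_TOPICAL: usize = 3;

    let mut selected: Vec<String> = topical.into_iter().take(MAX_MEMORIES).collect();
    if selected.len() < MIN_TOPICAL {
        for note in recent {
            if selected.len() >= MAX_MEMORIES {
                break;
            }
            // Skip notes that the topical query already returned (assumption).
            if !selected.contains(&note) {
                selected.push(note);
            }
        }
    }
    selected
}
```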
|
||||
|
||||
Memory notes are injected into the system prompt as a `## notes about {display_name}` block with instructions to use them naturally without mentioning their existence.
|
||||
Memory notes are injected into the system prompt (legacy path) or per-message context header (Conversations API path) as a `## notes about {display_name}` block.
|
||||
|
||||
## archive
|
||||
## Archive
|
||||
|
||||
Every message event is archived as an `ArchiveDocument` in OpenSearch:
|
||||
|
||||
- **Batch indexing** — messages are buffered and flushed periodically (`opensearch.batch_size` default 50, `opensearch.flush_interval_ms` default 2000)
|
||||
- **Embedding pipeline** — configurable via `opensearch.embedding_pipeline` for semantic search
|
||||
- **Batch indexing** — Messages are buffered and flushed periodically (`opensearch.batch_size` default 50, `opensearch.flush_interval_ms` default 2000)
|
||||
- **Embedding pipeline** — Configurable via `opensearch.embedding_pipeline` for semantic search
|
||||
- **Edit tracking** — `m.replace` events update the original document's content
|
||||
- **Redaction** — `m.room.redaction` sets `redacted: true` on the original
|
||||
- **Reactions** — `m.reaction` events append `{sender, emoji, timestamp}` to the document's reactions array
|
||||
- **Backfill** — on startup, conversation context is backfilled from the archive; reactions are backfilled from Matrix room timelines (last 500 events per room)
|
||||
- **Backfill** — On startup, conversation context is backfilled from the archive; reactions are backfilled from Matrix room timelines (last 500 events per room)
|
||||
|
||||
## agent architecture
|
||||
## Agent Architecture
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> CheckMemory: startup
|
||||
CheckMemory --> CheckServer: agent_id in SQLite?
|
||||
CheckMemory --> SearchByName: not in SQLite
|
||||
[*] --> CheckHash: Startup
|
||||
CheckHash --> Restore: Hash matches
|
||||
CheckHash --> Recreate: Hash changed
|
||||
|
||||
CheckServer --> Ready: exists on Mistral server
|
||||
CheckServer --> SearchByName: gone from server
|
||||
Restore --> CheckServer: Agent ID in SQLite
|
||||
Restore --> SearchByName: Not in SQLite
|
||||
|
||||
SearchByName --> Ready: found by name
|
||||
SearchByName --> Create: not found
|
||||
CheckServer --> Ready: Exists on Mistral server
|
||||
CheckServer --> SearchByName: Gone from server
|
||||
|
||||
SearchByName --> Recreate: Not found
|
||||
Recreate --> Ready: Agent created, conversations reset, *sneezes*
|
||||
|
||||
Create --> Ready: agent created
|
||||
Ready --> [*]
|
||||
```
|
||||
|
||||
### orchestrator
|
||||
### Orchestrator
|
||||
|
||||
The orchestrator agent carries Sol's full personality (system prompt) plus all 5 tool definitions converted to `AgentTool` format. It's created on startup if `agents.use_conversations_api` is enabled. Temperature: 0.5.
|
||||
The orchestrator agent carries Sol's full personality (system prompt) plus all tool definitions converted to `AgentTool` format, including Mistral's built-in `web_search`. It's created on startup if `agents.use_conversations_api` is enabled.
|
||||
|
||||
### domain agents (8 definitions)
|
||||
When the system prompt changes, the instructions hash detects staleness and the agent is automatically recreated. All existing conversations are reset and Sol sneezes into all rooms to signal the context reset.
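A minimal sketch of hash-based staleness detection follows. The README does not say which hash function Sol uses, so FNV-1a (64-bit) is swapped in here as a simple deterministic stand-in; `instructions_hash` and `is_stale` are hypothetical names.

```rust
/// FNV-1a (64-bit), used here only as a stand-in for a deterministic hash.
/// Unlike `std`'s `DefaultHasher`, its output is stable across Rust releases,
/// which matters when the value is persisted and compared on a later run.
fn instructions_hash(instructions: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    instructions.bytes().fold(OFFSET_BASIS, |hash, byte| {
        (hash ^ u64::from(byte)).wrapping_mul(PRIME)
    })
}

/// An agent is stale when no hash is stored or the stored hash no longer
/// matches the hash of the current instructions.
fn is_stale(stored_hash: Option<u64>, current_instructions: &str) -> bool {
    stored_hash != Some(instructions_hash(current_instructions))
}
```

In this sketch, a stale result is what would trigger recreation of the agent and the conversation reset described above.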
|
||||
|
||||
| Agent | Domain |
|
||||
|-------|--------|
|
||||
| `sol-observability` | Metrics, logs, dashboards, alerts (Prometheus, Loki, Grafana) |
|
||||
| `sol-data` | Full-text search, object storage (OpenSearch, SeaweedFS) |
|
||||
| `sol-devtools` | Git repos, issues, PRs, kanban boards (Gitea, Planka) |
|
||||
| `sol-infrastructure` | Kubernetes, deployments, certificates, builds |
|
||||
| `sol-identity` | User accounts, sessions, OAuth2 (Kratos, Hydra) |
|
||||
| `sol-collaboration` | Contacts, documents, meetings, files, email, calendars (La Suite) |
|
||||
| `sol-communication` | Chat rooms, messages, members (Matrix) |
|
||||
| `sol-media` | Video/audio rooms, recordings, streams (LiveKit) |
|
||||
### Domain Agents
|
||||
|
||||
Domain agents are defined in `agents/definitions.rs` as `DOMAIN_AGENTS` (name, description, instructions). Temperature: 0.3.
|
||||
Domain agents are defined in `agents/definitions.rs` as `DOMAIN_AGENTS` (name, description, instructions). The delegation section in the orchestrator's instructions is built dynamically — only agents that are actually registered appear.
|
||||
|
||||
### ToolBridge
|
||||
### User Impersonation
|
||||
|
||||
`tools/bridge.rs` provides a generic async handler map (`ToolBridge`) for mapping Mistral tool call names to handler functions. This is scaffolding for future SDK-based tool integration where domain agents will have their own tool sets.
|
||||
Sol authenticates to OpenBao via Kubernetes auth (role `sol-agent`) and stores per-user PATs at `secret/sol-tokens/{localpart}/{service}`. The `service_users` SQLite table maps Matrix localparts to service-specific usernames, handling cases where OIDC auto-registration produces different names.
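The path layout can be illustrated with a small helper. Both function names are hypothetical, and treating the leading `secret` segment as the configured KV mount is an assumption; the `sol-tokens/{localpart}/{service}` layout is taken from the paragraph above.

```rust
/// Extracts the localpart from a Matrix user ID such as `@alice:example.org`.
/// Returns `None` when the ID lacks the `@` sigil or the `:server` suffix.
fn localpart(user_id: &str) -> Option<&str> {
    let rest = user_id.strip_prefix('@')?;
    let (local, server) = rest.split_once(':')?;
    if local.is_empty() || server.is_empty() {
        None
    } else {
        Some(local)
    }
}

/// Builds the logical secret path for a user's PAT on a given service.
fn token_path(mount: &str, user_id: &str, service: &str) -> Option<String> {
    let local = localpart(user_id)?;
    Some(format!("{mount}/sol-tokens/{local}/{service}"))
}
```

Keying the path by localpart and service means each user gets a separate secret per external service.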
|
||||
|
||||
## persistence
|
||||
For Gitea, PATs are auto-provisioned via the admin API on first use. The username is discovered by direct match or email-based search.
|
||||
|
||||
## Persistence
|
||||
|
||||
SQLite database at `/data/sol.db` (configurable via `matrix.db_path`), WAL mode.
|
||||
|
||||
### tables
|
||||
### Tables
|
||||
|
||||
**conversations** — room_id (PK), conversation_id, estimated_tokens, created_at
|
||||
- **conversations** — room_id (PK), conversation_id, estimated_tokens, created_at
|
||||
- **agents** — name (PK), agent_id, model, instructions_hash, created_at
|
||||
- **service_users** — (localpart, service) PK, service_username, created_at
|
||||
|
||||
**agents** — name (PK), agent_id, model, created_at
|
||||
|
||||
### recovery behavior
|
||||
### Recovery Behavior
|
||||
|
||||
On startup, if the database fails to open:
|
||||
|
||||
@@ -298,18 +301,9 @@ On startup, if the database fails to open:
|
||||
2. Fall back to in-memory SQLite (conversations won't survive restarts)
|
||||
3. After sync loop starts, send `*sneezes*` to all joined rooms to signal the hiccup
|
||||
|
||||
## multimodal
|
||||
The same sneeze happens when the orchestrator agent is recreated due to prompt changes.
|
||||
|
||||
When an `m.image` message arrives:
|
||||
|
||||
1. Extract media source from event (`MessageType::Image`)
|
||||
2. Download bytes from Matrix media API via `matrix_sdk::media::get_media_content`
|
||||
3. Base64-encode as `data:{mime};base64,{data}` URI
|
||||
4. Pass to Mistral as `ContentPart::ImageUrl` alongside any text caption
|
||||
|
||||
Encrypted images are not supported (the `MediaSource::Encrypted` variant is skipped).
|
||||
|
||||
## configuration reference
|
||||
## Configuration Reference
|
||||
|
||||
Config is loaded from `SOL_CONFIG` (default: `/etc/sol/sol.toml`).
|
||||
|
||||
@@ -319,7 +313,7 @@ Config is loaded from `SOL_CONFIG` (default: `/etc/sol/sol.toml`).
|
||||
|-------|------|---------|-------------|
|
||||
| `homeserver_url` | string | *required* | Matrix homeserver URL |
|
||||
| `user_id` | string | *required* | Bot's Matrix user ID |
|
||||
| `state_store_path` | string | *required* | Path for Matrix SDK sqlite state |
|
||||
| `state_store_path` | string | *required* | Path for Matrix SDK SQLite state |
|
||||
| `db_path` | string | `/data/sol.db` | SQLite database for persistent state |
|
||||
|
||||
### `[opensearch]`
|
||||
@@ -353,13 +347,13 @@ Config is loaded from `SOL_CONFIG` (default: `/etc/sol/sol.toml`).
|
||||
| `spontaneous_threshold` | f32 | `0.85` | LLM relevance score to trigger spontaneous response |
|
||||
| `reaction_threshold` | f32 | `0.6` | LLM relevance score to trigger emoji reaction |
|
||||
| `reaction_enabled` | bool | `true` | Enable emoji reactions |
|
||||
| `room_context_window` | usize | `30` | Messages to keep in group room context |
|
||||
| `dm_context_window` | usize | `100` | Messages to keep in DM context |
|
||||
| `room_context_window` | usize | `200` | Messages to keep in group room context |
|
||||
| `dm_context_window` | usize | `200` | Messages to keep in DM context |
|
||||
| `backfill_on_join` | bool | `true` | Backfill context from archive on startup |
|
||||
| `backfill_limit` | usize | `10000` | Max messages to backfill |
|
||||
| `instant_responses` | bool | `false` | Skip response delays (for testing) |
|
||||
| `cooldown_after_response_ms` | u64 | `15000` | Cooldown before another spontaneous response |
|
||||
| `evaluation_context_window` | usize | `25` | Recent messages sent to evaluation LLM |
|
||||
| `evaluation_context_window` | usize | `200` | Recent messages sent to evaluation LLM |
|
||||
| `detect_sol_in_conversation` | bool | `true` | Use active/passive evaluation prompts based on Sol's participation |
|
||||
| `evaluation_prompt_active` | string? | *(built-in)* | Custom prompt when Sol is in conversation |
|
||||
| `evaluation_prompt_passive` | string? | *(built-in)* | Custom prompt when Sol hasn't spoken |
|
||||
@@ -377,76 +371,76 @@ Config is loaded from `SOL_CONFIG` (default: `/etc/sol/sol.toml`).
|
||||
| `compaction_threshold` | u32 | `118000` | Token estimate before conversation reset (~90% of 131K context) |
|
||||
| `use_conversations_api` | bool | `false` | Enable Conversations API path (vs legacy chat completions) |
|
||||
|
||||
## environment variables
|
||||
### `[vault]`
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `url` | string | `http://openbao.data.svc.cluster.local:8200` | OpenBao/Vault URL |
|
||||
| `role` | string | `sol-agent` | Kubernetes auth role name |
|
||||
| `mount` | string | `secret` | KV v2 mount path |
|
||||
|
||||
### `[services.gitea]`
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `url` | string | *required if enabled* | Gitea API base URL |
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Required | Description |
|
||||
|----------|----------|-------------|
|
||||
| `SOL_MATRIX_ACCESS_TOKEN` | yes | Matrix access token |
|
||||
| `SOL_MATRIX_DEVICE_ID` | yes | Matrix device ID (for E2EE state) |
|
||||
| `SOL_MISTRAL_API_KEY` | yes | Mistral API key |
|
||||
| `SOL_CONFIG` | no | Config file path (default: `/etc/sol/sol.toml`) |
|
||||
| `SOL_SYSTEM_PROMPT` | no | System prompt file path (default: `/etc/sol/system_prompt.md`) |
|
||||
| `SOL_MATRIX_ACCESS_TOKEN` | Yes | Matrix access token |
|
||||
| `SOL_MATRIX_DEVICE_ID` | Yes | Matrix device ID (for E2EE state) |
|
||||
| `SOL_MISTRAL_API_KEY` | Yes | Mistral API key |
|
||||
| `SOL_GITEA_ADMIN_USERNAME` | No | Gitea admin username (enables devtools agent) |
|
||||
| `SOL_GITEA_ADMIN_PASSWORD` | No | Gitea admin password |
|
||||
| `SOL_CONFIG` | No | Config file path (default: `/etc/sol/sol.toml`) |
|
||||
| `SOL_SYSTEM_PROMPT` | No | System prompt file path (default: `/etc/sol/system_prompt.md`) |
|
||||
|
||||
## dependencies
|
||||
## Dependencies
|
||||
|
||||
sol talks to five external services:
|
||||
Sol talks to five external services:
|
||||
|
||||
- **Matrix homeserver** — [tuwunel](https://github.com/tulir/tuwunel) (or any Matrix server)
|
||||
- **OpenSearch** — message archive + user memory indices
|
||||
- **Mistral AI** — response generation, engagement evaluation, memory extraction, agents + web search
|
||||
- **OpenBao** — secure token storage for user impersonation PATs (K8s auth, KV v2)
|
||||
- **Gitea** — git hosting API for devtools agent (repos, issues, PRs)
|
||||
- **Matrix homeserver** — [Tuwunel](https://github.com/tulir/tuwunel) (or any Matrix server)
|
||||
- **OpenSearch** — Message archive + user memory indices
|
||||
- **Mistral AI** — Response generation, engagement evaluation, memory extraction, agents + web search
|
||||
- **OpenBao** — Secure token storage for user impersonation PATs (K8s auth, KV v2)
|
||||
- **Gitea** — Git hosting API for devtools agent (repos, issues, PRs)
|
||||
|
||||
key crates: `matrix-sdk` 0.9 (E2EE + sqlite), `mistralai-client` 1.1.0 (private registry), `opensearch` 2, `deno_core` 0.393, `rusqlite` 0.32 (bundled), `ruma` 0.12.
|
||||
Key crates: `matrix-sdk` 0.9 (E2EE + SQLite), `mistralai-client` 1.1.0 (private registry), `opensearch` 2, `deno_core` 0.393, `rusqlite` 0.32 (bundled), `ruma` 0.12.
|
||||
|
||||
## building
|
||||
## Building
|
||||
|
||||
```sh
|
||||
cargo build --release
|
||||
```
|
||||
|
||||
docker (cross-compile to x86_64 linux, vendored deps):
|
||||
Docker (cross-compile to x86_64 Linux, vendored deps):
|
||||
|
||||
```sh
|
||||
docker build -t sol .
|
||||
```
|
||||
|
||||
production build + deploy:
|
||||
Production build + deploy:
|
||||
|
||||
```sh
|
||||
sunbeam build sol --push --deploy
|
||||
```
|
||||
|
||||
the Dockerfile uses a two-stage build: deps layer (cached until Cargo.toml/vendor change) → source layer (only sol code recompiles). final image is `gcr.io/distroless/cc-debian12:nonroot`.
|
||||
The Dockerfile uses a two-stage build: deps layer (cached until Cargo.toml/vendor change) → source layer (only Sol code recompiles). Final image is `gcr.io/distroless/cc-debian12:nonroot`.
|
||||
|
||||
## testing
|
||||
## Testing
|
||||
|
||||
```sh
|
||||
cargo test
|
||||
```
|
||||
|
||||
unit tests covering:
|
||||
## License
|
||||
|
||||
- config parsing (minimal, full, missing sections/fields, services, vault)
|
||||
- conversation windowing, context management, reset_all, delete_all
|
||||
- engagement rules (mention, DM, name invocation, case sensitivity, false positives)
|
||||
- personality template substitution (date, room, members, memory notes, timestamps, room context rules)
|
||||
- memory document serialization, extraction parsing, category normalization
|
||||
- archive search query building (filters, date ranges, wildcards, room_name keyword field)
|
||||
- TypeScript transpilation (basic, arrow, interface, invalid)
|
||||
- sandbox path isolation (traversal, symlink escape, nested dirs)
|
||||
- deno_core script execution (basic math, output capture)
|
||||
- SQLite CRUD (conversations, agents, service_users, load_all, bulk delete)
|
||||
- conversation message merging (DM, group, empty, single)
|
||||
- context derivation (`@user:server` → `user@server`, localpart extraction)
|
||||
- tool bridge registration and execution
|
||||
- agent UX formatting (tool calls, result truncation)
|
||||
- agent definitions (orchestrator instructions, dynamic delegation, deterministic hash)
|
||||
- token expiry validation (PAT, future, past, malformed, null)
|
||||
- Gitea API type deserialization (repos, issues, PRs, files)
|
||||
- PAT conflict status codes and scope constants
|
||||
- username mapping (OIDC → service identity)
|
||||
Sol is dual-licensed:
|
||||
|
||||
## license
|
||||
- **Open Source** — [GNU Affero General Public License v3.0](LICENSE) (AGPL-3.0-or-later). You can use, modify, and distribute Sol freely under the AGPL. If you run a modified version as a network service, you must share your changes under the same license.
|
||||
|
||||
[MIT](LICENSE)
|
||||
- **Commercial** — For organizations that want to use Sol without AGPL obligations (private modifications, proprietary integrations, no source disclosure), a commercial license is available. The commercial license grants unlimited internal use but does not permit redistribution.
|
||||
|
||||
For commercial licensing or other licensing questions, contact [hello@sunbeam.pt](mailto:hello@sunbeam.pt).
|
||||
|
||||