# Sol — Kubernetes Deployment
Sol runs as a single-replica Deployment in the `matrix` namespace. SQLite is the persistence backend, so only one pod can run at a time (Recreate strategy).
## resource relationships
```mermaid
flowchart TD
subgraph OpenBao
vault[("secret/sol<br/>matrix-access-token<br/>matrix-device-id<br/>mistral-api-key")]
end
subgraph "matrix namespace"
vss[VaultStaticSecret<br/>sol-secrets]
secret[Secret<br/>sol-secrets]
cm[ConfigMap<br/>sol-config<br/>sol.toml + system_prompt.md]
pvc[PVC<br/>sol-data<br/>1Gi RWO]
deploy[Deployment<br/>sol]
init[initContainer<br/>fix-permissions]
pod[Container<br/>sol]
end
vault --> |VSO sync| vss
vss --> |creates| secret
vss --> |rolloutRestartTargets| deploy
deploy --> init
init --> pod
secret --> |env vars| pod
cm --> |subPath mounts| pod
pvc --> |/data| init
pvc --> |/data| pod
```
## manifests
All manifests are in `infrastructure/base/matrix/`.
### Deployment (`sol-deployment.yaml`)
```yaml
strategy:
  type: Recreate  # SQLite requires single-writer
replicas: 1
```
**initContainer:** `busybox` runs `chmod -R 777 /data && mkdir -p /data/matrix-state` to ensure the nonroot distroless container can write to the Longhorn PVC.

**Container:** `sol` image (`distroless/cc-debian12:nonroot`)

- Resources: 256Mi request / 512Mi limit memory, 100m CPU request
- `enableServiceLinks: false`, which avoids injecting service env vars that could conflict

**Environment variables** (from Secret `sol-secrets`):
| Env Var | Secret Key |
|---------|-----------|
| `SOL_MATRIX_ACCESS_TOKEN` | `matrix-access-token` |
| `SOL_MATRIX_DEVICE_ID` | `matrix-device-id` |
| `SOL_MISTRAL_API_KEY` | `mistral-api-key` |

Fixed env vars:

| Env Var | Value |
|---------|-------|
| `SOL_CONFIG` | `/etc/sol/sol.toml` |
| `SOL_SYSTEM_PROMPT` | `/etc/sol/system_prompt.md` |

**Volume mounts:**

| Mount | Source | Details |
|-------|--------|---------|
| `/etc/sol/sol.toml` | ConfigMap `sol-config` | subPath: `sol.toml`, readOnly |
| `/etc/sol/system_prompt.md` | ConfigMap `sol-config` | subPath: `system_prompt.md`, readOnly |
| `/data` | PVC `sol-data` | read-write |
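
Taken together, the settings above might look roughly like the following pod template. This is a sketch reconstructed from the tables, not a copy of `sol-deployment.yaml`: the volume names `config` and `data` and both `image` values are placeholders, and the real manifest may name or order things differently.

```yaml
# Sketch of the pod template (assumed layout, not the actual file)
spec:
  template:
    spec:
      enableServiceLinks: false
      initContainers:
        - name: fix-permissions
          image: busybox                # tag not specified in this document
          command: ["sh", "-c", "chmod -R 777 /data && mkdir -p /data/matrix-state"]
          volumeMounts:
            - name: data                # hypothetical volume name
              mountPath: /data
      containers:
        - name: sol
          image: sol                    # placeholder; registry path not given here
          resources:
            requests:
              memory: 256Mi
              cpu: 100m
            limits:
              memory: 512Mi
          env:
            # Values injected from the VSO-managed Secret
            - name: SOL_MATRIX_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: sol-secrets
                  key: matrix-access-token
            - name: SOL_MATRIX_DEVICE_ID
              valueFrom:
                secretKeyRef:
                  name: sol-secrets
                  key: matrix-device-id
            - name: SOL_MISTRAL_API_KEY
              valueFrom:
                secretKeyRef:
                  name: sol-secrets
                  key: mistral-api-key
            # Fixed paths matching the ConfigMap mounts below
            - name: SOL_CONFIG
              value: /etc/sol/sol.toml
            - name: SOL_SYSTEM_PROMPT
              value: /etc/sol/system_prompt.md
          volumeMounts:
            - name: config              # hypothetical volume name
              mountPath: /etc/sol/sol.toml
              subPath: sol.toml
              readOnly: true
            - name: config
              mountPath: /etc/sol/system_prompt.md
              subPath: system_prompt.md
              readOnly: true
            - name: data
              mountPath: /data
      volumes:
        - name: config
          configMap:
            name: sol-config
        - name: data
          persistentVolumeClaim:
            claimName: sol-data
```

Because the ConfigMap files are mounted with `subPath`, Kubernetes does not refresh them inside a running container when the ConfigMap changes; the pod has to be restarted to pick up a new `sol.toml` or `system_prompt.md`.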
### PVC (`sol-deployment.yaml`, second document)
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sol-data
  namespace: matrix
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
```
Uses the default StorageClass (Longhorn).
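
If the cluster's default StorageClass might change, the claim could be pinned to Longhorn explicitly. A minimal sketch, assuming the Longhorn class is registered under its conventional name `longhorn` (verify against the StorageClass list on your cluster):

```yaml
# Fragment of the PVC spec; only storageClassName is new
spec:
  storageClassName: longhorn   # assumed class name; omit to use the cluster default
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
```

`storageClassName` cannot be changed on an existing claim, so this only applies when the PVC is first created.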
### VaultStaticSecret (`vault-secrets.yaml`)
```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: sol-secrets
  namespace: matrix
spec:
  vaultAuthRef: vso-auth
  mount: secret
  type: kv-v2
  path: sol
  refreshAfter: 60s
  rolloutRestartTargets:
    - kind: Deployment
      name: sol
  destination:
    name: sol-secrets
    create: true
    overwrite: true
```
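Because of `rolloutRestartTargets`, VSO triggers a rollout restart of the `sol` Deployment whenever the synced secret data changes in OpenBao. With `refreshAfter: 60s`, a change should be picked up within about a minute.

Three keys are synced from OpenBao `secret/sol`: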
- `matrix-access-token`
- `matrix-device-id`
- `mistral-api-key`
## `/data` mount layout
```
/data/
├── sol.db SQLite database (conversations + agents tables, WAL mode)
└── matrix-state/ Matrix SDK sqlite state store (E2EE keys, sync tokens)
```
Both are created automatically. The initContainer ensures directory permissions are correct for the nonroot container.
## secrets in OpenBao
Store secrets at `secret/sol` in OpenBao KV v2:
```sh
# Via sunbeam seed (automated), or manually:
openbao kv put secret/sol \
  matrix-access-token="syt_..." \
  matrix-device-id="DEVICE_ID" \
  mistral-api-key="..."
```
These are synced to K8s Secret `sol-secrets` by the Vault Secrets Operator.
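
After a successful sync, the resulting Secret should look roughly like the sketch below. The values are placeholders, and the `_raw` entry is an assumption based on the Vault Secrets Operator's default behaviour of also storing the full KV response; it may be absent if raw output is excluded.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: sol-secrets
  namespace: matrix
type: Opaque
data:
  matrix-access-token: "<base64>"
  matrix-device-id: "<base64>"
  mistral-api-key: "<base64>"
  _raw: "<base64>"   # assumed: VSO default, full KV response
```

If one of the three keys is missing here, the corresponding `SOL_*` env var will not be populated in the pod.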
## build and deploy
```sh
# Build only (local Docker image)
sunbeam build sol
# Build + push to registry
sunbeam build sol --push
# Build + push + deploy (apply manifests + rollout restart)
sunbeam build sol --push --deploy
```
When run on macOS, the Docker build cross-compiles to `x86_64-unknown-linux-gnu`. The final image is based on `gcr.io/distroless/cc-debian12:nonroot` (~30MB).
## startup sequence
1. Initialize `tracing_subscriber` with `RUST_LOG` env filter (default: `sol=info`)
2. Load config from `SOL_CONFIG` path
3. Load system prompt from `SOL_SYSTEM_PROMPT` path
4. Read 3 secret env vars (`SOL_MATRIX_ACCESS_TOKEN`, `SOL_MATRIX_DEVICE_ID`, `SOL_MISTRAL_API_KEY`)
5. Build Matrix client with E2EE sqlite store, restore session
6. Connect to OpenSearch, ensure archive + memory indices exist
7. Initialize Mistral client
8. Build components: Personality, ConversationManager, ToolRegistry, Indexer, Evaluator, Responder
9. Backfill conversation context from archive (if `backfill_on_join` enabled)
10. Open SQLite database (fallback to in-memory on failure)
11. Initialize AgentRegistry + ConversationRegistry (load persisted state from SQLite)
12. If `use_conversations_api` enabled: ensure orchestrator agent exists on Mistral server
13. Backfill reactions from Matrix room timelines
14. Start background index flush task
15. Start Matrix sync loop
16. If SQLite failed: send `*sneezes*` to all joined rooms
17. Log "Sol is running", wait for SIGINT
## monitoring
Sol uses `tracing` with structured fields. Default log level: `sol=info`.

Key log events:
| Event | Level | Fields |
|-------|-------|--------|
| Response sent | info | `room`, `len`, `is_dm` |
| Tool execution | info | `tool`, `id`, `args` |
| Engagement evaluation | info | `sender`, `rule`, `relevance`, `threshold` |
| Memory extraction | debug | `count`, `user` |
| Conversation created | info | `room`, `conversation_id` |
| Agent restored/created | info | `agent_id`, `name` |
| Backfill complete | info | `rooms`, `messages` / `reactions` |

Set `RUST_LOG=sol=debug` for verbose output including tool results, evaluation prompts, and memory details.
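
One way to raise the log level is to set `RUST_LOG` on the container. The fragment below is a sketch of an extra entry in the `sol` container's `env` list; it assumes the variable is not already set elsewhere in the manifest.

```yaml
# Under the sol container in sol-deployment.yaml (assumed location)
env:
  - name: RUST_LOG
    value: sol=debug
```

Applying the change recreates the pod (single replica, Recreate strategy), so expect a brief gap while Sol restarts.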
## troubleshooting
**Pod won't start / CrashLoopBackOff:**
```sh
sunbeam logs matrix/sol
```
Common causes:
- Missing secrets (env vars not set): check `sunbeam k8s get secret sol-secrets -n matrix -o yaml`
- ConfigMap not applied: check `sunbeam k8s get cm sol-config -n matrix`
- PVC not bound: check `sunbeam k8s get pvc -n matrix`

**SQLite recovery failure (*sneezes*):**
If Sol sends `*sneezes*` on startup, it means the SQLite database at `/data/sol.db` couldn't be opened. Sol falls back to in-memory state. Check PVC mount and file permissions:
```sh
sunbeam k8s exec -n matrix deployment/sol -- ls -la /data/
```
**Matrix sync errors:**
Sol auto-joins rooms on invite (3 retries with exponential backoff). If it can't join, check homeserver connectivity and access token validity.

**Agent creation failure:**
If the orchestrator agent can't be created, Sol falls back to model-only conversations (no agent). Check Mistral API key and quota.