# Sol — Kubernetes Deployment

Sol runs as a single-replica Deployment in the `matrix` namespace. SQLite is the persistence backend, so only one pod can run at a time (Recreate strategy).

## resource relationships

```mermaid
flowchart TD
    subgraph OpenBao
        vault[("secret/sol<br/>matrix-access-token<br/>matrix-device-id<br/>mistral-api-key<br/>gitea-admin-username<br/>gitea-admin-password")]
    end

    subgraph "matrix namespace"
        vss[VaultStaticSecret<br/>sol-secrets]
        secret[Secret<br/>sol-secrets]
        cm[ConfigMap<br/>sol-config<br/>sol.toml + system_prompt.md]
        pvc[PVC<br/>sol-data<br/>1Gi RWO]
        deploy[Deployment<br/>sol]
        init[initContainer<br/>fix-permissions]
        pod[Container<br/>sol]
        svc[Service<br/>sol-grpc<br/>port 50051]
    end

    vault --> |VSO sync| vss
    vss --> |creates| secret
    vss --> |rolloutRestartTargets| deploy
    deploy --> init
    init --> pod
    secret --> |env vars| pod
    cm --> |subPath mounts| pod
    pvc --> |/data| init
    pvc --> |/data| pod
    svc --> |gRPC| pod
```

## manifests

All manifests are in `infrastructure/base/matrix/`.

### Deployment (`sol-deployment.yaml`)

```yaml
strategy:
  type: Recreate  # SQLite requires single-writer
replicas: 1
```

**initContainer** — `busybox` runs `chmod -R 777 /data && mkdir -p /data/matrix-state` to ensure the nonroot distroless container can write to the Longhorn PVC.

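
As a rough illustration, the initContainer described above might be declared along these lines. The name `fix-permissions` and the command come from this document; the unpinned `busybox` image tag and the volume name `data` are assumptions and may not match `sol-deployment.yaml`.

```yaml
# Sketch only, not copied from sol-deployment.yaml.
initContainers:
  - name: fix-permissions
    image: busybox            # tag is not stated in this document
    command:
      - sh
      - -c
      - chmod -R 777 /data && mkdir -p /data/matrix-state
    volumeMounts:
      - name: data            # hypothetical volume name backed by PVC sol-data
        mountPath: /data
```
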

**Container** — `sol` image (distroless/cc-debian12:nonroot)

- Resources: 256Mi request / 512Mi limit memory, 100m CPU request
- `enableServiceLinks: false` — avoids injecting service env vars that could conflict
- Ports: 50051 (gRPC)

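
The settings listed above could be expressed roughly as follows. This is a sketch under assumptions: the image reference is a placeholder and the port name `grpc` is hypothetical. Note that `enableServiceLinks` is a pod-level field, so it sits beside `containers` rather than inside the container entry.

```yaml
# Sketch only: image reference and port name are placeholders.
spec:
  template:
    spec:
      enableServiceLinks: false      # pod-level field
      containers:
        - name: sol
          image: "REGISTRY/sol:TAG"  # placeholder, not the real image reference
          ports:
            - name: grpc             # hypothetical port name
              containerPort: 50051
          resources:
            requests:
              memory: 256Mi
              cpu: 100m
            limits:
              memory: 512Mi
```
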

**Environment variables** (from Secret `sol-secrets`):

| Env Var | Secret Key |
|---------|-----------|
| `SOL_MATRIX_ACCESS_TOKEN` | `matrix-access-token` |
| `SOL_MATRIX_DEVICE_ID` | `matrix-device-id` |
| `SOL_MISTRAL_API_KEY` | `mistral-api-key` |
| `SOL_GITEA_ADMIN_USERNAME` | `gitea-admin-username` |
| `SOL_GITEA_ADMIN_PASSWORD` | `gitea-admin-password` |

Fixed env vars:

| Env Var | Value |
|---------|-------|
| `SOL_CONFIG` | `/etc/sol/sol.toml` |
| `SOL_SYSTEM_PROMPT` | `/etc/sol/system_prompt.md` |

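
To make the two tables concrete, here is a minimal sketch of how such an `env` list is commonly written. It assumes each secret-backed variable uses a per-key `secretKeyRef`, which the differing variable and key names suggest but this document does not confirm. Only two of the five secret-backed variables are shown.

```yaml
# Sketch only: assumes per-key secretKeyRef entries on the sol container.
env:
  - name: SOL_MATRIX_ACCESS_TOKEN
    valueFrom:
      secretKeyRef:
        name: sol-secrets
        key: matrix-access-token
  - name: SOL_MISTRAL_API_KEY
    valueFrom:
      secretKeyRef:
        name: sol-secrets
        key: mistral-api-key
  # The remaining three secret-backed variables would follow the same pattern.
  - name: SOL_CONFIG
    value: /etc/sol/sol.toml
  - name: SOL_SYSTEM_PROMPT
    value: /etc/sol/system_prompt.md
```
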

**Volume mounts:**

| Mount | Source | Details |
|-------|--------|---------|
| `/etc/sol/sol.toml` | ConfigMap `sol-config` | subPath: `sol.toml`, readOnly |
| `/etc/sol/system_prompt.md` | ConfigMap `sol-config` | subPath: `system_prompt.md`, readOnly |
| `/data` | PVC `sol-data` | read-write |

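
The mount table above corresponds to something like the following fragment. The volume names `config` and `data` are hypothetical; the paths, subPaths, ConfigMap name, and PVC name are taken from the table.

```yaml
# Sketch only: volume names are assumptions.
# Under the sol container:
volumeMounts:
  - name: config
    mountPath: /etc/sol/sol.toml
    subPath: sol.toml
    readOnly: true
  - name: config
    mountPath: /etc/sol/system_prompt.md
    subPath: system_prompt.md
    readOnly: true
  - name: data
    mountPath: /data
# Under the pod spec:
volumes:
  - name: config
    configMap:
      name: sol-config
  - name: data
    persistentVolumeClaim:
      claimName: sol-data
```

One consequence of `subPath` mounts worth keeping in mind: Kubernetes does not propagate ConfigMap updates into a running container through a `subPath` mount, so a change to `sol.toml` or `system_prompt.md` only takes effect after the pod restarts.
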

### PVC (`sol-deployment.yaml`, second document)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sol-data
  namespace: matrix
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
```

Uses the default StorageClass (Longhorn).

### VaultStaticSecret (`vault-secrets.yaml`)

```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: sol-secrets
  namespace: matrix
spec:
  vaultAuthRef: vso-auth
  mount: secret
  type: kv-v2
  path: sol
  refreshAfter: 60s
  rolloutRestartTargets:
    - kind: Deployment
      name: sol
  destination:
    name: sol-secrets
    create: true
    overwrite: true
```

The `rolloutRestartTargets` field means VSO will automatically restart the Sol deployment when secrets change in OpenBao.

Five keys synced from OpenBao `secret/sol`:

- `matrix-access-token`
- `matrix-device-id`
- `mistral-api-key`
- `gitea-admin-username`
- `gitea-admin-password`

## `/data` mount layout

```
/data/
├── sol.db            SQLite database (conversations, agents, service_users — WAL mode)
└── matrix-state/     Matrix SDK sqlite state store (E2EE keys, sync tokens)
```

Both are created automatically. The initContainer ensures directory permissions are correct for the nonroot container.

## secrets in OpenBao

Store secrets at `secret/sol` in OpenBao KV v2:

```sh
# Via sunbeam seed (automated), or manually:
openbao kv put secret/sol \
  matrix-access-token="syt_..." \
  matrix-device-id="DEVICE_ID" \
  mistral-api-key="..." \
  gitea-admin-username="..." \
  gitea-admin-password="..."
```

These are synced to K8s Secret `sol-secrets` by the Vault Secrets Operator.

## build and deploy

```sh
# Build only (local Docker image)
sunbeam build sol

# Build + push to registry
sunbeam build sol --push

# Build + push + deploy (apply manifests + rollout restart)
sunbeam build sol --push --deploy
```

The Docker build cross-compiles to `x86_64-unknown-linux-gnu` on macOS. The final image is `gcr.io/distroless/cc-debian12:nonroot` (~30MB).

## startup sequence

```mermaid
flowchart TD
    start[Start] --> tracing[Init tracing<br/>RUST_LOG env filter]
    tracing --> config[Load config + system prompt]
    config --> secrets[Read env vars<br/>access token, device ID, API key]
    secrets --> matrix[Build Matrix client<br/>E2EE sqlite store, restore session]
    matrix --> opensearch[Connect OpenSearch<br/>ensure archive + memory + code indices]
    opensearch --> mistral[Init Mistral client]
    mistral --> components[Build components<br/>Personality, ConversationManager,<br/>ToolRegistry, Indexer, Evaluator]
    components --> backfill[Backfill conversation context<br/>from archive]
    backfill --> sqlite{Open SQLite}
    sqlite --> |success| agents[Init AgentRegistry +<br/>ConversationRegistry]
    sqlite --> |failure| inmemory[In-memory fallback]
    inmemory --> agents
    agents --> orchestrator{use_conversations_api?}
    orchestrator --> |yes| ensure_agent[Ensure orchestrator agent<br/>exists on Mistral]
    orchestrator --> |no| skip[Skip]
    ensure_agent --> grpc{grpc config?}
    skip --> grpc
    grpc --> |yes| grpc_server[Start gRPC server<br/>on listen_addr]
    grpc --> |no| skip_grpc[Skip]
    grpc_server --> reactions[Backfill reactions<br/>from Matrix timelines]
    skip_grpc --> reactions
    reactions --> flush[Start background<br/>index flush task]
    flush --> sync[Start Matrix sync loop]
    sync --> sneeze{SQLite failed?}
    sneeze --> |yes| sneeze_rooms[Send *sneezes*<br/>to all rooms]
    sneeze --> |no| running[Sol is running]
    sneeze_rooms --> running
```

## monitoring

Sol uses `tracing` with structured fields. Default log level: `sol=info`.

Key log events:

| Event | Level | Fields |
|-------|-------|--------|
| Response sent | info | `room`, `len`, `is_dm` |
| Tool execution | info | `tool`, `id`, `args` |
| Engagement evaluation | info | `sender`, `rule`, `relevance`, `threshold` |
| Memory extraction | debug | `count`, `user` |
| Conversation created | info | `room`, `conversation_id` |
| Agent restored/created | info | `agent_id`, `name` |
| Backfill complete | info | `rooms`, `messages` / `reactions` |
| gRPC session started | info | `session_id`, `project` |
| Code reindex complete | info | `repos_indexed`, `symbols_indexed` |

Set `RUST_LOG=sol=debug` for verbose output including tool results, evaluation prompts, and memory details.

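
One way to apply this in the cluster is an extra entry in the `sol` container's `env` list. This is a sketch that assumes the variable is set through the Deployment manifest; the document does not say where `RUST_LOG` is currently configured, if anywhere.

```yaml
# Sketch only: add to the sol container's env list, then roll out the Deployment.
env:
  - name: RUST_LOG
    value: sol=debug
```
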

## troubleshooting

**Pod won't start / CrashLoopBackOff:**

```sh
sunbeam logs matrix/sol
```

Common causes:
- Missing secrets (env vars not set) — check `sunbeam k8s get secret sol-secrets -n matrix -o yaml`
- ConfigMap not applied — check `sunbeam k8s get cm sol-config -n matrix`
- PVC not bound — check `sunbeam k8s get pvc -n matrix`

**SQLite recovery failure (*sneezes*):**

If Sol sends `*sneezes*` on startup, it means the SQLite database at `/data/sol.db` couldn't be opened. Sol falls back to in-memory state. Check PVC mount and file permissions:

```sh
sunbeam k8s exec -n matrix deployment/sol -- ls -la /data/
```

**Matrix sync errors:**

Sol auto-joins rooms on invite (3 retries with exponential backoff). If it can't join, check homeserver connectivity and access token validity.

**Agent creation failure:**

If the orchestrator agent can't be created, Sol falls back to model-only conversations (no agent). Check Mistral API key and quota.

**gRPC connection refused:**

If `sunbeam code` can't connect, verify the gRPC server is configured and listening:

```sh
sunbeam k8s get svc sol-grpc -n matrix
sunbeam logs matrix/sol | grep grpc
```
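
When inspecting the Service, it can help to know roughly what shape to expect. The sketch below is an assumption-based reconstruction from the resource diagram (name `sol-grpc`, port 50051); the selector label `app: sol` and the port name are hypothetical and should be compared against the actual manifest.

```yaml
# Sketch only: selector label and port name are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: sol-grpc
  namespace: matrix
spec:
  selector:
    app: sol            # hypothetical; must match the pod template's labels
  ports:
    - name: grpc        # hypothetical port name
      port: 50051
      targetPort: 50051
```

If the selector does not match the labels on the Sol pod, the Service has no endpoints and clients cannot reach the gRPC server even though the pod itself is healthy.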