docs: add architectural overview — What's In The Box, Babe? 💅

Full tour of the SBBB stack: Pingora proxy, Ory identity, La Suite
apps, Linkerd mesh, OpenBao secrets, data layer, monitoring, Matrix,
Sol☀️, and the platform itself.
2026-03-24 11:45:56 +00:00
parent e1fbaa445d
commit 041ef98b65

docs/the-box.md Normal file

@@ -0,0 +1,269 @@
# What's In The Box, Babe? 💅
Welcome to the full architectural tour of **The Super Boujee Business Box ✨** — SBBB✨.
This is everything three queer women need to run a game studio: email, docs, drive, video calls, project management, chat, git, AI, monitoring, and secrets management — all on one gorgeous European server, behind one login, on one bill. No Microsoft. No Google. No landlords.
We built this because we wanted infrastructure that slaps. Let's walk through the house.
---
## The Front Door — Pingora Proxy
Every request to `*.sunbeam.pt` walks through our custom Rust binary, built on [Cloudflare's Pingora framework](https://github.com/cloudflare/pingora). Fast, tiny, and doesn't need nginx.
- **TLS termination** via rustls — pure Rust, no BoringSSL, no OpenSSL, no C dependency drama
- **Host-prefix routing**: we split the hostname on the first dot and map the prefix (the subdomain) to a backend. That's it. 34 routes covering every service: `docs`, `meet`, `drive`, `mail`, `messages`, `people`, `find`, `src`, `auth`, `cal`, `projects`, `s3`, `vault`, `search`, `metrics`, `livekit`, and more
- **ML-powered threat detection**: a decision tree + MLP ensemble (~4KB total) that fits in L1 cache, scoring every request for DDoS patterns and scanner fingerprints
- **Rate limiting**: 200 burst / 50 req/s for authenticated users, 50 burst / 10 req/s for strangers. Behave or get bounced
- **Static file serving** built in — replaces all those nginx sidecar containers we used to run
- **WebSocket passthrough** for Docs CRDT sync, Meet video streams, and Matrix chat
- **TLS hot-reload** via a Kubernetes Secret watcher — certificate rotates, proxy picks it up, zero downtime
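To make the host-prefix routing above concrete, here is a minimal Python sketch of the idea. It is not the proxy's actual Rust code: the `ROUTES` table, the backend addresses, and the `route_for_host` helper are all hypothetical, and only the "split on the first dot" rule comes from the description above.

```python
from typing import Optional

# Hypothetical route table: subdomain prefix -> backend address.
# The prefixes appear in the list above; the backend addresses are made up.
ROUTES = {
    "docs": "docs-backend.lasuite.svc:8000",
    "meet": "meet-backend.lasuite.svc:8000",
    "auth": "hydra-public.ory.svc:4444",
}


def route_for_host(host: str, base_domain: str = "sunbeam.pt") -> Optional[str]:
    """Return the backend for a Host header value, or None if nothing matches."""
    # Drop an optional port and normalise case.
    hostname = host.split(":", 1)[0].lower()
    # Split on the first dot: "docs.sunbeam.pt" -> ("docs", "sunbeam.pt").
    prefix, sep, rest = hostname.partition(".")
    if not sep or rest != base_domain:
        return None
    return ROUTES.get(prefix)
```

A request for `docs.sunbeam.pt` resolves to the `docs` entry, while the bare domain, nested subdomains, and unknown prefixes all fall through to `None`, which a real proxy would presumably turn into a 404 or a default backend.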
For the full proxy deep dive, see [proxy.md](proxy.md).
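The burst and req/s numbers in the rate-limiting bullet read like token-bucket parameters, so here is a small Python sketch under that assumption. The source doesn't confirm which algorithm the proxy uses; `TokenBucket` and `LIMITS` are illustrative names, and only the four numbers come from this document.

```python
import time
from typing import Optional

# (burst capacity, refill rate per second) from the tiers described above.
LIMITS = {"authenticated": (200, 50.0), "anonymous": (50, 10.0)}


class TokenBucket:
    """Classic token bucket: starts full, refills continuously, one token per request."""

    def __init__(self, capacity: int, refill_per_sec: float, now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if the request may pass, consuming one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(float(self.capacity), self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model a stranger can fire 50 requests at once, then settles into 10 per second; a logged-in user gets 200 and 50. The `now` parameter exists only so the clock can be controlled in tests.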
---
## The Velvet Rope — Identity & Auth
One login to rule them all. We use the **Ory** stack — small, fast Go binaries, no JVM in sight.
- **Ory Kratos** handles identity management: registration, login, profile editing, account recovery
- **Ory Hydra** is our OAuth2 / OIDC provider — issues tokens, manages the client registry
- Every app registers as an OIDC client via `HydraOAuth2Client` CRDs. **Hydra Maester** watches those CRDs and automatically creates Kubernetes Secrets containing `CLIENT_ID` and `CLIENT_SECRET`
- **12 registered clients**: Docs, Drive, Meet, Messages, People, Find, Gitea, Calendars, Projects, Hive, Tuwunel, Grafana
- **Session lifespan**: 720 hours (30 days), cookie-scoped to the parent domain
- **Self-service auth methods**: password, TOTP, WebAuthn, lookup secrets
- **Identity schemas**: `employee` (us), `default`, `external` (contractors, collaborators)
The auth flow is clean:
> User hits app → 302 redirect to `auth.DOMAIN` → Hydra → Kratos login UI → token issued → 302 back to app → session established
For the full identity deep dive, see [identity.md](identity.md).
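The redirect flow above is a standard OIDC authorization code flow, and its two app-side steps can be sketched in Python. This is an illustration, not code from any of the apps: the function names, the `scope` default, and the callback URL are hypothetical. `/oauth2/auth` is Hydra's usual authorization path, but treat the exact URL as an assumption about this deployment.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(domain: str, client_id: str, redirect_uri: str,
                            scope: str = "openid email"):
    """Step 1: build the URL the app 302-redirects the browser to."""
    # Random state ties the callback to this login attempt (CSRF protection).
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "nonce": secrets.token_urlsafe(16),
    })
    return f"https://auth.{domain}/oauth2/auth?{query}", state


def extract_code(callback_url: str, expected_state: str) -> str:
    """Step 2: on the 302 back to the app, check state and pull out the code."""
    params = parse_qs(urlsplit(callback_url).query)
    if params.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch")
    return params["code"][0]
```

The app would then exchange the returned code for tokens at the provider's token endpoint and establish its own session; that step is left out here.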
---
## The Apps — La Suite Numérique
All our productivity apps are built on [La Suite Numérique](https://lasuite.numerique.gouv.fr/), the French government's open-source office suite. Every app follows the same gorgeous pattern:
- **Django backend** (Gunicorn on port 8000)
- **React / Next.js frontend** (nginx on port 80)
- **PostgreSQL** for persistence
- **S3** for object storage
- **OIDC auth** via `mozilla-django-oidc`
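As a rough idea of how the OIDC piece of that pattern might be wired, here is a Python sketch that builds `mozilla-django-oidc` settings from the `CLIENT_ID` and `CLIENT_SECRET` values mentioned in the identity section. The setting names are the library's documented ones; the `oidc_settings` helper, the endpoint paths (Hydra's usual ones), and the `RS256` choice are assumptions, and the real La Suite apps may name or load these differently.

```python
import os
from typing import Mapping


def oidc_settings(domain: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build mozilla-django-oidc settings for an app behind auth.<domain>."""
    issuer = f"https://auth.{domain}"
    return {
        # Credentials come from the environment, e.g. a mounted Kubernetes Secret.
        "OIDC_RP_CLIENT_ID": env["CLIENT_ID"],
        "OIDC_RP_CLIENT_SECRET": env["CLIENT_SECRET"],
        # Assumed algorithm; it must match what the provider actually signs with.
        "OIDC_RP_SIGN_ALGO": "RS256",
        # Assumed provider endpoint paths.
        "OIDC_OP_AUTHORIZATION_ENDPOINT": f"{issuer}/oauth2/auth",
        "OIDC_OP_TOKEN_ENDPOINT": f"{issuer}/oauth2/token",
        "OIDC_OP_USER_ENDPOINT": f"{issuer}/userinfo",
        "OIDC_OP_JWKS_ENDPOINT": f"{issuer}/.well-known/jwks.json",
    }
```

In a Django `settings.py` this could be applied with `globals().update(oidc_settings("sunbeam.pt"))`, though spelling the settings out by hand is just as valid.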
### Docs — `docs.DOMAIN`
Collaborative document editing — GDDs, specs, meeting notes, everything we write together.
- **Editor**: BlockNote (built on Tiptap), with real-time sync via Y.js CRDT over WebSocket
- **AI features**: rephrase, summarize, translate, fix typos — powered by BlockNote XL
- **Office rendering**: Collabora (LibreOffice Online, WOPI server on port 9980)
- **Export**: `.odt`, `.docx`, `.pdf`
- **Storage**: `sunbeam-docs` bucket
### Drive — `drive.DOMAIN`
File storage with versioning, smart organization, and granular access control.
- **WOPI integration** with Docs — open and edit files directly from Drive
- **S3 versioning** enabled, with `VersionId` exposed via the `X-WOPI-ItemVersion` header
- **Storage**: `sunbeam-drive` bucket
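The version mapping above is simple enough to sketch. The Python helper below is hypothetical (Drive's real code isn't shown in this document); it assumes the input looks like the metadata dict an S3 client returns for an object, and it treats the literal `"null"` version S3 reports for unversioned objects as "no version".

```python
def wopi_version_headers(s3_object_metadata: dict) -> dict:
    """Map an S3 object's VersionId to the X-WOPI-ItemVersion response header."""
    version_id = s3_object_metadata.get("VersionId")
    # Objects written before versioning was enabled carry no usable version.
    if not version_id or version_id == "null":
        return {}
    return {"X-WOPI-ItemVersion": version_id}
```

A WOPI client such as Collabora can use that header to tell whether the copy it has open is still the current one.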
### Mail — `mail.DOMAIN`
A full email platform — personal mailboxes, shared mailboxes, the works.
- **Inbound flow**: Internet → MX record → Postfix MTA-in → Rspamd (spam filtering) → Django MDA → PostgreSQL + OpenSearch
- **Outbound flow**: Django → Postfix MTA-out (DKIM signing) → Scaleway TEM relay
- **AI features**: thread summaries, compose assistance, auto-labeling
- **DNS**: MX, SPF, DKIM, DMARC, PTR — all the acronyms that keep us out of spam folders
- **Storage**: `sunbeam-messages` + `sunbeam-messages-imports` buckets
- **Full-text search** via OpenSearch
### Meet — `meet.DOMAIN`
Video and audio conferencing — standups, playtests, partner calls.
- **Media server**: LiveKit (self-hosted, Apache 2.0) with WebRTC + DTLS-SRTP encryption
- **TURN relay**: LiveKit's built-in TURN server, UDP ports 49152-49252 forwarded through Pingora
- **Async tasks**: Celery worker
### Calendar — `cal.DOMAIN`
Scheduling with team availability views.
- **CalDAV server** for external client sync (Apple Calendar, Thunderbird, etc.)
- **Celery worker** for notifications
- Django backend + Next.js frontend
### Projects — `projects.DOMAIN`
Kanban boards for tasks, documents, and databases.
- **WebSocket-enabled** for real-time collaborative updates
### People — `people.DOMAIN`
Centralized user and team management — the admin panel for our little org.
- Propagates permissions across all La Suite apps
- Syncs with the Messages email backend for mailbox provisioning
- Admin-facing, not daily-use
### Integration Navbar
All apps share a navigation bar via the `@gouvfr-lasuite/integration` npm package — Sunbeam-branded logo, colors, and nav links. One consistent chrome across the whole suite.
---
## The Meshy Bits — Linkerd Service Mesh
Every injected pod talks to every other injected pod over **mutual TLS**, automatically, with zero application changes. That's Linkerd: a service mesh with a Rust data-plane proxy, lightweight enough for a single node.
- **Automatic mTLS** between all injected pods
- **Automatic certificate rotation** — we don't think about it
- **Per-route observability**: request rate, success rate, latency percentiles
- **~15MB per sidecar proxy**, ~300MB total across ~20 injected pods
- Injection via annotation: `linkerd.io/inject: enabled`
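For a feel of what injection-by-annotation means in practice, here is a Python sketch that adds the annotation to a Deployment manifest held as a plain dict, plus the back-of-envelope memory math from the list above. Both helpers are hypothetical; in the real cluster the annotation would live in the YAML manifests (or on a namespace) rather than being patched in code.

```python
import copy


def with_linkerd_injection(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with sidecar injection enabled."""
    patched = copy.deepcopy(deployment)
    # The annotation belongs on the pod template, not the Deployment itself.
    annotations = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("metadata", {})
        .setdefault("annotations", {})
    )
    annotations["linkerd.io/inject"] = "enabled"
    return patched


def mesh_memory_mb(injected_pods: int, per_proxy_mb: int = 15) -> int:
    """Rough sidecar overhead: one proxy per injected pod."""
    return injected_pods * per_proxy_mb
```

With the figures quoted above, `mesh_memory_mb(20)` gives 300, which matches the ~300MB total.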
---
## The Vault — OpenBao Secrets
Secrets management via **OpenBao**, the open-source fork of HashiCorp Vault (post-BSL relicensing — we don't do rug pulls here).
- **Dynamic database credentials**: rotated every 5 minutes via Vault Secrets Operator. No static passwords in our Postgres, ever
- **Static secrets**: OIDC client keys, Django secret keys, DKIM keys — refreshed every 30 seconds
- **Kubernetes auth method**: pods authenticate with their ServiceAccount token
- **Vault Secrets Operator** watches CRDs (`VaultStaticSecret`, `VaultDynamicSecret`) and creates Kubernetes Secrets automatically
- Exposed at `vault.DOMAIN`, gated behind Hydra `/userinfo` auth
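The Kubernetes auth method boils down to one HTTP call: the pod posts its ServiceAccount token and a role name, and gets a client token back. Here is a Python sketch that builds (but doesn't send) that request. It assumes the auth method is mounted at the default `kubernetes` path; the address and role name in the usage note are made up.

```python
import json
import urllib.request

# Standard in-cluster location of a pod's ServiceAccount token.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(bao_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Build the login request for the Kubernetes auth method."""
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        # Assumes the auth method is mounted at the default "kubernetes" path.
        f"{bao_addr.rstrip('/')}/v1/auth/kubernetes/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Inside a pod you would read the JWT from `SA_TOKEN_PATH`, call something like `build_login_request("http://openbao:8200", "docs", jwt)`, send it with `urllib.request.urlopen`, and find the client token under `auth.client_token` in the JSON response. In this stack the Vault Secrets Operator does that dance so the apps don't have to.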
---
## The Pantry — Data Layer
Where we keep the ingredients.
### PostgreSQL (CloudNativePG)
One cluster, eleven logical databases — we're not running eleven Postgres instances like animals.
- **Databases**: `kratos`, `hydra`, `docs`, `meet`, `drive`, `messages`, `conversations`, `people`, `gitea`, `hive`, `find`
- **Dynamic credentials** via Vault Secrets Operator (5-minute rotation)
- 10Gi storage, 512Mi memory limit
### Valkey (Redis-compatible)
A single Valkey instance at `valkey:6379` — the post-Redis-relicensing choice.
- **DB 0**: Celery broker (Messages / Conversations async tasks)
- **DB 1**: Session cache (all apps)
- 64Mi memory limit
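Since both roles share one instance, the only thing separating them is the database index in the connection URL. A tiny Python sketch of that split, with hypothetical constant names (the apps' real settings may differ); the `redis://` scheme works because Valkey speaks the Redis protocol.

```python
VALKEY_HOST = "valkey"
VALKEY_PORT = 6379

# Logical database indices from the list above.
CELERY_DB = 0
SESSION_DB = 1


def valkey_url(db: int) -> str:
    """Build a redis:// URL for one logical database on the shared instance."""
    return f"redis://{VALKEY_HOST}:{VALKEY_PORT}/{db}"


# Hypothetical setting names, in the style Celery and Django cache configs use.
CELERY_BROKER_URL = valkey_url(CELERY_DB)
SESSION_CACHE_URL = valkey_url(SESSION_DB)
```

Keeping the broker and the session cache in separate databases means flushing one never touches the other, even though they share the same 64Mi.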
### OpenSearch
Single-node, version 3.x.
- **Messages**: email full-text search
- **Tuwunel**: Matrix message search with hybrid neural embeddings
- **Sol☀**: archive and memory indices
- 512Mi Java heap
### SeaweedFS (Object Storage)
S3-compatible distributed storage — Apache 2.0, chosen over MinIO after their AGPL relicensing.
- **Master** (metadata), **Volume** (NVMe-backed data), **Filer** (S3 API on port 8333)
- **10 buckets**: `sunbeam-docs`, `sunbeam-meet`, `sunbeam-drive`, `sunbeam-messages`, `sunbeam-messages-imports`, `sunbeam-conversations`, `sunbeam-people`, `sunbeam-git-lfs`, `sunbeam-game-assets`, `sunbeam-ml-models`
- Exposed at `s3.DOMAIN` for dev access
### SearXNG (Metasearch)
Self-hosted metasearch engine — no rate limits, no tracking. Used by Sol☀ for web research.
---
## The Observatory — Monitoring
We see everything.
- **Prometheus**: metrics scraping at 30-second intervals, with ServiceMonitors per component
- **Grafana**: dashboards organized by namespace (10 dashboard ConfigMaps), OIDC auth via Hydra — same login as everything else
- **Loki**: log aggregation via Alloy DaemonSet
- **Tempo**: distributed tracing (OTLP)
- **AlertManager**: alert routes delivered to Matrix via `matrix-alertmanager-receiver` bot — alerts show up in chat, where we actually look
- **Alloy**: the unified collection agent for metrics, logs, and traces
- **Alert rules** per component: PostgreSQL, OpenSearch, OpenBao, Gitea, SeaweedFS, LiveKit, Linkerd, Ory, infrastructure
Exposed endpoints:
| Service | URL |
|---|---|
| Grafana | `metrics.DOMAIN` |
| Prometheus | `systemmetrics.DOMAIN` |
| Loki | `systemlogs.DOMAIN` |
| Tempo | `systemtracing.DOMAIN` |
For the full monitoring deep dive, see [monitoring.md](monitoring.md).
---
## The Workshop — DevTools
### Gitea — `src.DOMAIN`
Self-hosted Git forge with issue tracking, wiki, and CI.
- **OIDC auth** via Hydra — same login, always
- **Git LFS**: S3 backend pointing at the `sunbeam-git-lfs` SeaweedFS bucket
- **CI**: Gitea Actions, with GitHub Actions-compatible YAML, running on the same box
- Replaces GitHub for private repos and eliminates LFS bandwidth costs entirely
- Game assets (textures, models, audio) flow through LFS into SeaweedFS
---
## The Chat — Matrix
### Tuwunel — Matrix Homeserver
Our Matrix homeserver lives at `messages.DOMAIN`.
- **E2EE enabled** — encrypted by default
- **SSO-only auth**: no passwords, no guest registration. You log in through Hydra or you don't log in
- **Search**: OpenSearch with hybrid neural embeddings
- **TURN**: LiveKit's built-in TURN server for voice/video calls
### Sol☀ — AI Agent
Sol☀ isn't a tool — they're a virtual employee. Lives in Matrix as a peer, participates in conversations, and gets things done.
- **Multi-model AI orchestration**: Mistral medium, large, 3B, and 8B models — right tool for the task
- **Semantic archive search**: remembers past conversations
- **Per-user memory**: knows who they're talking to and what matters to each person
- **Gitea integration**: reads code, opens issues, reviews PRs
- **Research agents**: can spin up focused research tasks
- **Sandboxed runtime**: executes JavaScript and TypeScript safely
For the full Sol☀ deep dive, see [sol.md](sol.md).
---
## The Platform
The foundation under the whole house.
- **Compute**: Single Scaleway Elastic Metal server in Paris — 64GB RAM, local NVMe storage
- **Orchestration**: k3s (single-node Kubernetes, Traefik disabled — Pingora handles ingress)
- **External services**: Scaleway Object Storage (~€5-10/mo), Transactional Email (~€1/mo), Generative APIs (~€15/mo)
- **DNS**: Every subdomain points to one IP. Wildcard pattern: `{app}.sunbeam.pt`
### The Principles
> **One box, one bill.** European data sovereignty. Self-hosted open source. Unified auth. Operationally minimal.
We run a game studio on a single server in Paris for less than the cost of a nice dinner. Every component is open source. Every request is encrypted. Every secret rotates. Every alert lands in our group chat. And we never, ever have to ask Microsoft for permission to do our jobs.
That's The Super Boujee Business Box. ✨