Archived 2026-03-24 as docs/archive/system-design.md (commit 330d0758ff, Sienna Meridian Satterwhite): replaced by the new documentation suite. Kept in docs/archive/ as a historical reference; all content has been merged into the new boujee documentation.


Sunbeam Studio — Infrastructure Design Document

Version: 0.1.0-draft · Date: 2026-02-28 · Author: Sienna Satterthwaite, Chief Engineer · Status: Planning


1. Overview

Sunbeam is a three-person game studio founded by Sienna, Lonni, and Amber. This document describes the self-hosted collaboration and development infrastructure that supports studio operations — document editing, video calls, email, version control, AI tooling, and game asset management.

Guiding principles:

  • One box, one bill. Single Scaleway Elastic Metal server in Paris. No multi-vendor sprawl.
  • European data sovereignty. All data resides in France, GDPR-compliant by default.
  • Self-hosted, open source. No per-seat SaaS fees. MIT-licensed where possible.
  • Consistent experience. Unified authentication, shared design language, single login across all tools.
  • Operationally honest. The stack is architecturally rich but the operational surface is small: three users, one node, one cluster.

2. Platform

2.1 Compute

| Property | Value |
|---|---|
| Provider | Scaleway Elastic Metal |
| Region | Paris (PAR1/PAR2) |
| RAM | 64 GB minimum |
| Storage | Local NVMe (k3s + OS + SeaweedFS volumes) |
| Network | Public IPv4, configurable reverse DNS |

2.2 Orchestration

k3s — single-node Kubernetes. Traefik disabled at install (replaced by custom Pingora proxy):

```shell
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable=traefik" sh -
```

2.3 External Scaleway Services

| Service | Purpose | Estimated Cost |
|---|---|---|
| Object Storage | PostgreSQL backups (barman), cold asset overflow | ~€5–10/mo |
| Transactional Email (TEM) | Outbound SMTP relay for notifications | ~€1/mo |
| Generative APIs | AI inference for all La Suite components | ~€1–5/mo |

3. Namespace Layout

```
k3s cluster
├── ory/            Identity & auth (Kratos, Hydra, Login UI)
├── lasuite/        Docs, Meet, Drive, Messages, Conversations, People, Hive
├── media/          LiveKit server + TURN
├── storage/        SeaweedFS (master, volume, filer)
├── data/           CloudNativePG, Redis, OpenSearch
├── devtools/       Gitea
├── mesh/           Linkerd control plane
└── ingress/        Pingora edge proxy
```

4. Core Infrastructure

4.1 Authentication — Ory Kratos + Hydra

Replaces the Keycloak default from La Suite's French government deployments. No JVM, no XML — lightweight Go binaries that fit k3s cleanly.

| Component | Role |
|---|---|
| Kratos | Identity management (registration, login, profile, recovery) |
| Hydra | OAuth2 / OpenID Connect provider |
| Login UI | Sunbeam-branded login and consent pages |

Every La Suite app authenticates via mozilla-django-oidc. Each app registers as an OIDC client in Hydra with a client ID, secret, and redirect URI. Swapping Keycloak for Hydra is transparent at the app level.

Auth flow:

```
User → any *.sunbeam.pt app
  → 302 to auth.sunbeam.pt
  → Hydra → Kratos login UI
  → authenticate
  → Hydra issues OIDC token
  → 302 back to app
  → app validates via mozilla-django-oidc
  → session established
```
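The initial 302 in this flow is an ordinary OAuth2 authorization redirect. A minimal sketch of building that redirect URL, assuming Hydra's default `/oauth2/auth` public endpoint and the Docs client from §9 (the `state` value and the percent-encoding are simplified for illustration):

```rust
// Build the authorization redirect URL an app sends the browser to.
// Parameter names are the standard OAuth2/OIDC ones; endpoint path and
// client values here are illustrative, matching the §9 registry.
fn authorization_url(issuer: &str, client_id: &str, redirect_uri: &str, state: &str) -> String {
    // Minimal percent-encoding covering the characters in these values.
    fn enc(s: &str) -> String {
        s.chars()
            .map(|c| match c {
                ':' => "%3A".to_string(),
                '/' => "%2F".to_string(),
                ' ' => "%20".to_string(),
                _ => c.to_string(),
            })
            .collect()
    }
    format!(
        "{}/oauth2/auth?response_type=code&client_id={}&redirect_uri={}&scope={}&state={}",
        issuer,
        enc(client_id),
        enc(redirect_uri),
        enc("openid profile email"),
        enc(state)
    )
}

fn main() {
    let url = authorization_url(
        "https://auth.sunbeam.pt",
        "docs",
        "https://docs.sunbeam.pt/oidc/callback/",
        "xyz123",
    );
    println!("{url}");
}
```

Hydra redirects back to the registered redirect URI with an authorization code, which the app (via mozilla-django-oidc) exchanges for tokens at the token endpoint.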

4.2 Database — CloudNativePG

Single PostgreSQL cluster via CloudNativePG operator. One cluster, multiple logical databases:

```
PostgreSQL (CloudNativePG)
├── kratos_db
├── hydra_db
├── docs_db
├── meet_db
├── drive_db
├── messages_db
├── conversations_db
├── people_db
├── gitea_db
└── hive_db
```

4.3 Object Storage — SeaweedFS

S3-compatible distributed storage. Apache 2.0 licensed (chosen over MinIO post-AGPL relicensing).

Components: master (metadata/topology), volume servers (data on local NVMe), filer (S3 API gateway).

S3 endpoint: http://seaweedfs-filer.storage.svc:8333 (cluster-internal). For local dev access outside the cluster, expose via ingress at s3.sunbeam.pt or kubectl port-forward.

Buckets:

| Bucket | Consumer | Contents |
|---|---|---|
| sunbeam-docs | Docs | Document content, images, exports |
| sunbeam-meet | Meet | Recordings (if enabled) |
| sunbeam-drive | Drive | Uploaded/shared files |
| sunbeam-messages | Messages | Email attachments |
| sunbeam-conversations | Conversations | Chat attachments |
| sunbeam-git-lfs | Gitea | Git LFS objects (game assets) |
| sunbeam-game-assets | Hive | Game assets synced between Drive and S3 |

4.4 Cache — Redis

Single Redis instance in data namespace. Shared by Messages (Celery broker), Conversations (session/cache), Meet (LiveKit ephemeral state).

4.5 Search — OpenSearch

Required by Messages for full-text email search. Single-node deployment in data namespace.

4.6 Edge Proxy — Pingora (Custom Rust Binary)

Custom proxy built on Cloudflare's Pingora framework. A few hundred lines of Rust handling:

  • HTTPS termination — Let's Encrypt certs via rustls-acme compiled into the proxy binary
  • Hostname routing — static mapping of *.sunbeam.pt hostnames to backend ClusterIP:port
  • WebSocket passthrough — LiveKit signaling (Meet), Y.js CRDT sync (Docs)
  • Raw UDP forwarding — TURN relay ports (3478 + 49152–49252). Forwards bytes, not protocol. LiveKit handles TURN/STUN internally per RFC 5766. 100 relay ports is vastly more than three users need.

A short static list of hostnames (§8) that rarely changes. No dynamic service discovery required.
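A sketch of what that static mapping can look like. The backend service names and ports below are illustrative placeholders, except the SeaweedFS filer endpoint, which is the one documented in §4.3:

```rust
use std::collections::HashMap;

// Static hostname-label → backend map. Service names/ports are assumed
// placeholders, not taken from real manifests.
fn routes() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("docs",   "docs-backend.lasuite.svc:8000"),
        ("meet",   "meet-backend.lasuite.svc:8000"),
        ("drive",  "drive-backend.lasuite.svc:8000"),
        ("mail",   "messages-backend.lasuite.svc:8000"),
        ("chat",   "conversations-backend.lasuite.svc:8000"),
        ("people", "people-backend.lasuite.svc:8000"),
        ("src",    "gitea.devtools.svc:3000"),
        ("auth",   "hydra.ory.svc:4444"),
        ("s3",     "seaweedfs-filer.storage.svc:8333"),
    ])
}

// Match on the first hostname label so the same table serves both the
// *.sunbeam.pt and *.sslip.io suffixes.
fn backend_for(host: &str) -> Option<&'static str> {
    let label = host.split('.').next()?;
    routes().get(label).copied()
}

fn main() {
    println!("{:?}", backend_for("docs.sunbeam.pt"));
    println!("{:?}", backend_for("docs.192.168.5.2.sslip.io"));
}
```

Matching only the leading label is what lets the identical routing config run unchanged in local development (§10.4), where only the domain suffix differs.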

4.7 Service Mesh — Linkerd

mTLS between all pods with zero application changes. Sidecar injection provides:

  • Mutual TLS on all internal east-west traffic
  • Automatic certificate rotation
  • Per-route observability (request rate, success rate, latency)

Rust-based data plane — lightweight on a single node.


5. La Suite Numérique Applications

All La Suite apps share a common pattern: Django backend, React frontend, PostgreSQL, S3 storage, OIDC auth. Independent services, not a monolith.

5.1 Docs — docs.sunbeam.pt

Collaborative document editing. GDD, lore bibles, specs, meeting notes.

| Property | Detail |
|---|---|
| Editor | BlockNote (Tiptap-based) |
| Realtime | Y.js CRDT over WebSocket |
| AI | BlockNote XL AI extension — rephrase, summarize, translate, fix typos, freeform prompts. Available via formatting toolbar and `/ai` slash command. |
| Export | .odt, .docx, .pdf |

BlockNote XL packages (AI, PDF export) are GPL-licensed. Fine for internal use — GPL triggers on distribution, not deployment.

5.2 Meet — meet.sunbeam.pt

Video conferencing. Standups, playtests, partner calls.

| Property | Detail |
|---|---|
| Backend | LiveKit (self-hosted, Apache 2.0) |
| Media | DTLS-SRTP encrypted WebRTC |
| TURN | LiveKit built-in, UDP ports exposed through Pingora |

5.3 Drive — drive.sunbeam.pt

File sharing and document management. Game assets, reference material, shared resources.

Granular access control, workspace organization, linked to Messages for email attachments and Docs for file references.

5.4 Messages — mail.sunbeam.pt

Full email platform with team and personal mailboxes.

Architecture:

```
Inbound:  Internet → MX → Pingora → Postfix MTA-in → Rspamd → Django MDA → Postgres + OpenSearch
Outbound: User → Django → Postfix MTA-out (DKIM) → Scaleway TEM relay → recipient
```

Mailboxes:

  • Personal: sienna@, lonni@, amber@sunbeam.pt
  • Shared: hello@sunbeam.pt (all three see incoming business email)

AI features: Thread summaries, compose assistance, auto-labelling.

Limitation: No IMAP/POP3 — web UI only. Deliberate upstream design choice. Acceptable for a three-person studio living in the browser.

DNS requirements: MX, SPF, DKIM, DMARC, PTR (reverse DNS configurable in Scaleway console).

5.5 Conversations — chat.sunbeam.pt

AI chatbot / team assistant.

| Property | Detail |
|---|---|
| AI Framework | Pydantic AI (backend), Vercel AI SDK (frontend streaming) |
| Tools | Extensible agent tools — wire into Docs search, Drive queries, Messages summaries |
| Attachments | PDF and image upload for analysis |
| Helm | Official chart at suitenumerique.github.io/conversations/ |

Primary force multiplier. Custom tools can search GDD content, query shared files, and summarize email threads.

5.6 People — people.sunbeam.pt

Centralized user and team management. Creates users/teams and propagates permissions across all La Suite apps. Interoperates with dimail (Messages email backend) for mailbox provisioning.

Admin-facing, not a daily-use interface.

5.7 La Suite Integration Layer

Apps share a unified experience through:

  • @gouvfr-lasuite/integration — npm package providing the shared navigation bar, header, branding. Fork/configure for Sunbeam logo, colors, and nav links.
  • lasuite-django — shared Python library for OIDC helpers and common Django patterns.
  • Per-app env vars for branding: DJANGO_EMAIL_BRAND_NAME=Sunbeam, DJANGO_EMAIL_LOGO_IMG, etc.

6. Development Tools

6.1 Gitea — src.sunbeam.pt

Self-hosted Git with issue tracking, wiki, and CI.

| Property | Detail |
|---|---|
| Runtime | Single Go binary |
| Auth | OIDC via Hydra (same login as everything else) |
| LFS | Built-in Git LFS, S3 backend → SeaweedFS `sunbeam-git-lfs` bucket |
| CI | Gitea Actions (GitHub Actions compatible YAML). Lightweight jobs: compiles, tests, linting. Platform-specific builds offloaded to external providers. |
| Theming | `custom/` directory for Sunbeam logo, colors, CSS |

Replaces GitHub for private repos and eliminates GitHub LFS bandwidth costs. Game assets (textures, models, audio) flow through LFS into SeaweedFS.

6.2 Hive — Asset Sync Service (Custom Rust Binary)

Bidirectional sync between Drive and a dedicated S3 bucket (sunbeam-game-assets). Lonni and Amber manage game assets through Drive's UI; the build pipeline and Sienna's tooling address the same assets via S3. Hive keeps both views consistent.

Architecture:

```
Drive REST API                        SeaweedFS S3
(Game Assets workspace)               (sunbeam-game-assets bucket)
        │                                      │
        └──────────► Hive ◄────────────────────┘
                      │
                  PostgreSQL
                  (hive_db)
```

Reconciliation loop (configurable, default 30s):

  1. Poll Drive API — list files in watched workspace (IDs, paths, modified timestamps)
  2. Poll S3 — ListObjectsV2 on game assets bucket (keys, ETags, LastModified)
  3. Diff both sides against Hive's state in hive_db
  4. For each difference:
    • New in Drive → download from Drive, upload to S3, record state
    • New in S3 → download from S3, upload to Drive, record state
    • Drive newer → overwrite S3, update state
    • S3 newer → overwrite Drive, update state
    • Deleted from Drive → delete from S3, remove state
    • Deleted from S3 → delete from Drive, remove state

Conflict resolution: Last-write-wins by timestamp. For three users this is sufficient. Log a warning when both sides change the same file within the same poll interval.
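The per-file decision can be sketched as a pure function over the three timestamps involved: the Drive side, the S3 side, and the last-synced state in hive_db. Timestamps are simplified here to integer seconds; the variants mirror the six cases above plus the conflict rule:

```rust
// Which action to take for one file, given its modified-time on each side
// (None = absent) and the last-synced timestamp Hive recorded (None = new).
#[derive(Debug, PartialEq)]
enum Action {
    CopyToS3,
    CopyToDrive,
    DeleteFromS3,
    DeleteFromDrive,
    // Both sides changed within one poll interval: log a warning, then the
    // newer side wins (last-write-wins).
    ConflictWarnThenLastWriteWins,
    Noop,
}

fn decide(drive: Option<u64>, s3: Option<u64>, known: Option<u64>) -> Action {
    match (drive, s3, known) {
        // Exists on exactly one side and Hive has never seen it: new file.
        (Some(_), None, None) => Action::CopyToS3,
        (None, Some(_), None) => Action::CopyToDrive,
        // Hive knew about it, now gone from one side: deletion propagates.
        (None, Some(_), Some(_)) => Action::DeleteFromS3,
        (Some(_), None, Some(_)) => Action::DeleteFromDrive,
        // Present on both sides: compare each against last-synced state.
        (Some(d), Some(s), Some(k)) => match (d > k, s > k) {
            (true, true) => Action::ConflictWarnThenLastWriteWins,
            (true, false) => Action::CopyToS3,
            (false, true) => Action::CopyToDrive,
            (false, false) => Action::Noop,
        },
        // Present on both sides but unknown to Hive (first run): newer wins.
        (Some(d), Some(s), None) => {
            if d >= s { Action::CopyToS3 } else { Action::CopyToDrive }
        }
        (None, None, _) => Action::Noop,
    }
}

fn main() {
    assert_eq!(decide(Some(20), Some(15), Some(15)), Action::CopyToS3);
    println!("ok");
}
```

Keeping this decision pure (no I/O) makes the reconciliation core trivially unit-testable, independent of the Drive and S3 clients.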

Path mapping: Direct 1:1. Drive workspace folder structure maps to S3 key prefixes. `Game Assets/textures/hero_sprite.png` in Drive becomes `textures/hero_sprite.png` in S3 (workspace root stripped). Lonni creates a folder in Drive, it appears as an S3 prefix. Sienna runs `aws s3 cp` into a prefix, it appears in Drive's folder.
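A sketch of the mapping in both directions, assuming the workspace root is stripped exactly as described:

```rust
// Drive path → S3 key: strip the watched workspace root. Returns None if
// the path is outside the workspace. The workspace name matches [drive] in
// the Hive config.
fn drive_path_to_s3_key(drive_path: &str, workspace: &str) -> Option<String> {
    let rest = drive_path.strip_prefix(workspace)?;
    Some(rest.trim_start_matches('/').to_string())
}

// S3 key → Drive path: prepend the workspace root.
fn s3_key_to_drive_path(key: &str, workspace: &str) -> String {
    format!("{workspace}/{key}")
}

fn main() {
    let key = drive_path_to_s3_key("Game Assets/textures/hero_sprite.png", "Game Assets");
    println!("{key:?}"); // the example from the text
    println!("{}", s3_key_to_drive_path("textures/hero_sprite.png", "Game Assets"));
}
```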

State table (hive_db):

| Column | Type | Purpose |
|---|---|---|
| id | UUID | Primary key |
| drive_file_id | TEXT | Drive's internal file ID |
| drive_path | TEXT | Human-readable path in Drive |
| s3_key | TEXT | S3 object key |
| drive_modified_at | TIMESTAMPTZ | Last modification on Drive side |
| s3_etag | TEXT | S3 object ETag |
| s3_last_modified | TIMESTAMPTZ | Last modification on S3 side |
| last_synced_at | TIMESTAMPTZ | When Hive last reconciled this file |
| sync_source | TEXT | Which side was source of truth (drive or s3) |

Large file handling: Files over 50 MB stream to a temp file before uploading to the other side. Multipart upload for S3 targets. No large files held in memory.
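A sketch of the threshold logic; only the 50 MB threshold comes from the text — the 16 MB part size is an assumption (S3 multipart parts must be at least 5 MiB except the last):

```rust
// Files above the threshold stream via a temp file; S3 targets additionally
// use multipart upload. Threshold matches large_file_threshold_mb in the
// [sync] config; part size is an assumed value.
const LARGE_FILE_THRESHOLD_MB: u64 = 50;
const PART_SIZE_MB: u64 = 16;

fn needs_streaming(size_mb: u64) -> bool {
    size_mb > LARGE_FILE_THRESHOLD_MB
}

// Ceiling division: number of multipart parts for a given object size.
fn multipart_parts(size_mb: u64) -> u64 {
    (size_mb + PART_SIZE_MB - 1) / PART_SIZE_MB
}

fn main() {
    println!("{} {}", needs_streaming(120), multipart_parts(120));
}
```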

Authentication: OIDC client credentials via Hydra (same as every other service). Registered as client hive in the OIDC registry.

Crate dependencies:

| Crate | Purpose |
|---|---|
| reqwest | HTTP client for Drive REST API |
| aws-sdk-s3 | S3 client for SeaweedFS |
| sqlx | Async PostgreSQL driver |
| tokio | Async runtime |
| serde / serde_json | Serialization |
| tracing | Structured logging |

Configuration:

```toml
[drive]
base_url = "https://drive.sunbeam.pt"
workspace = "Game Assets"
oidc_client_id = "hive"
oidc_client_secret_file = "/run/secrets/hive-oidc"
oidc_token_url = "https://auth.sunbeam.pt/oauth2/token"

[s3]
endpoint = "http://seaweedfs-filer.storage.svc:8333"
bucket = "sunbeam-game-assets"
region = "us-east-1"
access_key_file = "/run/secrets/seaweedfs-key"
secret_key_file = "/run/secrets/seaweedfs-secret"

[postgres]
url_file = "/run/secrets/hive-db-url"

[sync]
interval_seconds = 30
temp_dir = "/tmp/hive"
large_file_threshold_mb = 50
```
Deployment: Single pod in lasuite namespace. No PVC needed — state lives in PostgreSQL, temp files are ephemeral. OIDC credentials and S3 keys via Kubernetes secrets.

Size estimate: ~800–1200 lines of Rust. Reconciliation logic is the bulk; Drive API and S3 clients are mostly configuration of existing crates.


7. AI Integration

All AI features across the stack share a single backend.

7.1 Backend

Scaleway Generative APIs — hosted in Paris, GDPR-compliant. Fully OpenAI-compatible endpoint. Prompts and outputs are not read, reused, or analyzed by Scaleway.

7.2 Model

mistral-small-3.2-24b-instruct-2506

| Property | Value |
|---|---|
| Input | €0.15 / M tokens |
| Output | €0.35 / M tokens |
| Capabilities | Chat + Vision |
| Strengths | Summarization, rephrasing, translation, instruction following |

Estimated 2–5M tokens/month for three users ≈ €1–2/month after the 1M free tier.

Upgrade path: If Conversations needs heavier reasoning, route it to qwen3-235b-a22b-instruct (€0.75/€2.25 per M tokens) while keeping Docs and Messages on Mistral Small.
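A back-of-envelope sketch of how a monthly figure falls out of the per-token prices. The monthly volume and the 70/30 input/output split are assumptions; only the prices and the 1M free tier come from the table above:

```rust
// Monthly AI cost in EUR at Mistral Small rates (€0.15/M input, €0.35/M
// output), after subtracting the free tier. Volume and split are assumed.
fn monthly_cost_eur(total_m_tokens: f64, free_m_tokens: f64, input_share: f64) -> f64 {
    let billable = (total_m_tokens - free_m_tokens).max(0.0);
    let input = billable * input_share;
    let output = billable * (1.0 - input_share);
    input * 0.15 + output * 0.35
}

fn main() {
    // 5M tokens/month total, 1M free, 70% input.
    let cost = monthly_cost_eur(5.0, 1.0, 0.7);
    println!("{cost:.2}"); // prints 0.84
}
```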

7.3 Configuration

Three env vars, identical across all components:

```shell
AI_BASE_URL=https://api.scaleway.ai/v1/
AI_API_KEY=<SCW_SECRET_KEY>
AI_MODEL=mistral-small-3.2-24b-instruct-2506
```

7.4 Capabilities by Component

| Component | What AI Does |
|---|---|
| Docs | Rephrase, summarize, fix typos, translate, freeform prompts on selected text |
| Messages | Thread summaries, compose assistance, auto-labelling |
| Conversations | Full chat interface, extensible agent tools, attachment analysis |

8. DNS Map

All A records point to the Elastic Metal public IP. TLS terminated by Pingora.

| Hostname | Backend |
|---|---|
| docs.sunbeam.pt | Docs |
| meet.sunbeam.pt | Meet |
| drive.sunbeam.pt | Drive |
| mail.sunbeam.pt | Messages |
| chat.sunbeam.pt | Conversations |
| people.sunbeam.pt | People |
| src.sunbeam.pt | Gitea |
| auth.sunbeam.pt | Ory Hydra + Login UI |
| s3.sunbeam.pt | SeaweedFS S3 endpoint (dev access) |

Email DNS (sunbeam.pt zone):

| Record | Value |
|---|---|
| MX | → Elastic Metal IP |
| TXT (SPF) | `v=spf1 ip4:<EM_IP> include:tem.scaleway.com ~all` |
| TXT (DKIM) | Generated by Postfix/Messages |
| TXT (DMARC) | `v=DMARC1; p=quarantine; rua=mailto:dmarc@sunbeam.pt` |
| PTR | Configured in Scaleway console |

9. OIDC Client Registry

Each application registered in Ory Hydra:

| Client | Redirect URI | Scopes |
|---|---|---|
| Docs | https://docs.sunbeam.pt/oidc/callback/ | openid profile email |
| Meet | https://meet.sunbeam.pt/oidc/callback/ | openid profile email |
| Drive | https://drive.sunbeam.pt/oidc/callback/ | openid profile email |
| Messages | https://mail.sunbeam.pt/oidc/callback/ | openid profile email |
| Conversations | https://chat.sunbeam.pt/oidc/callback/ | openid profile email |
| People | https://people.sunbeam.pt/oidc/callback/ | openid profile email |
| Gitea | https://src.sunbeam.pt/user/oauth2/sunbeam/callback | openid profile email |
| Hive | Client credentials grant (no redirect URI) | openid |

10. Local Development Environment

10.1 Goal

The local dev stack is structurally identical to production. Same k3s orchestrator, same namespaces, same manifests, same service DNS, same Linkerd mesh, same Pingora edge proxy, same TLS termination, same OIDC flows. The only differences are resource limits, the TLS cert source (mkcert vs Let's Encrypt), and the domain suffix (sslip.io vs sunbeam.pt). Traffic flows through the same path locally as it does in production: browser → Pingora → Linkerd sidecar → app → Linkerd sidecar → data stores. Bugs caught locally are bugs that would have happened in production.

10.2 Platform

| Property | Value |
|---|---|
| Machine | MacBook Pro M1 Pro, 10-core, 32 GB RAM |
| VM | Lima (lightweight Linux VM, virtiofs, Apple Virtualization.framework) |
| Orchestration | k3s inside Lima VM (`--disable=traefik`, identical to production) |
| Architecture | arm64 native (no Rosetta overhead) |

```shell
# Install Lima + k3s
brew install lima mkcert

# Create Lima VM with sufficient resources for the full stack
limactl start --name=sunbeam template://k3s \
  --memory=12 \
  --cpus=6 \
  --disk=60 \
  --vm-type=vz \
  --mount-type=virtiofs

# Confirm
limactl shell sunbeam kubectl get nodes
```

12 GB VM allocation covers the full stack (~6 GB pods + kubelet/OS overhead) and leaves 20 GB for macOS, IDE, browser, and builds.

10.3 What Stays the Same

Everything:

  • Namespace layout — all namespaces identical: ory/, lasuite/, media/, storage/, data/, devtools/, mesh/, ingress/
  • Kubernetes manifests — same Deployments, Services, ConfigMaps, Secrets. Applied with kubectl apply or Helm.
  • Service DNS — seaweedfs-filer.storage.svc, kratos.ory.svc, hydra.ory.svc, etc. Apps resolve the same internal names.
  • Service mesh — Linkerd injected into all application namespaces. mTLS between all pods. Same topology as production.
  • Edge proxy — Pingora runs in ingress/ namespace, routes by hostname, terminates TLS. Same binary, same routing config (different cert source).
  • Database structure — same CloudNativePG operator, same logical databases, same schemas.
  • S3 bucket structure — same SeaweedFS filer, same bucket names.
  • OIDC flow — same Kratos + Hydra, same client registrations. Redirect URIs point at sslip.io hostnames instead of sunbeam.pt.
  • AI configuration — same AI_BASE_URL / AI_API_KEY / AI_MODEL env vars, same Scaleway endpoint.
  • Hive sync — same reconciliation loop against local Drive and SeaweedFS.
  • TURN/UDP — Pingora forwards UDP to LiveKit on the same port range (49152–49252).

10.4 Local DNS — sslip.io

sslip.io provides wildcard DNS that embeds the IP address in the hostname. The Lima VM gets a routable IP on the host (e.g., 192.168.5.2), and all services resolve through it:

| Production | Local |
|---|---|
| docs.sunbeam.pt | docs.192.168.5.2.sslip.io |
| meet.sunbeam.pt | meet.192.168.5.2.sslip.io |
| drive.sunbeam.pt | drive.192.168.5.2.sslip.io |
| mail.sunbeam.pt | mail.192.168.5.2.sslip.io |
| chat.sunbeam.pt | chat.192.168.5.2.sslip.io |
| people.sunbeam.pt | people.192.168.5.2.sslip.io |
| src.sunbeam.pt | src.192.168.5.2.sslip.io |
| auth.sunbeam.pt | auth.192.168.5.2.sslip.io |
| s3.sunbeam.pt | s3.192.168.5.2.sslip.io |

Pingora hostname routing works identically — it just matches on docs.*, meet.*, etc. regardless of the domain suffix. The domain suffix is the only thing that changes between overlays.

```shell
# Get the Lima VM IP
LIMA_IP=$(limactl shell sunbeam hostname -I | awk '{print $1}')
echo "Local base domain: ${LIMA_IP}.sslip.io"

10.5 Local TLS — mkcert

Production uses rustls-acme with Let's Encrypt. Locally, Pingora loads a self-signed wildcard cert generated by mkcert, which installs a local CA trusted by the system and browsers:

```shell
brew install mkcert
mkcert -install  # Trust the local CA

LIMA_IP=$(limactl shell sunbeam hostname -I | awk '{print $1}')
mkcert "*.${LIMA_IP}.sslip.io"
# Creates: _wildcard.<IP>.sslip.io.pem + _wildcard.<IP>.sslip.io-key.pem
```

The certs are mounted into the Pingora pod via a Secret. The local Pingora config differs from production only in the cert source — file path to the mkcert cert instead of rustls-acme ACME negotiation. All other routing logic is identical.

10.6 What Changes (Local Overrides)

Managed via values-local.yaml overlays per component. The list is intentionally short:

| Concern | Production | Local |
|---|---|---|
| Resource limits | Sized for 64 GB server | Capped tight (see §10.7) |
| TLS cert source | rustls-acme + Let's Encrypt | mkcert wildcard cert mounted as Secret |
| Domain suffix | sunbeam.pt | `<LIMA_IP>.sslip.io` |
| OIDC redirect URIs | `https://*.sunbeam.pt/...` | `https://*.sslip.io/...` |
| Pingora listen | Bound to public IP, ports 80/443/49152–49252 | hostPort on Lima VM |
| Backups | barman → Scaleway Object Storage | Disabled |
| Email DNS | MX, SPF, DKIM, DMARC, PTR | Not applicable (no inbound email) |

Everything else — mesh injection, mTLS, proxy routing, service discovery, OIDC flows, S3 paths, AI integration — is the same.

10.7 Resource Limits (Local)

Target: ~6–8 GB total for the full stack including mesh and edge, leaving 24+ GB for IDE, browser, builds.

| Component | Memory Limit | Notes |
|---|---|---|
| **Mesh + Edge** | | |
| Linkerd control plane | 128 MB | destination, identity, proxy-injector combined |
| Linkerd proxies (sidecars) | ~15 MB each | ~20 injected pods ≈ 300 MB total |
| Pingora | 64 MB | Rust binary, lightweight |
| **Data** | | |
| PostgreSQL (CloudNativePG) | 512 MB | Handles all 10 databases fine at this scale |
| Redis | 64 MB | |
| OpenSearch | 512 MB | `ES_JAVA_OPTS=-Xms256m -Xmx512m` |
| **Storage** | | |
| SeaweedFS (master) | 64 MB | Metadata only |
| SeaweedFS (volume) | 256 MB | Actual data storage |
| SeaweedFS (filer) | 256 MB | S3 API gateway |
| **Auth** | | |
| Ory Kratos | 64 MB | Go binary, tiny footprint |
| Ory Hydra | 64 MB | Go binary, tiny footprint |
| Login UI | 64 MB | |
| **Apps** | | |
| Docs (Django) | 256 MB | |
| Docs (Next.js) | 256 MB | |
| Meet | 128 MB | |
| LiveKit | 128 MB | |
| Drive (Django) | 256 MB | |
| Drive (Next.js) | 256 MB | |
| Messages (Django + MDA) | 256 MB | |
| Messages (Next.js) | 256 MB | |
| Postfix MTA-in/out | 64 MB each | |
| Rspamd | 128 MB | |
| Conversations (Django) | 256 MB | |
| Conversations (Next.js) | 256 MB | |
| People (Django) | 128 MB | |
| **Dev Tools** | | |
| Gitea | 256 MB | Go binary |
| Hive | 64 MB | Rust binary, tiny |
| **Total** | **~5.5 GB** | Including mesh overhead. Well within budget. |

The Linkerd sidecar proxies add ~300 MB across all pods. Still leaves plenty of headroom on 32 GB. You don't need to run everything simultaneously — working on Hive? Skip Meet, Messages, Conversations. Testing the email flow? Skip Meet, Gitea, Hive. But you can run it all if you want to.
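A quick sanity check of the budget, summing the per-component caps from the table (the sidecar line uses the table's ~20 pods × ~15 MB assumption):

```rust
// Sum of the per-component memory caps from the local resource table.
fn total_mb() -> u32 {
    let limits: &[u32] = &[
        128,                    // Linkerd control plane
        20 * 15,                // Linkerd sidecars (~20 pods × ~15 MB)
        64,                     // Pingora
        512, 64, 512,           // PostgreSQL, Redis, OpenSearch
        64, 256, 256,           // SeaweedFS master, volume, filer
        64, 64, 64,             // Kratos, Hydra, Login UI
        256, 256,               // Docs Django + Next.js
        128, 128,               // Meet + LiveKit
        256, 256,               // Drive Django + Next.js
        256, 256, 64, 64, 128,  // Messages Django, Next.js, Postfix ×2, Rspamd
        256, 256,               // Conversations Django + Next.js
        128,                    // People
        256, 64,                // Gitea, Hive
    ];
    limits.iter().sum()
}

fn main() {
    println!("total: {:.2} GiB", total_mb() as f64 / 1024.0);
}
```

The caps sum to just over 5.2 GiB, in line with the table's ~5.5 GB total.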

10.8 Access Pattern

Traffic flows through Pingora, exactly like production. Browser hits https://docs.<LIMA_IP>.sslip.io → Pingora terminates TLS → routes to Docs service → Linkerd sidecar handles mTLS to backend.

```shell
# After deploying the local stack:
LIMA_IP=$(limactl shell sunbeam hostname -I | awk '{print $1}')

echo "Docs:          https://docs.${LIMA_IP}.sslip.io"
echo "Meet:          https://meet.${LIMA_IP}.sslip.io"
echo "Drive:         https://drive.${LIMA_IP}.sslip.io"
echo "Mail:          https://mail.${LIMA_IP}.sslip.io"
echo "Chat:          https://chat.${LIMA_IP}.sslip.io"
echo "People:        https://people.${LIMA_IP}.sslip.io"
echo "Source:        https://src.${LIMA_IP}.sslip.io"
echo "Auth:          https://auth.${LIMA_IP}.sslip.io"
echo "S3:            https://s3.${LIMA_IP}.sslip.io"
echo "Linkerd:       kubectl port-forward -n mesh svc/linkerd-viz 8084:8084"
```

Direct kubectl port-forward is still available as a fallback for debugging individual services, but the normal workflow goes through the edge — same as production.

10.9 Manifest Organization

```
sunbeam-infra/              ← Gitea repo (and GitHub mirror)
├── base/                   ← Shared manifests (both environments)
│   ├── mesh/
│   ├── ingress/
│   ├── ory/
│   ├── lasuite/
│   ├── media/
│   ├── storage/
│   ├── data/
│   └── devtools/
├── overlays/
│   ├── production/         ← Production-specific values
│   │   ├── values-ory.yaml         (sunbeam.pt redirect URIs)
│   │   ├── values-pingora.yaml     (rustls-acme, LE certs)
│   │   ├── values-docs.yaml
│   │   ├── values-linkerd.yaml
│   │   └── ...
│   └── local/              ← Local dev overrides
│       ├── values-domain.yaml      (sslip.io suffix, mkcert cert path)
│       ├── values-ory.yaml         (sslip.io redirect URIs)
│       ├── values-pingora.yaml     (mkcert TLS, hostPort binding)
│       ├── values-resources.yaml   (global memory caps)
│       └── ...
├── secrets/
│   ├── production/         ← Sealed Secrets or SOPS-encrypted
│   └── local/              ← Plaintext (gitignored), includes mkcert certs
└── scripts/
    ├── local-up.sh         ← Start Lima VM, deploy full stack
    ├── local-down.sh       ← Tear down
    ├── local-certs.sh      ← Generate mkcert wildcard for current Lima IP
    └── local-urls.sh       ← Print all https://*.sslip.io URLs
```

Deploy to either environment:

```shell
# Local
kubectl apply -k overlays/local/

# Production
kubectl apply -k overlays/production/
```

Same base manifests. Same mesh. Same edge. Different certs and domain suffix. One repo.


11. Deployment Sequence (Production)

Phase 0: Local Validation (MacBook k3s)

Every phase below is first deployed and tested on the local Lima + k3s stack before touching production. The workflow:

  1. Apply manifests to local k3s using kubectl apply -k overlays/local/
  2. Verify the component starts, passes health checks, and integrates with dependencies
  3. Run the phase's integration test through the full edge path (https://*.sslip.io — same Pingora routing, same Linkerd mesh, same OIDC flows)
  4. Commit manifests to sunbeam-infra repo
  5. Apply to production using kubectl apply -k overlays/production/
  6. Verify on production

This catches misconfigurations, missing env vars, broken OIDC flows, and service connectivity issues before they hit production. The local stack is structurally identical — same namespaces, same service DNS, same manifests — so a successful local deploy is a high-confidence signal for production.

Phase 1: Foundation

  1. Provision Elastic Metal, install k3s (--disable=traefik)
  2. Deploy Linkerd service mesh
  3. Deploy CloudNativePG operator + PostgreSQL cluster
  4. Deploy Redis
  5. Deploy OpenSearch
  6. Deploy SeaweedFS (master + volume + filer)
  7. Deploy Pingora with TLS for *.sunbeam.pt

Phase 2: Authentication

  1. Deploy Ory Kratos + Hydra
  2. Deploy Sunbeam-branded login UI at auth.sunbeam.pt
  3. Create initial identities (Sienna, Lonni, Amber)
  4. Verify OIDC flow end-to-end

Phase 3: Core Apps

  1. Deploy Docs → verify Y.js WebSocket, AI slash command
  2. Deploy Meet → verify WebSocket signaling + TURN/UDP
  3. Deploy Drive → verify S3 uploads
  4. Deploy People → verify user/team management
  5. For each: create database, create S3 bucket, register OIDC client, deploy, verify

Phase 4: Communication

  1. Configure email DNS (MX, SPF, DKIM, DMARC, PTR)
  2. Deploy Messages (Postfix MTA-in/out, Rspamd, Django MDA)
  3. Provision mailboxes via People: personal + hello@ shared inbox
  4. Test send/receive with external addresses

Phase 5: AI + Dev Tools

  1. Generate Scaleway Generative APIs key
  2. Set AI_BASE_URL / AI_API_KEY / AI_MODEL across all components
  3. Deploy Conversations → verify chat, tool calls, streaming
  4. Deploy Gitea → configure OIDC, LFS → SeaweedFS S3 backend
  5. Apply Sunbeam theming to Gitea
  6. Create "Game Assets" workspace in Drive
  7. Deploy Hive → configure Drive workspace, S3 bucket, OIDC client credentials
  8. Verify bidirectional sync: upload file in Drive → appears in S3, aws s3 cp to bucket → appears in Drive

Phase 6: Hardening

  1. Configure CloudNativePG backups → Scaleway Object Storage (barman)
  2. Configure SeaweedFS replication for critical buckets
  3. Create sunbeam-studio GitHub org, create private mirror repos
  4. Add GITHUB_MIRROR_TOKEN secret to Gitea, deploy mirror workflow to all repos
  5. Verify nightly mirror: check GitHub repos reflect Gitea state
  6. Full integration smoke test: create user → log in → create doc → send email → push code → upload asset in Drive → verify in S3 → ask AI
  7. Enable Linkerd dashboard + Scaleway Cockpit for monitoring

12. Backup & Replication Strategy

12.1 Offsite Replication — Scaleway Object Storage

SeaweedFS runs on local NVMe (single node). Scaleway Object Storage in Paris serves as the offsite replication target for disaster recovery.

Scaleway Object Storage pricing (Paris):

| Tier | Cost | Use Case |
|---|---|---|
| Standard Multi-AZ | ~€0.015/GB/month | Critical data (barman backups, active game assets) |
| Standard One Zone | ~€0.008/GB/month | Less critical replicas |
| Glacier | ~€0.003/GB/month | Deep archive (old builds, historical assets) |
| Egress | 75 GB free/month, then €0.01/GB | |
| Requests + Ingress | Included | |

Estimated replication cost: 100 GB on Multi-AZ ≈ €1.50/month. Even 500 GB Multi-AZ ≈ €7.50/month. Glacier for deep archive of old builds is essentially free.

12.2 Code Backup — GitHub Mirror

All Gitea repositories are mirrored daily to private GitHub repos as an offsite code backup. This is code only — Git LFS objects are excluded (covered by SeaweedFS → Scaleway Object Storage replication above).

Implementation: Gitea Actions cron job, runs nightly at 03:00 UTC.

```yaml
# .gitea/workflows/github-mirror.yaml (placed in each repo)
name: Mirror to GitHub
on:
  schedule:
    - cron: '0 3 * * *'

jobs:
  mirror:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          lfs: false
      - name: Push mirror
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_MIRROR_TOKEN }}
        run: |
          git remote add github "https://${GITHUB_TOKEN}@github.com/sunbeam-studio/${{ github.event.repository.name }}.git" 2>/dev/null || true
          # checkout leaves non-default branches as origin remote refs, so
          # push those explicitly rather than relying on --all
          git push github "refs/remotes/origin/*:refs/heads/*" --force
          git push github --tags --force
```

GitHub org: sunbeam-studio (all repos private, free tier covers unlimited private repos).

Mirrored repos: sunbeam-infra, pingora-proxy, hive, game, and any future Sunbeam repositories. Not mirrored: Git LFS objects (game assets, large binaries) and secrets (never in Git).

This gives triple redundancy on source code: Gitea on Elastic Metal, GitHub mirror, and every developer's local clone. If the server and all Scaleway backups vanish simultaneously, the code is still safe.

12.3 Backup Schedule

| What | Method | Destination | Frequency | Retention |
|---|---|---|---|---|
| PostgreSQL (all DBs) | CloudNativePG `barmanObjectStore` | Scaleway Object Storage (Multi-AZ) | Continuous WAL + daily base | 30 days PITR, 90 days base |
| SeaweedFS (all buckets) | Nightly sync to Scaleway Object Storage | Scaleway Object Storage (One Zone) | Nightly | 30 days |
| Git repositories (code) | Gitea Actions → GitHub mirror | GitHub (sunbeam-studio org, private) | Nightly 03:00 UTC | Indefinite |
| Git repositories (local) | Distributed by nature (every clone) | Developer machines | Every push | Indefinite |
| Git LFS objects | In SeaweedFS → covered by SeaweedFS sync | Scaleway Object Storage | Per SeaweedFS schedule | 30 days |
| Cluster config (manifests, Helm values) | Committed to Gitea (mirrored to GitHub) | Distributed + GitHub | Every commit | Indefinite |
| Ory config | Committed to Gitea (secrets via Sealed Secrets or Scaleway Secret Manager) | Distributed + GitHub | Every commit | Indefinite |
| Pingora config | Committed to Gitea (mirrored to GitHub) | Distributed + GitHub | Every commit | Indefinite |

Monthly verification: Restore a random database to a scratch namespace, verify integrity and app startup. Spot-check a GitHub mirror repo against Gitea (compare git log --oneline -5 on both remotes). Automate via Gitea Actions cron job.


13. Operational Runbooks

13.1 Add a New User

  1. Create identity in Kratos (via People UI or Kratos admin API)
  2. People propagates permissions to La Suite apps
  3. Messages provisions personal mailbox (name@sunbeam.pt)
  4. Gitea account auto-provisions on first OIDC login
  5. User visits any *.sunbeam.pt URL, authenticates once, has access everywhere

13.2 Deploy a New La Suite Component

  1. Create logical database in CloudNativePG
  2. Create S3 bucket in SeaweedFS
  3. Register OIDC client in Hydra (ID, secret, redirect URIs)
  4. Deploy to lasuite namespace with standard env vars:
    • DJANGO_DATABASE_URL, AWS_S3_ENDPOINT_URL, AWS_S3_BUCKET_NAME
    • OIDC_RP_CLIENT_ID, OIDC_RP_CLIENT_SECRET
    • AI_BASE_URL, AI_API_KEY, AI_MODEL
  5. Add hostname route in Pingora
  6. Verify auth flow, S3 access, AI connectivity

13.3 Restore PostgreSQL from Backup

Full cluster: CloudNativePG bootstraps new cluster from barman backup in Scaleway Object Storage. Specify recoveryTarget.targetTime for PITR. Verify integrity, swap service endpoints.

Single database: pg_dump from recovered cluster → pg_restore into production.

13.4 Recover from Elastic Metal Failure

  1. Provision new Elastic Metal instance
  2. Install k3s, deploy Linkerd
  3. Restore CloudNativePG from barman (Scaleway Object Storage)
  4. Restore SeaweedFS data from Scaleway Object Storage replicas
  5. Re-deploy all manifests from Gitea (every developer has a clone)
  6. Update DNS A records to new IP
  7. Update PTR record in Scaleway console
  8. Verify OIDC, email, TURN, AI connectivity

13.5 Troubleshoot LiveKit TURN

Symptoms: Users connect to Meet but have no audio/video.

  1. Verify UDP 3478 + 49152–49252 reachable from outside
  2. Check Pingora UDP forwarding is active
  3. Check LiveKit logs for TURN allocation failures
  4. Verify Elastic Metal firewall rules
  5. Test with external STUN/TURN tester

13.6 Certificate Renewal Failure

  1. Check Pingora logs for ACME errors
  2. Verify port 80 reachable for HTTP-01 challenge (or DNS-01 if configured)
  3. Restart Pingora to force rustls-acme renewal retry
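A quick way to confirm whether renewal actually succeeded is to check how many days remain on the certificate an endpoint is serving. A sketch, assuming GNU date and openssl are available; the hostname in the comment is illustrative:

```shell
# Print the number of whole days until a PEM certificate expires.
days_until_expiry() {
  end=$(openssl x509 -noout -enddate -in "$1" | cut -d= -f2)
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Fetch the live certificate from an endpoint, then inspect it:
# echo | openssl s_client -servername docs.sunbeam.pt \
#        -connect docs.sunbeam.pt:443 2>/dev/null \
#   | openssl x509 -outform PEM > /tmp/docs.pem
# days_until_expiry /tmp/docs.pem
```

Anything under roughly 20 days means rustls-acme has been failing silently and the logs need attention.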

14. Maintenance Schedule

Weekly

  • Check CloudNativePG backup status (latest successful timestamp)
  • Glance at Linkerd dashboard for error rate anomalies
  • Review Scaleway billing for unexpected charges

Monthly

  • Apply k3s patch releases if available
  • Check suitenumerique GitHub for new La Suite releases, review changelogs
  • Update container images one at a time, verify after each
  • Review SeaweedFS storage utilization
  • Run backup restore test (random database → scratch namespace)

Quarterly

  • La Suite upstream sync: Test new releases in local Docker Compose before deploying. One component at a time.
  • Ory updates: Kratos/Hydra migrations may involve schema changes. Always backup first.
  • Linkerd updates: Follow upgrade guide. Data plane sidecars roll automatically.
  • Security audit: Review exposed ports, DNS, TLS config. Run testssl.sh against all endpoints. Check CVEs in deployed images.
  • Storage rebalance: Evaluate SeaweedFS vs Scaleway Object Storage split. Move cold game assets to Scaleway if NVMe is filling.
  • AI model review: Check Scaleway for new models. Evaluate cost/performance. Test in Conversations before switching.

Annually

  • Review Elastic Metal spec — more RAM, more disk?
  • Evaluate new La Suite components
  • Domain renewal for sunbeam.pt
  • Full disaster recovery drill: simulate Elastic Metal loss, restore everything to a fresh instance from backups

15. Cost Estimate

| Item | Monthly |
|---|---|
| Scaleway Elastic Metal (64 GB, NVMe) | ~€80–120 |
| Scaleway Object Storage (backups + replication) | ~€2–10 |
| Scaleway Transactional Email | ~€1 |
| Scaleway Generative APIs | ~€15 |
| Domain (amortized) | ~€2 |
| Total | ~€100–148 |

For comparison: Google Workspace (€12/user × 3) + Zoom (€13) + Notion (€8/user × 3) + GitHub Team (€4/user × 3) + Linear (€8/user × 3) + email hosting ≈ €130+/month, with no data control, no customization, and per-seat costs that grow with every hire.


16. Architecture Diagram (Text)

                            Internet
                               │
                    ┌──────────┴──────────┐
                    │     Pingora Edge     │
                    │  HTTPS + WS + UDP    │
                    └──────────┬──────────┘
                               │
                    ┌──────────┴──────────┐
                    │   Linkerd mTLS mesh  │
                    └──────────┬──────────┘
                               │
          ┌────────┬───────┬───┴───┬────────┬────────┐
          │        │       │       │        │        │
       ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌───┴──┐ ┌──┴──┐
       │Docs │ │Meet │ │Drive│ │Msgs │ │Convos│ │Gitea│
       └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └───┬──┘ └──┬──┘
          │       │       │       │        │       │
          │    ┌──┴──┐    │    ┌──┴──┐     │       │
          │    │Live │    │    │Post │     │       │
          │    │Kit  │    │    │fix  │     │       │
          │    └─────┘    │    └─────┘     │       │
          │               │                │       │
          │            ┌──┴──┐             │       │
          │            │Hive │ ◄── sync ──►│       │
          │            └──┬──┘             │       │
          │               │                │       │
    ┌─────┴───────────────┴────────────────┴───────┴─────┐
    │                                                     │
┌───┴────┐  ┌─────────┐  ┌───────┐  ┌──────────────────┐ │
│Postgres│  │SeaweedFS│  │ Redis │  │    OpenSearch    │ │
│ (CNPG) │  │  (S3)   │  │       │  │                  │ │
└────────┘  └─────────┘  └───────┘  └──────────────────┘ │
    │                                                     │
    │         ┌──────────────────────┐                    │
    │         │    Ory Kratos/Hydra  │◄───── all apps ────┘
    │         │    (auth.sunbeam.*)  │       via OIDC
    │         └──────────────────────┘
    │
    └──── barman ──── Scaleway Object Storage (backups)

                      Scaleway Generative APIs (AI)
                      ▲
                      │ HTTPS
                      └── Docs, Messages, Conversations

17. Open Questions

  • Game build pipeline details — Gitea Actions handles lightweight CI (compiles, tests, linting). Platform-specific builds (console SDKs, platform cert signing) offloaded to external providers. All build artifacts land in SeaweedFS. Exact pipeline TBD as game toolchain solidifies.
  • Drive REST API surface — Hive's Drive client depends on Drive's exact file list/upload/download endpoints. Need to read Drive source to confirm: pagination strategy, file version handling, multipart upload support, how folder hierarchy is represented in API responses.

Appendix: Repository References

| Component | Repository | License |
|---|---|---|
| Docs | github.com/suitenumerique/docs | MIT |
| Meet | github.com/suitenumerique/meet | MIT |
| Drive | github.com/suitenumerique/drive | MIT |
| Messages | github.com/suitenumerique/messages | MIT |
| Conversations | github.com/suitenumerique/conversations | MIT |
| People | github.com/suitenumerique/people | MIT |
| Integration bar | github.com/suitenumerique/integration | MIT |
| Django shared lib | github.com/suitenumerique/django-lasuite | MIT |
| Ory Kratos | github.com/ory/kratos | Apache 2.0 |
| Ory Hydra | github.com/ory/hydra | Apache 2.0 |
| SeaweedFS | github.com/seaweedfs/seaweedfs | Apache 2.0 |
| CloudNativePG | github.com/cloudnative-pg/cloudnative-pg | Apache 2.0 |
| Linkerd | github.com/linkerd/linkerd2 | Apache 2.0 |
| Pingora | github.com/cloudflare/pingora | Apache 2.0 |
| Gitea | github.com/go-gitea/gitea | MIT |
| LiveKit | github.com/livekit/livekit | Apache 2.0 |