# Sunbeam Studio — Infrastructure Design Document

**Version:** 0.1.0-draft
**Date:** 2026-02-28
**Author:** Sienna Satterthwaite, Chief Engineer
**Status:** Planning

---

## 1. Overview

Sunbeam is a three-person game studio founded by Sienna, Lonni, and Amber. This document describes the self-hosted collaboration and development infrastructure that supports studio operations — document editing, video calls, email, version control, AI tooling, and game asset management.

**Guiding principles:**

- **One box, one bill.** Single Scaleway Elastic Metal server in Paris. No multi-vendor sprawl.
- **European data sovereignty.** All data resides in France, GDPR-compliant by default.
- **Self-hosted, open source.** No per-seat SaaS fees. MIT-licensed where possible.
- **Consistent experience.** Unified authentication, shared design language, single login across all tools.
- **Operationally honest.** The stack is architecturally rich but the operational surface is small: three users, one node, one cluster.

---

## 2. Platform

### 2.1 Compute

| Property | Value |
|---|---|
| Provider | Scaleway Elastic Metal |
| Region | Paris (PAR1/PAR2) |
| RAM | 64 GB minimum |
| Storage | Local NVMe (k3s + OS + SeaweedFS volumes) |
| Network | Public IPv4, configurable reverse DNS |

### 2.2 Orchestration

k3s — single-node Kubernetes. Traefik disabled at install (replaced by custom Pingora proxy):

```bash
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable=traefik" sh -
```

### 2.3 External Scaleway Services

| Service | Purpose | Estimated Cost |
|---|---|---|
| Object Storage | PostgreSQL backups (barman), cold asset overflow | ~€5–10/mo |
| Transactional Email (TEM) | Outbound SMTP relay for notifications | ~€1/mo |
| Generative APIs | AI inference for all La Suite components | ~€1–5/mo |

---

## 3. Namespace Layout

```
k3s cluster
├── ory/        Identity & auth (Kratos, Hydra, Login UI)
├── lasuite/    Docs, Meet, Drive, Messages, Conversations, People, Hive
├── media/      LiveKit server + TURN
├── storage/    SeaweedFS (master, volume, filer)
├── data/       CloudNativePG, Redis, OpenSearch
├── devtools/   Gitea
├── mesh/       Linkerd control plane
└── ingress/    Pingora edge proxy
```
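
The layout above can be bootstrapped in one pass. A minimal sketch, assuming `kubectl` already points at the cluster:

```bash
# Create the eight namespaces from the layout above.
NAMESPACES="ory lasuite media storage data devtools mesh ingress"
for ns in $NAMESPACES; do
  kubectl create namespace "$ns" 2>/dev/null || true  # ignore "already exists"
done
```

In the overlay setup of §10.9, the same namespaces can instead live in `base/` as plain manifests.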

---

## 4. Core Infrastructure

### 4.1 Authentication — Ory Kratos + Hydra

Replaces the Keycloak default from La Suite's French government deployments. No JVM, no XML — lightweight Go binaries that fit k3s cleanly.

| Component | Role |
|---|---|
| **Kratos** | Identity management (registration, login, profile, recovery) |
| **Hydra** | OAuth2 / OpenID Connect provider |
| **Login UI** | Sunbeam-branded login and consent pages |

Every La Suite app authenticates via `mozilla-django-oidc`. Each app registers as an OIDC client in Hydra with a client ID, secret, and redirect URI. Swapping Keycloak for Hydra is transparent at the app level.

**Auth flow:**
```
User → any *.sunbeam.pt app
  → 302 to auth.sunbeam.pt
  → Hydra → Kratos login UI
  → authenticate
  → Hydra issues OIDC token
  → 302 back to app
  → app validates via mozilla-django-oidc
  → session established
```

### 4.2 Database — CloudNativePG

Single PostgreSQL cluster via CloudNativePG operator. One cluster, multiple logical databases:

```
PostgreSQL (CloudNativePG)
├── kratos_db
├── hydra_db
├── docs_db
├── meet_db
├── drive_db
├── messages_db
├── conversations_db
├── people_db
├── gitea_db
└── hive_db
```
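
Creating these is a one-time bootstrap step. A hedged sketch using `psql` through the cluster's primary pod (the pod name `pg-cluster-1` and superuser access are assumptions, not CloudNativePG defaults; CNPG can also provision databases via its bootstrap configuration):

```bash
# Create each app's logical database; ignore errors for ones that already exist.
DATABASES="kratos_db hydra_db docs_db meet_db drive_db messages_db conversations_db people_db gitea_db hive_db"
for db in $DATABASES; do
  kubectl -n data exec pg-cluster-1 -- \
    psql -U postgres -c "CREATE DATABASE ${db};" 2>/dev/null || true
done
```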

### 4.3 Object Storage — SeaweedFS

S3-compatible distributed storage. Apache 2.0 licensed (chosen over MinIO after its relicensing to AGPL).

**Components:** master (metadata/topology), volume servers (data on local NVMe), filer (S3 API gateway).

**S3 endpoint:** `http://seaweedfs-filer.storage.svc:8333` (cluster-internal). For local dev access outside the cluster, expose via ingress at `s3.sunbeam.pt` or `kubectl port-forward`.

**Buckets:**

| Bucket | Consumer | Contents |
|---|---|---|
| `sunbeam-docs` | Docs | Document content, images, exports |
| `sunbeam-meet` | Meet | Recordings (if enabled) |
| `sunbeam-drive` | Drive | Uploaded/shared files |
| `sunbeam-messages` | Messages | Email attachments |
| `sunbeam-conversations` | Conversations | Chat attachments |
| `sunbeam-git-lfs` | Gitea | Git LFS objects (game assets) |
| `sunbeam-game-assets` | Hive | Game assets synced between Drive and S3 |
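
Bucket creation can be scripted against the filer's S3 API. A sketch assuming the `aws` CLI and SeaweedFS credentials are configured:

```bash
# Pre-create the seven buckets on the SeaweedFS filer.
S3_ENDPOINT="http://seaweedfs-filer.storage.svc:8333"
BUCKETS="sunbeam-docs sunbeam-meet sunbeam-drive sunbeam-messages sunbeam-conversations sunbeam-git-lfs sunbeam-game-assets"
for b in $BUCKETS; do
  aws --endpoint-url "$S3_ENDPOINT" s3 mb "s3://${b}" 2>/dev/null || true
done
```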

### 4.4 Cache — Redis

Single Redis instance in `data` namespace. Shared by Messages (Celery broker), Conversations (session/cache), Meet (LiveKit ephemeral state).

### 4.5 Search — OpenSearch

Required by Messages for full-text email search. Single-node deployment in `data` namespace.

### 4.6 Edge Proxy — Pingora (Custom Rust Binary)

Custom proxy built on Cloudflare's Pingora framework. A few hundred lines of Rust handling:

- **HTTPS termination** — Let's Encrypt certs via `rustls-acme` compiled into the proxy binary
- **Hostname routing** — static mapping of `*.sunbeam.pt` hostnames to backend ClusterIP:port
- **WebSocket passthrough** — LiveKit signaling (Meet), Y.js CRDT sync (Docs)
- **Raw UDP forwarding** — TURN relay ports (3478 + 49152–49252). Forwards bytes, not protocol. LiveKit handles TURN/STUN internally per RFC 5766. 100 relay ports is vastly more than three users need.

Nine hostnames (the full map is in §8), and the set rarely changes. No dynamic service discovery required.

### 4.7 Service Mesh — Linkerd

mTLS between all pods with zero application changes. Sidecar injection provides:

- Mutual TLS on all internal east-west traffic
- Automatic certificate rotation
- Per-route observability (request rate, success rate, latency)

Rust-based data plane — lightweight on a single node.
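
Injection is opt-in per namespace via the standard `linkerd.io/inject` annotation. A sketch (the namespace list follows §3; the restart is shown for one namespace only):

```bash
# Opt application namespaces into the mesh.
MESHED="ory lasuite media storage data devtools"
for ns in $MESHED; do
  kubectl annotate namespace "$ns" linkerd.io/inject=enabled --overwrite 2>/dev/null || true
done
# Existing workloads pick up sidecars on their next rollout, e.g.:
kubectl -n lasuite rollout restart deployment 2>/dev/null || true
```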

---

## 5. La Suite Numérique Applications

All La Suite apps share a common pattern: Django backend, React frontend, PostgreSQL, S3 storage, OIDC auth. Independent services, not a monolith.

### 5.1 Docs — `docs.sunbeam.pt`

Collaborative document editing. GDD, lore bibles, specs, meeting notes.

| Property | Detail |
|---|---|
| Editor | BlockNote (Tiptap-based) |
| Realtime | Y.js CRDT over WebSocket |
| AI | BlockNote XL AI extension — rephrase, summarize, translate, fix typos, freeform prompts. Available via formatting toolbar and `/ai` slash command. |
| Export | .odt, .docx, .pdf |

BlockNote XL packages (AI, PDF export) are GPL-licensed. Fine for internal use — GPL triggers on distribution, not deployment.

### 5.2 Meet — `meet.sunbeam.pt`

Video conferencing. Standups, playtests, partner calls.

| Property | Detail |
|---|---|
| Backend | LiveKit (self-hosted, Apache 2.0) |
| Media | DTLS-SRTP encrypted WebRTC |
| TURN | LiveKit built-in, UDP ports exposed through Pingora |

### 5.3 Drive — `drive.sunbeam.pt`

File sharing and document management. Game assets, reference material, shared resources.

Granular access control, workspace organization, linked to Messages for email attachments and Docs for file references.

### 5.4 Messages — `mail.sunbeam.pt`

Full email platform with team and personal mailboxes.

**Architecture:**
```
Inbound:  Internet → MX → Pingora → Postfix MTA-in → Rspamd → Django MDA → Postgres + OpenSearch
Outbound: User → Django → Postfix MTA-out (DKIM) → Scaleway TEM relay → recipient
```

**Mailboxes:**
- Personal: `sienna@`, `lonni@`, `amber@sunbeam.pt`
- Shared: `hello@sunbeam.pt` (all three see incoming business email)

**AI features:** Thread summaries, compose assistance, auto-labelling.

**Limitation:** No IMAP/POP3 — web UI only. Deliberate upstream design choice. Acceptable for a three-person studio living in the browser.

**DNS requirements:** MX, SPF, DKIM, DMARC, PTR (reverse DNS configurable in Scaleway console).

### 5.5 Conversations — `chat.sunbeam.pt`

AI chatbot / team assistant.

| Property | Detail |
|---|---|
| AI Framework | Pydantic AI (backend), Vercel AI SDK (frontend streaming) |
| Tools | Extensible agent tools — wire into Docs search, Drive queries, Messages summaries |
| Attachments | PDF and image upload for analysis |
| Helm | Official chart at `suitenumerique.github.io/conversations/` |

Primary force multiplier. Custom tools can search GDD content, query shared files, and summarize email threads.

### 5.6 People — `people.sunbeam.pt`

Centralized user and team management. Creates users/teams and propagates permissions across all La Suite apps. Interoperates with dimail (Messages email backend) for mailbox provisioning.

Admin-facing, not a daily-use interface.

### 5.7 La Suite Integration Layer

Apps share a unified experience through:

- **`@gouvfr-lasuite/integration`** — npm package providing the shared navigation bar, header, branding. Fork/configure for Sunbeam logo, colors, and nav links.
- **`lasuite-django`** — shared Python library for OIDC helpers and common Django patterns.
- Per-app env vars for branding: `DJANGO_EMAIL_BRAND_NAME=Sunbeam`, `DJANGO_EMAIL_LOGO_IMG`, etc.

---

## 6. Development Tools

### 6.1 Gitea — `src.sunbeam.pt`

Self-hosted Git with issue tracking, wiki, and CI.

| Property | Detail |
|---|---|
| Runtime | Single Go binary |
| Auth | OIDC via Hydra (same login as everything else) |
| LFS | Built-in Git LFS, S3 backend → SeaweedFS `sunbeam-git-lfs` bucket |
| CI | Gitea Actions (GitHub Actions compatible YAML). Lightweight jobs: compiles, tests, linting. Platform-specific builds offloaded to external providers. |
| Theming | `custom/` directory for Sunbeam logo, colors, CSS |

Replaces GitHub for private repos and eliminates GitHub LFS bandwidth costs. Game assets (textures, models, audio) flow through LFS into SeaweedFS.

### 6.2 Hive — Asset Sync Service (Custom Rust Binary)

Bidirectional sync between Drive and a dedicated S3 bucket (`sunbeam-game-assets`). Lonni and Amber manage game assets through Drive's UI; the build pipeline and Sienna's tooling address the same assets via S3. Hive keeps both views consistent.

**Architecture:**

```
Drive REST API                      SeaweedFS S3
(Game Assets workspace)             (sunbeam-game-assets bucket)
        │                                   │
        └──────────► Hive ◄─────────────────┘
                      │
                 PostgreSQL
                 (hive_db)
```

**Reconciliation loop** (configurable, default 30s):

1. Poll Drive API — list files in watched workspace (IDs, paths, modified timestamps)
2. Poll S3 — `ListObjectsV2` on game assets bucket (keys, ETags, LastModified)
3. Diff both sides against Hive's state in `hive_db`
4. For each difference:
   - New in Drive → download from Drive, upload to S3, record state
   - New in S3 → download from S3, upload to Drive, record state
   - Drive newer → overwrite S3, update state
   - S3 newer → overwrite Drive, update state
   - Deleted from Drive → delete from S3, remove state
   - Deleted from S3 → delete from Drive, remove state

**Conflict resolution:** Last-write-wins by timestamp. For three users this is sufficient. Log a warning when both sides change the same file within the same poll interval.
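
The per-file decision reduces to comparing three timestamps. An illustrative shell sketch of the rule (Hive implements this in Rust; epoch seconds assumed):

```bash
# decide DRIVE_MTIME S3_MTIME LAST_SYNCED → which direction to copy.
decide() {
  drive=$1; s3=$2; synced=$3
  if   [ "$drive" -gt "$synced" ] && [ "$s3" -le "$synced" ]; then echo drive-to-s3
  elif [ "$s3" -gt "$synced" ] && [ "$drive" -le "$synced" ]; then echo s3-to-drive
  elif [ "$drive" -gt "$synced" ] && [ "$s3" -gt "$synced" ]; then
    # conflict: both sides changed since the last sync; warn, then last write wins
    if [ "$drive" -ge "$s3" ]; then echo drive-to-s3; else echo s3-to-drive; fi
  else echo in-sync
  fi
}
decide 1700000300 1700000100 1700000200   # → drive-to-s3
```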

**Path mapping:** Direct 1:1. Drive workspace folder structure maps to S3 key prefixes. `Game Assets/textures/hero_sprite.png` in Drive becomes `textures/hero_sprite.png` in S3 (workspace root stripped). If Lonni creates a folder in Drive, it appears as an S3 prefix; if Sienna runs `aws s3 cp` into a prefix, it appears in Drive's folder.
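
The mapping is pure prefix stripping, as in this one-liner sketch:

```bash
WORKSPACE="Game Assets"
drive_path="Game Assets/textures/hero_sprite.png"
s3_key="${drive_path#$WORKSPACE/}"   # strip the workspace root
echo "$s3_key"   # → textures/hero_sprite.png
```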

**State table (`hive_db`):**

| Column | Type | Purpose |
|---|---|---|
| `id` | UUID | Primary key |
| `drive_file_id` | TEXT | Drive's internal file ID |
| `drive_path` | TEXT | Human-readable path in Drive |
| `s3_key` | TEXT | S3 object key |
| `drive_modified_at` | TIMESTAMPTZ | Last modification on Drive side |
| `s3_etag` | TEXT | S3 object ETag |
| `s3_last_modified` | TIMESTAMPTZ | Last modification on S3 side |
| `last_synced_at` | TIMESTAMPTZ | When Hive last reconciled this file |
| `sync_source` | TEXT | Which side was source of truth (`drive` or `s3`) |

**Large file handling:** Files over 50 MB stream to a temp file before uploading to the other side. Multipart upload for S3 targets. No large files held in memory.

**Authentication:** OIDC client credentials via Hydra (same as every other service). Registered as client `hive` in the OIDC registry.

**Crate dependencies:**

| Crate | Purpose |
|---|---|
| `reqwest` | HTTP client for Drive REST API |
| `aws-sdk-s3` | S3 client for SeaweedFS |
| `sqlx` | Async PostgreSQL driver |
| `tokio` | Async runtime |
| `serde` / `serde_json` | Serialization |
| `tracing` | Structured logging |

**Configuration:**

```toml
[drive]
base_url = "https://drive.sunbeam.pt"
workspace = "Game Assets"
oidc_client_id = "hive"
oidc_client_secret_file = "/run/secrets/hive-oidc"
oidc_token_url = "https://auth.sunbeam.pt/oauth2/token"

[s3]
endpoint = "http://seaweedfs-filer.storage.svc:8333"
bucket = "sunbeam-game-assets"
region = "us-east-1"
access_key_file = "/run/secrets/seaweedfs-key"
secret_key_file = "/run/secrets/seaweedfs-secret"

[postgres]
url_file = "/run/secrets/hive-db-url"

[sync]
interval_seconds = 30
temp_dir = "/tmp/hive"
large_file_threshold_mb = 50
```

**Deployment:** Single pod in `lasuite` namespace. No PVC needed — state lives in PostgreSQL, temp files are ephemeral. OIDC credentials and S3 keys via Kubernetes secrets.

**Size estimate:** ~800–1200 lines of Rust. Reconciliation logic is the bulk; Drive API and S3 clients are mostly configuration of existing crates.

---

## 7. AI Integration

All AI features across the stack share a single backend.

### 7.1 Backend

**Scaleway Generative APIs** — hosted in Paris, GDPR-compliant. Fully OpenAI-compatible endpoint. Prompts and outputs are not read, reused, or analyzed by Scaleway.

### 7.2 Model

**`mistral-small-3.2-24b-instruct-2506`**

| Property | Value |
|---|---|
| Input | €0.15 / M tokens |
| Output | €0.35 / M tokens |
| Capabilities | Chat + Vision |
| Strengths | Summarization, rephrasing, translation, instruction following |

Estimated 2–5M tokens/month for three users ≈ €1–2/month after the 1M free tier.

**Upgrade path:** If Conversations needs heavier reasoning, route it to `qwen3-235b-a22b-instruct` (€0.75/€2.25 per M tokens) while keeping Docs and Messages on Mistral Small.

### 7.3 Configuration

Three env vars, identical across all components:

```bash
AI_BASE_URL=https://api.scaleway.ai/v1/
AI_API_KEY=<SCW_SECRET_KEY>
AI_MODEL=mistral-small-3.2-24b-instruct-2506
```
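
Since the endpoint is OpenAI-compatible, any OpenAI client works unchanged. A hedged `curl` smoke test (reads `AI_API_KEY` from the environment):

```bash
AI_BASE_URL=https://api.scaleway.ai/v1/
AI_MODEL=mistral-small-3.2-24b-instruct-2506
# Build a minimal chat-completions request body.
BODY=$(printf '{"model":"%s","messages":[{"role":"user","content":"Say hi."}]}' "$AI_MODEL")
curl -s --max-time 10 "${AI_BASE_URL}chat/completions" \
  -H "Authorization: Bearer ${AI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "$BODY" || true
```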

### 7.4 Capabilities by Component

| Component | What AI Does |
|---|---|
| Docs | Rephrase, summarize, fix typos, translate, freeform prompts on selected text |
| Messages | Thread summaries, compose assistance, auto-labelling |
| Conversations | Full chat interface, extensible agent tools, attachment analysis |

---

## 8. DNS Map

All A records point to the Elastic Metal public IP. TLS terminated by Pingora.

| Hostname | Backend |
|---|---|
| `docs.sunbeam.pt` | Docs |
| `meet.sunbeam.pt` | Meet |
| `drive.sunbeam.pt` | Drive |
| `mail.sunbeam.pt` | Messages |
| `chat.sunbeam.pt` | Conversations |
| `people.sunbeam.pt` | People |
| `src.sunbeam.pt` | Gitea |
| `auth.sunbeam.pt` | Ory Hydra + Login UI |
| `s3.sunbeam.pt` | SeaweedFS S3 endpoint (dev access) |

**Email DNS (sunbeam.pt zone):**

| Record | Value |
|---|---|
| MX | → Elastic Metal IP |
| TXT (SPF) | `v=spf1 ip4:<EM_IP> include:tem.scaleway.com ~all` |
| TXT (DKIM) | Generated by Postfix/Messages |
| TXT (DMARC) | `v=DMARC1; p=quarantine; rua=mailto:dmarc@sunbeam.pt` |
| PTR | Configured in Scaleway console |

---

## 9. OIDC Client Registry

Each application registered in Ory Hydra:

| Client | Redirect URI | Scopes |
|---|---|---|
| Docs | `https://docs.sunbeam.pt/oidc/callback/` | `openid profile email` |
| Meet | `https://meet.sunbeam.pt/oidc/callback/` | `openid profile email` |
| Drive | `https://drive.sunbeam.pt/oidc/callback/` | `openid profile email` |
| Messages | `https://mail.sunbeam.pt/oidc/callback/` | `openid profile email` |
| Conversations | `https://chat.sunbeam.pt/oidc/callback/` | `openid profile email` |
| People | `https://people.sunbeam.pt/oidc/callback/` | `openid profile email` |
| Gitea | `https://src.sunbeam.pt/user/oauth2/sunbeam/callback` | `openid profile email` |
| Hive | Client credentials grant (no redirect URI) | `openid` |
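
Registration can be scripted with the Hydra v2 CLI against the admin API. A sketch for one client (the admin service URL below is an assumption for this cluster):

```bash
HYDRA_ADMIN="http://hydra-admin.ory.svc:4445"
# Register the Docs client; repeat per row of the table above.
hydra create oauth2-client \
  --endpoint "$HYDRA_ADMIN" \
  --name docs \
  --grant-type authorization_code,refresh_token \
  --response-type code \
  --scope openid,profile,email \
  --redirect-uri "https://docs.sunbeam.pt/oidc/callback/" 2>/dev/null || true
```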

---

## 10. Local Development Environment

### 10.1 Goal

The local dev stack is **structurally identical** to production. Same k3s orchestrator, same namespaces, same manifests, same service DNS, same Linkerd mesh, same Pingora edge proxy, same TLS termination, same OIDC flows. The only differences are resource limits, the TLS cert source (mkcert vs Let's Encrypt), and the domain suffix (sslip.io vs sunbeam.pt). Traffic flows through the same path locally as it does in production: browser → Pingora → Linkerd sidecar → app → Linkerd sidecar → data stores. Bugs caught locally are bugs that would have happened in production.

### 10.2 Platform

| Property | Value |
|---|---|
| Machine | MacBook Pro M1 Pro, 10-core, 32 GB RAM |
| VM | Lima (lightweight Linux VM, virtiofs, Apple Virtualization.framework) |
| Orchestration | k3s inside Lima VM (`--disable=traefik`, identical to production) |
| Architecture | arm64 native (no Rosetta overhead) |

```bash
# Install Lima (k3s comes from the VM template) and mkcert for local TLS
brew install lima mkcert

# Create Lima VM with sufficient resources for the full stack
limactl start --name=sunbeam template://k3s \
  --memory=12 \
  --cpus=6 \
  --disk=60 \
  --vm-type=vz \
  --mount-type=virtiofs

# Confirm
limactl shell sunbeam kubectl get nodes
```

The 12 GB VM allocation covers the full stack (~6 GB pods + kubelet/OS overhead) and leaves 20 GB for macOS, IDE, browser, and builds.

### 10.3 What Stays the Same

Everything:

- **Namespace layout** — all namespaces identical: `ory/`, `lasuite/`, `media/`, `storage/`, `data/`, `devtools/`, `mesh/`, `ingress/`
- **Kubernetes manifests** — same Deployments, Services, ConfigMaps, Secrets. Applied with `kubectl apply` or Helm.
- **Service DNS** — `seaweedfs-filer.storage.svc`, `kratos.ory.svc`, `hydra.ory.svc`, etc. Apps resolve the same internal names.
- **Service mesh** — Linkerd injected into all application namespaces. mTLS between all pods. Same topology as production.
- **Edge proxy** — Pingora runs in `ingress/` namespace, routes by hostname, terminates TLS. Same binary, same routing config (different cert source).
- **Database structure** — same CloudNativePG operator, same logical databases, same schemas.
- **S3 bucket structure** — same SeaweedFS filer, same bucket names.
- **OIDC flow** — same Kratos + Hydra, same client registrations. Redirect URIs point at sslip.io hostnames instead of `sunbeam.pt`.
- **AI configuration** — same `AI_BASE_URL` / `AI_API_KEY` / `AI_MODEL` env vars, same Scaleway endpoint.
- **Hive sync** — same reconciliation loop against local Drive and SeaweedFS.
- **TURN/UDP** — Pingora forwards UDP to LiveKit on the same port range (49152–49252).

### 10.4 Local DNS — sslip.io

[sslip.io](https://sslip.io) provides wildcard DNS that embeds the IP address in the hostname. The Lima VM gets a routable IP on the host (e.g., `192.168.5.2`), and all services resolve through it:

| Production | Local |
|---|---|
| `docs.sunbeam.pt` | `docs.192.168.5.2.sslip.io` |
| `meet.sunbeam.pt` | `meet.192.168.5.2.sslip.io` |
| `drive.sunbeam.pt` | `drive.192.168.5.2.sslip.io` |
| `mail.sunbeam.pt` | `mail.192.168.5.2.sslip.io` |
| `chat.sunbeam.pt` | `chat.192.168.5.2.sslip.io` |
| `people.sunbeam.pt` | `people.192.168.5.2.sslip.io` |
| `src.sunbeam.pt` | `src.192.168.5.2.sslip.io` |
| `auth.sunbeam.pt` | `auth.192.168.5.2.sslip.io` |
| `s3.sunbeam.pt` | `s3.192.168.5.2.sslip.io` |

Pingora hostname routing works identically — it just matches on `docs.*`, `meet.*`, etc. regardless of the domain suffix. The domain suffix is the only thing that changes between overlays.

```bash
# Get the Lima VM IP
LIMA_IP=$(limactl shell sunbeam hostname -I | awk '{print $1}')
echo "Local base domain: ${LIMA_IP}.sslip.io"
```

### 10.5 Local TLS — mkcert

Production uses `rustls-acme` with Let's Encrypt. Locally, Pingora loads a wildcard cert signed by a local CA created by [mkcert](https://github.com/FiloSottile/mkcert), which the system and browsers trust once it is installed:

```bash
brew install mkcert
mkcert -install   # Trust the local CA

LIMA_IP=$(limactl shell sunbeam hostname -I | awk '{print $1}')
mkcert "*.${LIMA_IP}.sslip.io"
# Creates: _wildcard.<IP>.sslip.io.pem + _wildcard.<IP>.sslip.io-key.pem
```

The certs are mounted into the Pingora pod via a Secret. The local Pingora config differs from production only in the cert source — file path to the mkcert cert instead of `rustls-acme` ACME negotiation. All other routing logic is identical.
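
A sketch of loading the pair into the cluster (the Secret name `pingora-local-tls` is illustrative):

```bash
LIMA_IP=192.168.5.2   # substitute the current VM IP
kubectl -n ingress create secret tls pingora-local-tls \
  --cert="_wildcard.${LIMA_IP}.sslip.io.pem" \
  --key="_wildcard.${LIMA_IP}.sslip.io-key.pem" 2>/dev/null || true
```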

### 10.6 What Changes (Local Overrides)

Managed via `values-local.yaml` overlays per component. The list is intentionally short:

| Concern | Production | Local |
|---|---|---|
| **Resource limits** | Sized for 64 GB server | Capped tight (see §10.7) |
| **TLS cert source** | `rustls-acme` + Let's Encrypt | mkcert wildcard cert mounted as Secret |
| **Domain suffix** | `sunbeam.pt` | `<LIMA_IP>.sslip.io` |
| **OIDC redirect URIs** | `https://*.sunbeam.pt/...` | `https://*.sslip.io/...` |
| **Pingora listen** | Bound to public IP, ports 80/443/49152–49252 | hostPort on Lima VM |
| **Backups** | barman → Scaleway Object Storage | Disabled |
| **Email DNS** | MX, SPF, DKIM, DMARC, PTR | Not applicable (no inbound email) |

Everything else — mesh injection, mTLS, proxy routing, service discovery, OIDC flows, S3 paths, AI integration — is the same.

### 10.7 Resource Limits (Local)

Target: **~6–8 GB total** for the full stack including mesh and edge, leaving 24+ GB for IDE, browser, builds.

| Component | Memory Limit | Notes |
|---|---|---|
| **Mesh + Edge** | | |
| Linkerd control plane | 128 MB | destination, identity, proxy-injector combined |
| Linkerd proxies (sidecars) | ~15 MB each | ~20 injected pods ≈ 300 MB total |
| Pingora | 64 MB | Rust binary, lightweight |
| **Data** | | |
| PostgreSQL (CloudNativePG) | 512 MB | Handles all 10 databases fine at this scale |
| Redis | 64 MB | |
| OpenSearch | 512 MB | `ES_JAVA_OPTS=-Xms256m -Xmx512m` |
| **Storage** | | |
| SeaweedFS (master) | 64 MB | Metadata only |
| SeaweedFS (volume) | 256 MB | Actual data storage |
| SeaweedFS (filer) | 256 MB | S3 API gateway |
| **Auth** | | |
| Ory Kratos | 64 MB | Go binary, tiny footprint |
| Ory Hydra | 64 MB | Go binary, tiny footprint |
| Login UI | 64 MB | |
| **Apps** | | |
| Docs (Django) | 256 MB | |
| Docs (Next.js) | 256 MB | |
| Meet | 128 MB | |
| LiveKit | 128 MB | |
| Drive (Django) | 256 MB | |
| Drive (Next.js) | 256 MB | |
| Messages (Django + MDA) | 256 MB | |
| Messages (Next.js) | 256 MB | |
| Postfix MTA-in/out | 64 MB each | |
| Rspamd | 128 MB | |
| Conversations (Django) | 256 MB | |
| Conversations (Next.js) | 256 MB | |
| People (Django) | 128 MB | |
| **Dev Tools** | | |
| Gitea | 256 MB | Go binary |
| Hive | 64 MB | Rust binary, tiny |
| **Total** | **~5.5 GB** | Including mesh overhead. Well within budget. |

The Linkerd sidecar proxies add ~300 MB across all pods. Still leaves plenty of headroom on 32 GB. You don't need to run everything simultaneously — working on Hive? Skip Meet, Messages, Conversations. Testing the email flow? Skip Meet, Gitea, Hive. But you *can* run it all if you want to.

### 10.8 Access Pattern

Traffic flows through Pingora, exactly like production. Browser hits `https://docs.<LIMA_IP>.sslip.io` → Pingora terminates TLS → routes to Docs service → Linkerd sidecar handles mTLS to backend.

```bash
# After deploying the local stack:
LIMA_IP=$(limactl shell sunbeam hostname -I | awk '{print $1}')

echo "Docs:    https://docs.${LIMA_IP}.sslip.io"
echo "Meet:    https://meet.${LIMA_IP}.sslip.io"
echo "Drive:   https://drive.${LIMA_IP}.sslip.io"
echo "Mail:    https://mail.${LIMA_IP}.sslip.io"
echo "Chat:    https://chat.${LIMA_IP}.sslip.io"
echo "People:  https://people.${LIMA_IP}.sslip.io"
echo "Source:  https://src.${LIMA_IP}.sslip.io"
echo "Auth:    https://auth.${LIMA_IP}.sslip.io"
echo "S3:      https://s3.${LIMA_IP}.sslip.io"
echo "Linkerd: kubectl port-forward -n mesh svc/linkerd-viz 8084:8084"
```

Direct `kubectl port-forward` is still available as a fallback for debugging individual services, but the normal workflow goes through the edge — same as production.

### 10.9 Manifest Organization

```
sunbeam-infra/                      ← Gitea repo (and GitHub mirror)
├── base/                           ← Shared manifests (both environments)
│   ├── mesh/
│   ├── ingress/
│   ├── ory/
│   ├── lasuite/
│   ├── media/
│   ├── storage/
│   ├── data/
│   └── devtools/
├── overlays/
│   ├── production/                 ← Production-specific values
│   │   ├── values-ory.yaml         (sunbeam.pt redirect URIs)
│   │   ├── values-pingora.yaml     (rustls-acme, LE certs)
│   │   ├── values-docs.yaml
│   │   ├── values-linkerd.yaml
│   │   └── ...
│   └── local/                      ← Local dev overrides
│       ├── values-domain.yaml      (sslip.io suffix, mkcert cert path)
│       ├── values-ory.yaml         (sslip.io redirect URIs)
│       ├── values-pingora.yaml     (mkcert TLS, hostPort binding)
│       ├── values-resources.yaml   (global memory caps)
│       └── ...
├── secrets/
│   ├── production/                 ← Sealed Secrets or SOPS-encrypted
│   └── local/                      ← Plaintext (gitignored), includes mkcert certs
└── scripts/
    ├── local-up.sh                 ← Start Lima VM, deploy full stack
    ├── local-down.sh               ← Tear down
    ├── local-certs.sh              ← Generate mkcert wildcard for current Lima IP
    └── local-urls.sh               ← Print all https://*.sslip.io URLs
```
|
|||
|
|
|
|||
|
|
Deploy to either environment:
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# Local
|
|||
|
|
kubectl apply -k overlays/local/
|
|||
|
|
|
|||
|
|
# Production
|
|||
|
|
kubectl apply -k overlays/production/
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Same base manifests. Same mesh. Same edge. Different certs and domain suffix. One repo.

---

## 11. Deployment Sequence (Production)

### Phase 0: Local Validation (MacBook k3s)

Every phase below is first deployed and tested on the local Lima + k3s stack before touching production. The workflow:

1. Apply manifests to local k3s using `kubectl apply -k overlays/local/`
2. Verify the component starts, passes health checks, and integrates with dependencies
3. Run the phase's integration test through the full edge path (`https://*.sslip.io` — same Pingora routing, same Linkerd mesh, same OIDC flows)
4. Commit manifests to `sunbeam-infra` repo
5. Apply to production using `kubectl apply -k overlays/production/`
6. Verify on production

This catches misconfigurations, missing env vars, broken OIDC flows, and service connectivity issues before they hit production. The local stack is structurally identical — same namespaces, same service DNS, same manifests — so a successful local deploy is a high-confidence signal for production.
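The six steps above can be sketched as one shell helper. This is a sketch under assumptions: the kubectl context names (`lima`, `production`) and the `lasuite` namespace are illustrative, and the health URL is whatever the phase's integration test exercises.

```shell
# Sketch of the local-first deploy loop. Context names "lima" and
# "production" are assumptions; substitute your actual kubectl contexts.
promote() {
  test_url="$1"   # e.g. https://docs.<lima-ip>.sslip.io/ (illustrative)

  # 1-2. Apply locally and wait for pods to settle
  kubectl --context lima apply -k overlays/local/ || return 1
  kubectl --context lima -n lasuite wait --for=condition=Ready pod --all --timeout=300s || return 1

  # 3. Exercise the full edge path (Pingora routing + mesh + OIDC redirect)
  curl -fsSk "$test_url" >/dev/null || return 1

  # 4. Commit the validated manifests
  git add -A && git commit -m "deploy: validated locally" && git push

  # 5-6. Promote to production and verify there
  kubectl --context production apply -k overlays/production/
  kubectl --context production -n lasuite wait --for=condition=Ready pod --all --timeout=300s
}
# Invoke manually once local smoke tests pass:
#   promote "https://docs.192-168-5-15.sslip.io/"
```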

### Phase 1: Foundation

1. Provision Elastic Metal, install k3s (`--disable=traefik`)
2. Deploy Linkerd service mesh
3. Deploy CloudNativePG operator + PostgreSQL cluster
4. Deploy Redis
5. Deploy OpenSearch
6. Deploy SeaweedFS (master + volume + filer)
7. Deploy Pingora with TLS for `*.sunbeam.pt`

### Phase 2: Authentication

8. Deploy Ory Kratos + Hydra
9. Deploy Sunbeam-branded login UI at `auth.sunbeam.pt`
10. Create initial identities (Sienna, Lonni, Amber)
11. Verify OIDC flow end-to-end

### Phase 3: Core Apps

12. Deploy Docs → verify Y.js WebSocket, AI slash command
13. Deploy Meet → verify WebSocket signaling + TURN/UDP
14. Deploy Drive → verify S3 uploads
15. Deploy People → verify user/team management
16. For each: create database, create S3 bucket, register OIDC client, deploy, verify
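The per-app loop in step 16 might look like the helper below. Treat it as a sketch: the CNPG pod name, in-cluster S3 endpoint, and Hydra admin URL are assumptions, and the `hydra create oauth2-client` flags (Hydra 2.x CLI) should be checked against the Ory docs before use.

```shell
# Sketch of step 16 for one app. Pod, service, and endpoint names are
# assumptions; verify against your CNPG and Ory Hydra deployments.
provision_app() {
  app="$1"   # e.g. "docs"

  # Logical database inside the CNPG cluster (pod name assumed: pg-main-1)
  kubectl exec -n data pg-main-1 -- psql -U postgres \
    -c "CREATE DATABASE ${app} OWNER ${app};"

  # S3 bucket on the SeaweedFS S3 gateway (in-cluster endpoint assumed)
  aws --endpoint-url http://seaweedfs-s3.storage.svc:8333 s3 mb "s3://${app}"

  # OIDC client in Hydra (Hydra 2.x CLI; admin endpoint assumed)
  hydra create oauth2-client \
    --endpoint http://hydra-admin.ory.svc:4445 \
    --name "${app}" \
    --grant-type authorization_code,refresh_token \
    --response-type code \
    --scope openid,profile,email \
    --redirect-uri "https://${app}.sunbeam.pt/oidc/callback"
}
```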

### Phase 4: Communication

17. Configure email DNS (MX, SPF, DKIM, DMARC, PTR)
18. Deploy Messages (Postfix MTA-in/out, Rspamd, Django MDA)
19. Provision mailboxes via People: personal + `hello@` shared inbox
20. Test send/receive with external addresses
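The step-17 records, sketched as a zone-file fragment. The mail hostname, DKIM selector, and key are placeholders: the real selector and public key come from the DKIM key generated in Rspamd/Messages, and the PTR record is set in the Scaleway console to match the MX hostname.

```
sunbeam.pt.                 MX  10 mail.sunbeam.pt.
sunbeam.pt.                 TXT "v=spf1 ip4:<server-ip> -all"
mail._domainkey.sunbeam.pt. TXT "v=DKIM1; k=rsa; p=<dkim-public-key>"
_dmarc.sunbeam.pt.          TXT "v=DMARC1; p=quarantine; rua=mailto:postmaster@sunbeam.pt"
```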

### Phase 5: AI + Dev Tools

21. Generate Scaleway Generative APIs key
22. Set `AI_BASE_URL` / `AI_API_KEY` / `AI_MODEL` across all components
23. Deploy Conversations → verify chat, tool calls, streaming
24. Deploy Gitea → configure OIDC, LFS → SeaweedFS S3 backend
25. Apply Sunbeam theming to Gitea
26. Create "Game Assets" workspace in Drive
27. Deploy Hive → configure Drive workspace, S3 bucket, OIDC client credentials
28. Verify bidirectional sync: upload file in Drive → appears in S3, `aws s3 cp` to bucket → appears in Drive
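The S3 side of step 28 can be driven from the CLI. A sketch, assuming the SeaweedFS S3 endpoint and a `game-assets` bucket name; the Drive side of each check is manual (watch the "Game Assets" workspace in the UI).

```shell
# Sketch of the step-28 check. Endpoint URL and bucket name are assumptions.
verify_sync() {
  ep="http://seaweedfs-s3.storage.svc:8333"
  echo "hello" > /tmp/sync-probe.txt

  # Direction 1: S3 -> Drive. Upload via S3; the file should appear in the
  # Drive workspace after Hive's next sync pass.
  aws --endpoint-url "$ep" s3 cp /tmp/sync-probe.txt s3://game-assets/sync-probe.txt

  # Direction 2: Drive -> S3. After uploading a file through the Drive UI,
  # confirm it landed in the bucket:
  aws --endpoint-url "$ep" s3 ls s3://game-assets/
}
```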

### Phase 6: Hardening

29. Configure CloudNativePG backups → Scaleway Object Storage (barman)
30. Configure SeaweedFS replication for critical buckets
31. Create `sunbeam-studio` GitHub org, create private mirror repos
32. Add `GITHUB_MIRROR_TOKEN` secret to Gitea, deploy mirror workflow to all repos
33. Verify nightly mirror: check GitHub repos reflect Gitea state
34. Full integration smoke test: create user → log in → create doc → send email → push code → upload asset in Drive → verify in S3 → ask AI
35. Enable Linkerd dashboard + Scaleway Cockpit for monitoring

---

## 12. Backup & Replication Strategy

### 12.1 Offsite Replication — Scaleway Object Storage

SeaweedFS runs on local NVMe (single node). Scaleway Object Storage in Paris serves as the offsite replication target for disaster recovery.

**Scaleway Object Storage pricing (Paris):**

| Tier | Cost | Use Case |
|---|---|---|
| Standard Multi-AZ | ~€0.015/GB/month | Critical data (barman backups, active game assets) |
| Standard One Zone | ~€0.008/GB/month | Less critical replicas |
| Glacier | ~€0.003/GB/month | Deep archive (old builds, historical assets) |
| Egress | 75 GB free/month, then €0.01/GB | |
| Requests + Ingress | Included | |

**Estimated replication cost:** 100 GB on Multi-AZ ≈ €1.50/month. Even 500 GB Multi-AZ ≈ €7.50/month. Glacier for deep archive of old builds is essentially free.
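A quick sanity check of those figures (monthly EUR = GB × rate from the table above):

```shell
# Replication cost estimate: EUR/month = GB * rate.
cost() { LC_ALL=C awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'; }
cost 100  0.015   # 100 GB Multi-AZ  -> 1.50
cost 500  0.015   # 500 GB Multi-AZ  -> 7.50
cost 1000 0.003   # 1 TB Glacier     -> 3.00
```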

### 12.2 Code Backup — GitHub Mirror

All Gitea repositories are mirrored daily to private GitHub repos as an offsite code backup. This is **code only** — Git LFS objects are excluded (covered by SeaweedFS → Scaleway Object Storage replication above).

**Implementation:** Gitea Actions cron job, runs nightly at 03:00 UTC.

```yaml
# .gitea/workflows/github-mirror.yaml (placed in each repo)
name: Mirror to GitHub
on:
  schedule:
    - cron: '0 3 * * *'

jobs:
  mirror:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          lfs: false
      - name: Push mirror
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_MIRROR_TOKEN }}
        run: |
          git remote add github "https://${GITHUB_TOKEN}@github.com/sunbeam-studio/${{ github.event.repository.name }}.git" 2>/dev/null || true
          git push github --all --force
          git push github --tags --force
```

**GitHub org:** `sunbeam-studio` (all repos private, free tier covers unlimited private repos).

**Mirrored repos:** `sunbeam-infra`, `pingora-proxy`, `hive`, `game`, and any future Sunbeam repositories. **Not mirrored:** Git LFS objects (game assets, large binaries) and secrets (never in Git).

This gives triple redundancy on source code: Gitea on Elastic Metal, GitHub mirror, and every developer's local clone. If the server and all Scaleway backups vanish simultaneously, the code is still safe.

### 12.3 Backup Schedule

| What | Method | Destination | Frequency | Retention |
|---|---|---|---|---|
| PostgreSQL (all DBs) | CloudNativePG barmanObjectStore | Scaleway Object Storage (Multi-AZ) | Continuous WAL + daily base | 30 days PITR, 90 days base |
| SeaweedFS (all buckets) | Nightly sync to Scaleway Object Storage | Scaleway Object Storage (One Zone) | Nightly | 30 days |
| Git repositories (code) | Gitea Actions → GitHub mirror | GitHub (`sunbeam-studio` org, private) | Nightly 03:00 UTC | Indefinite |
| Git repositories (local) | Distributed by nature (every clone) | Developer machines | Every push | Indefinite |
| Git LFS objects | In SeaweedFS → covered by SeaweedFS sync | Scaleway Object Storage | Per SeaweedFS schedule | 30 days |
| Cluster config (manifests, Helm values) | Committed to Gitea (mirrored to GitHub) | Distributed + GitHub | Every commit | Indefinite |
| Ory config | Committed to Gitea (secrets via Sealed Secrets or Scaleway Secret Manager) | Distributed + GitHub | Every commit | Indefinite |
| Pingora config | Committed to Gitea (mirrored to GitHub) | Distributed + GitHub | Every commit | Indefinite |

**Monthly verification:** Restore a random database to a scratch namespace, verify integrity and app startup. Spot-check a GitHub mirror repo against Gitea (compare `git log --oneline -5` on both remotes). Automate via Gitea Actions cron job.
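The mirror spot-check can be automated by comparing branch heads on both remotes. A sketch; the Gitea hostname and repo paths are assumptions.

```shell
# Sketch of the monthly mirror spot-check: compare main's head commit on
# both remotes. Remote URLs are illustrative.
check_mirror() {
  repo="$1"   # e.g. "sunbeam-infra"
  gitea=$(git ls-remote "https://git.sunbeam.pt/sunbeam/${repo}.git" refs/heads/main | cut -f1)
  github=$(git ls-remote "https://github.com/sunbeam-studio/${repo}.git" refs/heads/main | cut -f1)
  if [ -n "$gitea" ] && [ "$gitea" = "$github" ]; then
    echo "OK: ${repo} mirror is current"
  else
    echo "STALE: ${repo} heads differ (${gitea} vs ${github})"
  fi
}
```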

---

## 13. Operational Runbooks

### 13.1 Add a New User

1. Create identity in Kratos (via People UI or Kratos admin API)
2. People propagates permissions to La Suite apps
3. Messages provisions personal mailbox (`name@sunbeam.pt`)
4. Gitea account auto-provisions on first OIDC login
5. User visits any `*.sunbeam.pt` URL, authenticates once, has access everywhere
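Step 1 via the Kratos admin API uses `POST /admin/identities`. A sketch: the admin service URL, schema ID, and trait names are assumptions that must match your Kratos identity schema.

```shell
# Sketch of identity creation against the Kratos admin API. The in-cluster
# admin URL and the "default" schema_id are assumptions.
create_identity() {
  email="$1"; name="$2"
  curl -sf -X POST http://kratos-admin.ory.svc:4434/admin/identities \
    -H "Content-Type: application/json" \
    -d "{
      \"schema_id\": \"default\",
      \"traits\": { \"email\": \"${email}\", \"name\": \"${name}\" }
    }"
}
# create_identity amber@sunbeam.pt "Amber"
```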

### 13.2 Deploy a New La Suite Component

1. Create logical database in CloudNativePG
2. Create S3 bucket in SeaweedFS
3. Register OIDC client in Hydra (ID, secret, redirect URIs)
4. Deploy to `lasuite` namespace with standard env vars:
   - `DJANGO_DATABASE_URL`, `AWS_S3_ENDPOINT_URL`, `AWS_S3_BUCKET_NAME`
   - `OIDC_RP_CLIENT_ID`, `OIDC_RP_CLIENT_SECRET`
   - `AI_BASE_URL`, `AI_API_KEY`, `AI_MODEL`
5. Add hostname route in Pingora
6. Verify auth flow, S3 access, AI connectivity

### 13.3 Restore PostgreSQL from Backup

**Full cluster:** CloudNativePG bootstraps new cluster from barman backup in Scaleway Object Storage. Specify `recoveryTarget.targetTime` for PITR. Verify integrity, swap service endpoints.
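A sketch of the recovery `Cluster` resource, assuming names, bucket path, and secret keys that must be adapted to the actual deployment (see the CloudNativePG recovery documentation):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-restore          # scratch cluster; swap service endpoints after verification
  namespace: data
spec:
  instances: 1
  bootstrap:
    recovery:
      source: pg-main
      recoveryTarget:
        targetTime: "2026-02-27 03:00:00+00"   # PITR target
  externalClusters:
    - name: pg-main
      barmanObjectStore:
        destinationPath: s3://sunbeam-pg-backups/
        endpointURL: https://s3.fr-par.scw.cloud
        s3Credentials:
          accessKeyId:
            name: scaleway-s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: scaleway-s3-creds
            key: SECRET_ACCESS_KEY
```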

**Single database:** `pg_dump` from recovered cluster → `pg_restore` into production.

### 13.4 Recover from Elastic Metal Failure

1. Provision new Elastic Metal instance
2. Install k3s, deploy Linkerd
3. Restore CloudNativePG from barman (Scaleway Object Storage)
4. Restore SeaweedFS data from Scaleway Object Storage replicas
5. Re-deploy all manifests from Gitea (every developer has a clone)
6. Update DNS A records to new IP
7. Update PTR record in Scaleway console
8. Verify OIDC, email, TURN, AI connectivity

### 13.5 Troubleshoot LiveKit TURN

Symptoms: Users connect to Meet but have no audio/video.

1. Verify UDP 3478 + 49152–49252 reachable from outside
2. Check Pingora UDP forwarding is active
3. Check LiveKit logs for TURN allocation failures
4. Verify Elastic Metal firewall rules
5. Test with external STUN/TURN tester
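Check 1 can be approximated from any outside machine. A sketch; the hostname is illustrative, and because UDP is connectionless `nc`'s verdict is only a weak signal, so a TURN-aware tester (e.g. coturn's `turnutils_uclient`) is more conclusive.

```shell
# Sketch of check 1: probe the TURN listener and spot-check the relay range.
check_turn() {
  host="meet.sunbeam.pt"               # illustrative hostname
  nc -vzu -w 2 "$host" 3478            # TURN listener
  for p in 49152 49200 49252; do       # spot-check the UDP relay range
    nc -vzu -w 2 "$host" "$p"
  done
}
```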

### 13.6 Certificate Renewal Failure

1. Check Pingora logs for ACME errors
2. Verify port 80 reachable for HTTP-01 challenge (or DNS-01 if configured)
3. Restart Pingora to force `rustls-acme` renewal retry
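Before and after the restart, it helps to confirm which certificate the edge is actually serving and when it expires. A small helper using standard `openssl` subcommands:

```shell
# Print the subject and expiry of the certificate served for a hostname.
cert_expiry() {
  host="$1"   # e.g. docs.sunbeam.pt
  echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null \
    | openssl x509 -noout -subject -enddate
}
```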

---

## 14. Maintenance Schedule

### Weekly

- Check CloudNativePG backup status (latest successful timestamp)
- Glance at Linkerd dashboard for error rate anomalies
- Review Scaleway billing for unexpected charges

### Monthly

- Apply k3s patch releases if available
- Check suitenumerique GitHub for new La Suite releases, review changelogs
- Update container images one at a time, verify after each
- Review SeaweedFS storage utilization
- Run backup restore test (random database → scratch namespace)

### Quarterly

- **La Suite upstream sync:** Test new releases in local Docker Compose before deploying. One component at a time.
- **Ory updates:** Kratos/Hydra migrations may involve schema changes. Always backup first.
- **Linkerd updates:** Follow upgrade guide. Data plane sidecars roll automatically.
- **Security audit:** Review exposed ports, DNS, TLS config. Run `testssl.sh` against all endpoints. Check CVEs in deployed images.
- **Storage rebalance:** Evaluate SeaweedFS vs Scaleway Object Storage split. Move cold game assets to Scaleway if NVMe is filling.
- **AI model review:** Check Scaleway for new models. Evaluate cost/performance. Test in Conversations before switching.

### Annually

- Review Elastic Metal spec — more RAM, more disk?
- Evaluate new La Suite components
- Domain renewal for `sunbeam.pt`
- Full disaster recovery drill: simulate Elastic Metal loss, restore everything to a fresh instance from backups

---

## 15. Cost Estimate

| Item | Monthly |
|---|---|
| Scaleway Elastic Metal (64GB, NVMe) | ~€80–120 |
| Scaleway Object Storage (backups + replication) | ~€2–10 |
| Scaleway Transactional Email | ~€1 |
| Scaleway Generative APIs | ~€1–5 |
| Domain (amortized) | ~€2 |
| **Total** | **~€86–138** |

For comparison: Google Workspace (€12/user × 3) + Zoom (€13) + Notion (€8/user × 3) + GitHub Team (€4/user × 3) + Linear (€8/user × 3) + email hosting ≈ €130+/month — with no data control, no customization, and costs that grow with every seat.

---

## 16. Architecture Diagram (Text)

```
                   Internet
                       │
            ┌──────────┴──────────┐
            │    Pingora Edge     │
            │  HTTPS + WS + UDP   │
            └──────────┬──────────┘
                       │
            ┌──────────┴──────────┐
            │  Linkerd mTLS mesh  │
            └──────────┬──────────┘
                       │
   ┌───────┬───────┬───┴───┬────────┬───────┐
   │       │       │       │        │       │
┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌───┴──┐ ┌──┴──┐
│Docs │ │Meet │ │Drive│ │Msgs │ │Convos│ │Gitea│
└──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └───┬──┘ └──┬──┘
   │       │       │       │        │       │
   │    ┌──┴──┐    │    ┌──┴──┐     │       │
   │    │Live │    │    │Post │     │       │
   │    │Kit  │    │    │fix  │     │       │
   │    └─────┘    │    └─────┘     │       │
   │               │                │       │
   │            ┌──┴──┐             │       │
   │            │Hive │◄── sync ──► │       │
   │            └──┬──┘             │       │
   │               │                │       │
┌──┴───────────────┴────────────────┴───────┴──────┐
│                                                  │
│  ┌────────┐ ┌─────────┐ ┌───────┐ ┌──────────┐   │
│  │Postgres│ │SeaweedFS│ │ Redis │ │OpenSearch│   │
│  │ (CNPG) │ │  (S3)   │ │       │ │          │   │
│  └────────┘ └─────────┘ └───────┘ └──────────┘   │
│                                                  │
│  ┌──────────────────────┐                        │
│  │   Ory Kratos/Hydra   │◄───── all apps ────────┘
│  │   (auth.sunbeam.*)   │       via OIDC
│  └──────────────────────┘
│
└──── barman ──── Scaleway Object Storage (backups)

          Scaleway Generative APIs (AI)
                      ▲
                      │ HTTPS
                      └── Docs, Messages, Conversations
```

---

## 17. Open Questions

- **Game build pipeline details** — Gitea Actions handles lightweight CI (compiles, tests, linting). Platform-specific builds (console SDKs, platform cert signing) offloaded to external providers. All build artifacts land in SeaweedFS. Exact pipeline TBD as game toolchain solidifies.
- **Drive REST API surface** — Hive's Drive client depends on Drive's exact file list/upload/download endpoints. Need to read Drive source to confirm: pagination strategy, file version handling, multipart upload support, how folder hierarchy is represented in API responses.

---

## Appendix: Repository References

| Component | Repository | License |
|---|---|---|
| Docs | `github.com/suitenumerique/docs` | MIT |
| Meet | `github.com/suitenumerique/meet` | MIT |
| Drive | `github.com/suitenumerique/drive` | MIT |
| Messages | `github.com/suitenumerique/messages` | MIT |
| Conversations | `github.com/suitenumerique/conversations` | MIT |
| People | `github.com/suitenumerique/people` | MIT |
| Integration bar | `github.com/suitenumerique/integration` | MIT |
| Django shared lib | `github.com/suitenumerique/django-lasuite` | MIT |
| Ory Kratos | `github.com/ory/kratos` | Apache 2.0 |
| Ory Hydra | `github.com/ory/hydra` | Apache 2.0 |
| SeaweedFS | `github.com/seaweedfs/seaweedfs` | Apache 2.0 |
| CloudNativePG | `github.com/cloudnative-pg/cloudnative-pg` | Apache 2.0 |
| Linkerd | `github.com/linkerd/linkerd2` | Apache 2.0 |
| Pingora | `github.com/cloudflare/pingora` | Apache 2.0 |
| Gitea | `github.com/go-gitea/gitea` | MIT |
| LiveKit | `github.com/livekit/livekit` | Apache 2.0 |
|