Commit Graph

31 Commits

Author SHA1 Message Date
6acf598f92 refactor: remove La Suite services (except Meet + Collabora), delete local overlay
La Suite Messages, Calendars, Drive, Projects, Hive, Integration, and
Postfix are replaced by Stalwart (mail) and Tuwunel (messaging). Meet
and Collabora remain for video conferencing and document editing.

Local overlay was POC only — all deployment targets production now.

Deleted: 37 La Suite manifests, Drive Helm chart, 7 local overlay files,
stale MTA-in hostPort patch. Cleaned up production image overrides and
resource patches for removed services.
2026-04-06 18:03:55 +01:00
8662c79212 checkpoint: stalwart deploy, beam-design, migration scripts, config tweaks
Stalwart + Bulwark mail server deployment with OIDC, TLS cert, vault
secrets. Beam design service. Pingora config cleanup. SeaweedFS
replication fix. Kratos values tweak. Migration scripts for mbox/messages
/calendars from La Suite to Stalwart.
2026-04-06 17:52:30 +01:00
fcb80f1f37 feat(devtools): deploy Penpot + MCP server, wildcard TLS via DNS-01
Penpot (designer.sunbeam.pt):
- Frontend/backend/exporter deployments with OIDC-only auth via Hydra
- VSO-managed DB, S3, and app secrets from OpenBao
- PostgreSQL user/db in CNPG postInitSQL
- Hydra Maester enabledNamespaces extended to devtools

Penpot MCP server (mcp-designer.sunbeam.pt):
- Pre-built Node.js image pushed to Gitea registry
- Auth-gated via Pingora auth_request → Hydra /userinfo
- WebSocket path for browser plugin connection

Wildcard TLS:
- Switched cert-manager from HTTP-01 (per-SAN) to DNS-01 via Scaleway webhook
- Certificate collapsed to *.sunbeam.pt + sunbeam.pt
- Added scaleway-certmanager-webhook Helm chart
- VSO secret for Scaleway DNS API credentials in cert-manager namespace
- Added cert-manager to OpenBao VSO auth role
2026-04-04 12:53:27 +01:00
632099893a feat(media): deploy Element Call fork + optimize LiveKit quality
- Deploy self-hosted Element Call at call.sunbeam.pt with SSO login
- LiveKit: VP9 > AV1 > H.264 codec preferences, Opus stereo
- LiveKit: congestion_control.allow_pause=false, larger NACK buffers
- LiveKit: resources bumped to 2Gi/4CPU for VP9 SVC
- Proxy: add call.* route, TLS cert SAN for call.sunbeam.pt
2026-03-26 09:38:53 +00:00
0a322c8a7c remove: Docs (impress) and People (desk) from La Suite
Collabora stays (Drive needs it for WOPI document editing).
Removed: Helm charts, values, nginx configs, patches, OIDC clients,
Vault secrets, S3 buckets, Pingora routes, Kratos return URLs,
overlay image overrides and resource patches, local-up.sh restarts.
2026-03-25 17:53:43 +00:00
f8e9d32a8b feat(ingress): add bare domain sunbeam.pt to TLS cert SANs
Element X resolves .well-known from the server_name (sunbeam.pt),
not the homeserver URL. The bare domain needs a valid cert.
2026-03-25 13:24:38 +00:00
a086049de6 fix: harden SeaweedFS storage and fix Drive presigned uploads
- SeaweedFS filer: Recreate strategy (prevents LevelDB lock contention),
  60s termination grace period, memory 256Mi→2Gi limit
- SeaweedFS volume: 60s termination grace period, memory 256Mi→1Gi limit
- Drive: add AWS_S3_DOMAIN_REPLACE so presigned upload URLs use
  s3.sunbeam.pt instead of internal cluster DNS
- Drive: relax liveness/readiness probes (failureThreshold 1→3,
  period 1s→10s, timeout 1s→5s) to prevent crash loops under load
2026-03-22 19:48:36 +00:00
9af3cd3c49 feat: expose admin APIs behind OIDC auth_request
Adds pingora routes for id, hydra, search, vault subdomains.
Each gated by auth_request to Hydra userinfo — only valid SSO
bearer tokens pass through. Adds new SANs to the TLS certificate.
2026-03-22 18:59:22 +00:00
5f923d14f9 feat(matrix): add Sol virtual librarian deployment manifests
Sol is a Matrix bot with E2EE that archives conversations to OpenSearch
and responds via Mistral AI function calling. Adds deployment, PVC,
ConfigMap (sol.toml + system prompt), VaultStaticSecret for credentials,
and production overlay image entry.
2026-03-20 21:38:48 +00:00
bfe0280732 feat(lasuite): add Projects (Planka Kanban) service
Deploy Planka-based project management at projects.DOMAIN_SUFFIX:
- ConfigMap with OIDC, S3, SMTP, La Gaufre widget config
- Deployment + Service (init container for DB migrations, Sails on 1337)
- OAuth2Client (client_secret_basic, redirect to /oidc-callback)
- VaultDynamicSecret for DATABASE_URL, VaultStaticSecret for SECRET_KEY
- Pingora route with websocket support (Socket.io)
- Image overrides in both local and production overlays
- TLS cert dnsNames updated for projects subdomain
- Integration service.json updated with Projects entry
- seaweedfs-s3-credentials rolloutRestartTargets includes projects
2026-03-20 13:41:54 +00:00
3c7460f4a6 feat(lasuite): add calendars service deployment manifests
Add K8s manifests for calendars backend, frontend (Caddy), CalDAV
server, and Celery worker. Wire Pingora routing for cal.sunbeam.pt
with path-based backend/caldav/static splits. Add OAuth2Client for
OIDC, VaultDynamicSecret for DB credentials, VaultStaticSecret for
Django/CalDAV keys, and TLS cert coverage for the cal subdomain.
Register calendars in the integration service gaufre widget.
2026-03-18 18:36:05 +00:00
ccfe8b877a feat: La Suite email/messages, buildkitd, monitoring, vault and storage updates
- Add Messages (email) service: backend, frontend, MTA in/out, MPA, SOCKS
  proxy, worker, DKIM config, and theme customization
- Add Collabora deployment for document collaboration
- Add Drive frontend nginx config and values
- Add buildkitd namespace for in-cluster container builds
- Add SeaweedFS remote sync and additional S3 buckets
- Update vault secrets across namespaces (devtools, lasuite, media,
  monitoring, ory, storage) with expanded credential management
- Update monitoring: rename grafana→metrics OAuth2Client, add Prometheus
  remote write and additional scrape configs
- Update local/production overlays with resource patches
- Remove stale login-ui resource patch from production overlay
2026-03-10 19:00:57 +00:00
e5741c4df6 feat: integrate tuwunel with Ory SSO, rename chat to messages subdomain
- Add matrix to hydra-maester enabledNamespaces for OAuth2Client CRD
- Update allowed_return_urls and selfservice URLs: chat→messages
- Add Kratos verification flow, employee/external identity schemas
- Extend session lifespan to 30 days with persistent cookies
- Route messages.* to tuwunel via Pingora with WebSocket support
- Replace login-ui with kratos-admin-ui as unified auth frontend
- Update TLS certificate SANs: chat→messages, add monitoring subdomains
- Add tuwunel + La Suite images to production overlay
- Switch DDoS/scanner detection to compiled-in ensemble models (observe_only)
2026-03-10 18:52:47 +00:00
d2148335de feat(matrix): add tuwunel Matrix homeserver deployment manifests
Kubernetes manifests for tuwunel — a Rust Matrix homeserver using RocksDB
for storage. Includes deployment, service, PVC, ConfigMap (tuwunel.toml),
Hydra OAuth2Client for SSO, and Vault secrets for credentials injection.

Key design decisions:
- enableServiceLinks: false to prevent K8s TUWUNEL_* env var conflicts
- strategy: Recreate for RocksDB exclusive lock (no rolling updates)
- Identity provider configured entirely via env vars (client_id/secret
  from hydra-maester Secret, not hardcoded)
- OpenSearch model_id injected via ConfigMap from CLI post-apply hook
- SSO-only auth (login_with_password=false, single_sso=true)
- OpenSearch hybrid neural+BM25 search (768-dim, all-mpnet-base-v2)
2026-03-10 18:52:21 +00:00
f3faf31d4b Fix meet: ALLOWED_HOSTS, OIDC callback, and LiveKit connectivity
- meet-config: rename ALLOWED_HOSTS → DJANGO_ALLOWED_HOSTS (django-configurations
  ListValue uses DJANGO_ prefix by default; without it the list was empty and
  every browser request got 400 DisallowedHost)
- meet-config: set LIVEKIT_API_URL to public https://livekit.DOMAIN_SUFFIX so
  the meet frontend can reach LiveKit for WebSocket signaling
- pingora-config: add livekit.DOMAIN_SUFFIX → livekit-server:80 WebSocket route
- cert-manager: add livekit.DOMAIN_SUFFIX to TLS cert dnsNames
- oidc-clients: fix meet redirect URI /oidc/callback/ → /api/v1.0/callback/
  (meet embeds mozilla-django-oidc inside the api/v1.0/ prefix); add
  postLogoutRedirectUri for clean logout
- livekit-values: replace hardcoded devkey:secret-placeholder with key_file
  loaded from a VSO-managed K8s Secret (secret/livekit in OpenBao)
- media/vault-secrets: add VaultAuth + VaultStaticSecret for media namespace
  to sync livekit API credentials from OpenBao
2026-03-06 13:56:29 +00:00
1d01a1411a chore(infra): remove values-pingora.yaml (superseded by patch-pingora-hostport.yaml) 2026-03-06 12:10:26 +00:00
424db43ccf feat(infra): Meet integration, La Suite theming, Pingora SSH + meet routes
Meet: add backend/frontend/celery deployments and services, meet-config
ConfigMap, nginx SPA config, VSO secrets (meet-db-credentials VDS,
meet-django-secret and meet-livekit VSS). Wire oidc-meet OAuth2Client.

La Suite overlay discipline: move people/docs frontend nginx ConfigMaps
and patches from overlays/local to base so both environments share them.
Remove values-ory.yaml (folded into base). Add docs-frontend nginx config
with sub_filter theming. Add local gitea mkcert CA patch.

Pingora: add [ssh] TCP passthrough block (port 22 → Gitea SSH pod) and
split meet route into frontend default + backend paths for /api/, /admin/,
/oidc/, /static/, /__. Remove now-unused values-pingora.yaml from production
overlay (host ports moved to patch-pingora-hostport.yaml).

Update both overlay kustomizations to reference all new resources and
add meet-backend/meet-frontend image entries.
2026-03-06 12:08:21 +00:00
7ff35d3e0c feat(infra): production bootstrap — cert-manager, longhorn, monitoring
Add new bases for cert-manager (Let's Encrypt + wildcard cert), Longhorn
distributed storage, and monitoring (kube-prometheus-stack + Loki + Tempo
+ Grafana OIDC). Add cloud-init for Scaleway Elastic Metal provisioning.

Production overlay: add patches for postgres sizing, SeaweedFS volume,
OpenSearch storage, LiveKit service, Pingora host ports, resource limits,
and CNPG daily barman backups. Update cert-manager.yaml with full dnsNames
for all *.sunbeam.pt subdomains.
2026-03-06 12:06:27 +00:00
897013bcb7 feat(lasuite): migrate integration service to La Gaufre v2
Replace the inline gaufre.js/nginx.conf ConfigMap approach with a
purpose-built custom image (sunbeam/integration-service) that builds
the lagaufre.js v2 widget from the suitenumerique/integration source
and serves it via nginx.

Changes:
- Rewrite integration-deployment.yaml: custom image, v2 services.json
  format, only actually-deployed services (docs, meet, people)
- Add people-frontend nginx sub_filter overlay to rewrite the hardcoded
  production integration URL baked into the Next.js bundle at build time
- Register integration image in local overlay kustomization
2026-03-03 16:08:48 +00:00
2e89854f86 feat(lasuite): deploy La Suite Docs (impress)
Adds the impress Helm chart (suitenumerique/docs, v4.5.0) to the lasuite
namespace with full Pingora routing, VSO secrets, and local overlay
resource tuning.

Routing (pingora-config.yaml):
- docs.* frontend -> docs-frontend:80 (nginx, static Next.js export)
- /api/* and /admin/* -> docs-backend:80 (Django/uvicorn)
- /collaboration/ws/* -> docs-y-provider:4444 (Hocuspocus WebSocket)
- integration.* -> integration:80 (La Gaufre hub, same file)

Secrets (vault-secrets.yaml):
- VaultDynamicSecret docs-db-credentials (DB engine, static role)
- VaultStaticSecret docs-django-secret (DJANGO_SECRET_KEY)
- VaultStaticSecret docs-collaboration-secret (y-provider shared secret)

OIDC client (oidc-clients.yaml):
- Fix redirect_uri from /oidc/callback/ to /api/v1.0/callback/ -- impress
  mounts all OIDC URLs under api/{API_VERSION}/ via lasuite.oidc_login,
  same pattern as people.

Local overlay (values-resources.yaml):
- docs-backend: 512Mi limit, WEB_CONCURRENCY=2 (4 uvicorn workers
  exceeded 384Mi at startup on the arm64 Lima VM)
- docs-celery-worker: 384Mi limit, CELERY_WORKER_CONCURRENCY=2
- docs-y-provider: 256Mi limit
- seaweedfs-filer: raised from 256Mi to 512Mi (OOMKilled during 188MB
  multipart S3 upload of impress-y-provider image layer)

Local overlay (kustomization.yaml):
- Image mirrors for impress-backend, impress-frontend, impress-y-provider
  (amd64-only images retagged to Gitea via cmd_mirror before deploy)
2026-03-03 14:30:45 +00:00
3fc3011e61 chore(local): scale Linkerd-injected deployments to 1 replica
Local Lima VM (12 GB) doesn't need HA replicas. Each extra pod with a
Linkerd sidecar wastes ~64 MB. Scale people-backend, people-celery-worker,
and people-frontend to 1 replica each.
2026-03-03 11:31:41 +00:00
f13beed1c4 fix(lasuite): fix OIDC config for People login
- Switch all user-facing app OAuth2 clients to client_secret_post
  (mozilla-django-oidc sends credentials in POST body by default)
- Set LOGIN_REDIRECT_URL=/ so Django redirects to frontend after login
- Add local overlay patch to disable OIDC SSL verification
  (mkcert CA not trusted inside pods; production uses real certs)
2026-03-03 11:31:28 +00:00
3ecb42056f chore: replace sunbeam.py with cli package; fix VSO test RBAC
Remove scripts/sunbeam.py — superseded by the new cli/ package.
Add install/test/sunbeam targets to justfile pointing at ../cli/.

fix(vso): add deletecollection to test-rbac Role — CachingClientFactory
calls deletecollection on secrets during init; the old Role only had
delete, causing vault-secrets-operator-test to CrashLoopBackOff.

fix(ingress): pingora imagePullPolicy IfNotPresent — Always caused
unnecessary pulls on every pod restart in local dev.
2026-03-02 21:01:03 +00:00
cc2d4e6cbd feat(local): pull sunbeam-proxy from Gitea registry; switch to imagePullPolicy Always
Image is now built and pushed by `sunbeam.py --build` rather than imported
directly into k3s containerd. imagePullPolicy changes from Never to Always
so every rollout restart pulls the freshly pushed image.
2026-03-02 18:54:56 +00:00
7de6e94a8d fix: resource tuning — LiveKit Recreate strategy, OpenSearch JVM heap, login-ui
LiveKit: switch to Recreate deployment strategy. hostPorts (TURN UDP relay
range) block RollingUpdate because the new pod cannot schedule while the
old one still holds the ports.

OpenSearch: set OPENSEARCH_JAVA_OPTS to -Xms192m -Xmx256m. The upstream
default (-Xms512m -Xmx1g) immediately OOMs the container given our 512Mi
memory limit.

login-ui: raise memory limit from 64Mi to 192Mi and add a 64Mi request;
the previous limit was too tight and caused OOMKilled restarts under load.
2026-03-02 18:33:42 +00:00
e3336ff2a9 feat(vso): deploy Vault Secrets Operator; add test RBAC + amd64 image aliases
- Add base/vso/ with Helm chart (v0.9.0 from helm.releases.hashicorp.com),
  namespace, and test-rbac.yaml granting the Helm test pod's default SA
  permission to create/read/delete Secrets, ConfigMaps, and Leases so the
  bundled connectivity test passes.
- Wire ../../base/vso into overlays/local/kustomization.yaml.
- Add image aliases for lasuite/people-backend and lasuite/people-frontend
  so kustomize rewrites those pulls to our Gitea registry (amd64-only images
  that are patched and mirrored by sunbeam.py).
2026-03-02 18:31:50 +00:00
cdddc334ff feat: replace nginx placeholder with custom Pingora proxy; add Postfix MTA
Ingress:
- Deploy custom sunbeam-proxy (Pingora/Rust) replacing nginx placeholder
- HTTPS termination with mkcert (local) / rustls-acme (production)
- Host-prefix routing with path-based sub-routing for auth virtual host:
  /oauth2 + /.well-known + /userinfo → Hydra, /kratos → Kratos (prefix stripped), default → login-ui
- HTTP→HTTPS redirect, WebSocket passthrough, JSON audit logging, OTEL stub
- cert-manager HTTP-01 ACME challenge routing via Ingress watcher
- RBAC for Ingress watcher (pingora-watcher ClusterRole)
- local overlay: hostPorts 80/443, LiveKit TURN demoted to ClusterIP to avoid klipper conflict

Infrastructure:
- socket_vmnet shared network for host↔VM reachability (192.168.105.2)
- local-up.sh: cert-manager installation, eth1-based LIMA_IP detection, correct DOMAIN_SUFFIX sed substitution
- Postfix MTA in lasuite namespace: outbound relay via Scaleway TEM, accepts SMTP from cluster pods
- Kratos SMTP courier pointed at postfix.lasuite.svc.cluster.local:25
- Production overlay: cert-manager ClusterIssuer, ACME-enabled Pingora values
2026-03-01 16:25:11 +00:00
a589e6280d feat: bring up local dev stack — all services running
- Ory Hydra + Kratos: fixed secret management, DSN config, DB migrations,
  OAuth2Client CRD (helm template skips crds/ dir), login-ui env vars
- SeaweedFS: added s3.json credentials file via -s3.config CLI flag
- OpenBao: standalone mode with auto-unseal sidecar, keys in K8s secret
- OpenSearch: increased memory to 1.5Gi / JVM 1g heap
- Gitea: SSL_MODE disable, S3 bucket creation fixed
- Hive: automountServiceAccountToken: false (Lima virtiofs read-only rootfs quirk)
- LiveKit: API keys in values, hostPort conflict resolved
- Linkerd: native sidecar (proxy.nativeSidecar=true) to avoid blocking Jobs
- All placeholder images replaced: pingora→nginx:alpine, login-ui→oryd/kratos-selfservice-ui-node

Full stack running: postgres, valkey, openbao, opensearch, seaweedfs,
kratos, hydra, gitea, livekit, hive (placeholder), login-ui
2026-02-28 22:08:38 +00:00
92e80a761c fix(ory): re-enable hydra-maester, fix namespace, add memory limit 2026-02-28 14:02:47 +00:00
886c4221b2 fix(local): kustomize render passes cleanly
- Remove base/mesh from local overlay (Linkerd installed via CLI in local-up.sh)
- Fix LiveKit namespace: chart doesn't set .Release.Namespace, add explicit patches
- Fix release names: livekit-server and cloudnative-pg match chart names (avoid double-prefix)
- Disable hydra-maester (not needed for local dev)
- Add memory limits for cloudnative-pg operator and livekit-server deployments
- Remove non-functional values-ory.yaml patch (DOMAIN_SUFFIX handled by sed in local-up.sh)
- Gitignore **/charts/ (kustomize helm cache, generated artifact)
2026-02-28 14:00:31 +00:00
5d9bd7b067 chore: initial infrastructure scaffold
Kustomize base + overlays for the full Sunbeam k3s stack:
- base/mesh      — Linkerd edge (crds + control-plane + viz)
- base/ingress   — custom Pingora edge proxy
- base/ory       — Kratos 0.60.1 + Hydra 0.60.1 + login-ui
- base/data      — CloudNativePG 0.27.1, Valkey 8, OpenSearch 2
- base/storage   — SeaweedFS master + volume + filer (S3 on :8333)
- base/lasuite   — Hive sync daemon + La Suite app placeholders
- base/media     — LiveKit livekit-server 1.9.0
- base/devtools  — Gitea 12.5.0 (external PG + Valkey)
overlays/local   — sslip.io domain, mkcert TLS, Lima hostPort
overlays/production — stub (TODOs for sunbeam.pt values)
scripts/         — local-up/down/certs/urls helpers
justfile         — up / down / certs / urls targets
2026-02-28 13:42:27 +00:00