Full tour of the SBBB✨ stack: Pingora proxy, Ory identity, La Suite
apps, Linkerd mesh, OpenBao secrets, data layer, monitoring, Matrix,
Sol☀️, and the platform itself.
Full rewrite with boujee tone — app inventory, architecture diagram,
custom components (Sol☀️, Pingora, Sunbeam CLI), team bios, and links
to the new documentation suite.
Root token and unseal key were lost when a placeholder manifest
overwrote the openbao-keys Secret. Documents root cause, timeline,
5 whys, remediation actions, and monitoring requirements.
- SearXNG deployment in data namespace (free, no-tracking web search)
- sol-config: SearXNG URL, research config, identity agent, updated
system prompt (DM search rules, research mode, silence, hard rules)
- sol-deployment: debug logging (RUST_LOG=sol=debug), full image path
- opensearch: tolerate missing prometheus-exporter plugin on OS 3
Adds pingora routes for id, hydra, search, vault subdomains.
Each gated by auth_request to Hydra userinfo — only valid SSO
bearer tokens pass through. Adds new SANs to the TLS certificate.
sol-config: added [vault] and [services.gitea] sections, fetch
allowlist (wttr.in, open-meteo, github), bumped context windows
to 200, updated system prompt with run_script docs and tool rules.
sol-deployment: added gitea admin credential env vars from
sol-secrets, automountServiceAccountToken for vault k8s auth.
vault-secrets: added gitea-admin-username and gitea-admin-password
templates to sol-secrets VSS.
- Prometheus: discover ServiceMonitors/PodMonitors in all namespaces,
enable remote write receiver for Tempo metrics generator
- Tempo: enable metrics generator (service-graphs + span-metrics)
with remote write to Prometheus
- Loki: add Grafana Alloy DaemonSet to ship container logs
- Grafana: enable dashboard sidecar, add Pingora/Loki/Tempo/OpenBao
dashboards, add stable UIDs and cross-linking between datasources
(Loki↔Tempo derived fields, traces→logs, traces→metrics, service map)
- Linkerd: enable proxy tracing to Alloy OTLP collector, point
linkerd-viz at existing Prometheus instead of deploying its own
- Pingora: add OTLP rollout plan (endpoint commented out until proxy
telemetry panic fix is deployed and Alloy is verified healthy)
Sol is a Matrix bot with E2EE that archives conversations to OpenSearch
and responds via Mistral AI function calling. Adds deployment, PVC,
ConfigMap (sol.toml + system prompt), VaultStaticSecret for credentials,
and production overlay image entry.
Deploy Planka-based project management at projects.DOMAIN_SUFFIX:
- ConfigMap with OIDC, S3, SMTP, La Gaufre widget config
- Deployment + Service (init container for DB migrations, Sails on 1337)
- OAuth2Client (client_secret_basic, redirect to /oidc-callback)
- VaultDynamicSecret for DATABASE_URL, VaultStaticSecret for SECRET_KEY
- Pingora route with websocket support (Socket.io)
- Image overrides in both local and production overlays
- TLS cert dnsNames updated for projects subdomain
- Integration service.json updated with Projects entry
- seaweedfs-s3-credentials rolloutRestartTargets includes projects
Enable TOTP, WebAuthn, and lookup secret MFA methods in Kratos config.
Fix Monaspace Neon font CDN URL in Gitea theme ConfigMap. Remove
redundant Google Fonts preconnect from people-frontend nginx config.
Delete unused login-ui-deployment.yaml (login-ui is part of the Ory
Helm chart, not a standalone deployment).
Add K8s manifests for calendars backend, frontend (Caddy), CalDAV
server, and Celery worker. Wire Pingora routing for cal.sunbeam.pt
with path-based backend/caldav/static splits. Add OAuth2Client for
OIDC, VaultDynamicSecret for DB credentials, VaultStaticSecret for
Django/CalDAV keys, and TLS cert coverage for the cal subdomain.
Register calendars in the integration service gaufre widget.
- Add matrix to hydra-maester enabledNamespaces for OAuth2Client CRD
- Update allowed_return_urls and selfservice URLs: chat→messages
- Add Kratos verification flow, employee/external identity schemas
- Extend session lifespan to 30 days with persistent cookies
- Route messages.* to tuwunel via Pingora with WebSocket support
- Replace login-ui with kratos-admin-ui as unified auth frontend
- Update TLS certificate SANs: chat→messages, add monitoring subdomains
- Add tuwunel + La Suite images to production overlay
- Switch DDoS/scanner detection to compiled-in ensemble models (observe_only)
- Upgrade from OpenSearch 2 to 3 (required for ML Commons pre-trained models)
- Rename PLUGINS_SECURITY_DISABLED → DISABLE_SECURITY_PLUGIN (OS3 change)
- Enable ML Commons plugin settings for on-data-node inference
- Increase memory limits (2Gi) and JVM heap for neural model inference
- Add fsGroup security context for volume permissions
Kubernetes manifests for tuwunel — a Rust Matrix homeserver using RocksDB
for storage. Includes deployment, service, PVC, ConfigMap (tuwunel.toml),
Hydra OAuth2Client for SSO, and Vault secrets for credentials injection.
Key design decisions:
- enableServiceLinks: false to prevent K8s TUWUNEL_* env var conflicts
- strategy: Recreate for RocksDB exclusive lock (no rolling updates)
- Identity provider configured entirely via env vars (client_id/secret
from hydra-maester Secret, not hardcoded)
- OpenSearch model_id injected via ConfigMap from CLI post-apply hook
- SSO-only auth (login_with_password=false, single_sso=true)
- OpenSearch hybrid neural+BM25 search (768-dim, all-mpnet-base-v2)
- Set otlp_endpoint to Tempo HTTP receiver (port 4318) for request tracing
- Add hostNetwork to prometheusSpec so it can reach kubelet/node-exporter on node public IP
- Add ServiceMonitor for proxy metrics scrape on port 9090
- Add CORS origin and Grafana datasource config for monitoring stack
- Add DDoS, scanner, and rate limiter configuration to pingora-config
- Add kubernetes config section with configurable namespace/resource names
- Expose metrics port 9090 on deployment and service
- meet-config: rename ALLOWED_HOSTS → DJANGO_ALLOWED_HOSTS (django-configurations
ListValue uses DJANGO_ prefix by default; without it the list was empty and
every browser request got 400 DisallowedHost)
- meet-config: set LIVEKIT_API_URL to public https://livekit.DOMAIN_SUFFIX so
the meet frontend can reach LiveKit for WebSocket signaling
- pingora-config: add livekit.DOMAIN_SUFFIX → livekit-server:80 WebSocket route
- cert-manager: add livekit.DOMAIN_SUFFIX to TLS cert dnsNames
- oidc-clients: fix meet redirect URI /oidc/callback/ → /api/v1.0/callback/
(meet embeds mozilla-django-oidc inside the api/v1.0/ prefix); add
postLogoutRedirectUri for clean logout
- livekit-values: replace hardcoded devkey:secret-placeholder with key_file
loaded from a VSO-managed K8s Secret (secret/livekit in OpenBao)
- media/vault-secrets: add VaultAuth + VaultStaticSecret for media namespace
to sync livekit API credentials from OpenBao
Meet: add backend/frontend/celery deployments and services, meet-config
ConfigMap, nginx SPA config, VSO secrets (meet-db-credentials VDS,
meet-django-secret and meet-livekit VSS). Wire oidc-meet OAuth2Client.
La Suite overlay discipline: move people/docs frontend nginx ConfigMaps
and patches from overlays/local to base so both environments share them.
Remove values-ory.yaml (folded into base). Add docs-frontend nginx config
with sub_filter theming. Add local gitea mkcert CA patch.
Pingora: add [ssh] TCP passthrough block (port 22 → Gitea SSH pod) and
split meet route into frontend default + backend paths for /api/, /admin/,
/oidc/, /static/, /__. Remove now-unused values-pingora.yaml from production
overlay (host ports moved to patch-pingora-hostport.yaml).
Update both overlay kustomizations to reference all new resources and
add meet-backend/meet-frontend image entries.
Add new bases for cert-manager (Let's Encrypt + wildcard cert), Longhorn
distributed storage, and monitoring (kube-prometheus-stack + Loki + Tempo
+ Grafana OIDC). Add cloud-init for Scaleway Elastic Metal provisioning.
Production overlay: add patches for postgres sizing, SeaweedFS volume,
OpenSearch storage, LiveKit service, Pingora host ports, resource limits,
and CNPG daily barman backups. Update cert-manager.yaml with full dnsNames
for all *.sunbeam.pt subdomains.
The impress chart renders this Job unconditionally (no if-enabled guard),
then auto-deletes it after 30s (ttlSecondsAfterFinished). Each sunbeam
apply recreated it and it failed because no superuser credentials are set
(users authenticate via OIDC). Override the command to true so the Job
exits 0 immediately and disappears cleanly.
- Fix Hydra postLogoutRedirectUris for docs and people to match the
actual URI sent by mozilla_django_oidc v5 (/api/v1.0/logout-callback/)
instead of the root URL, resolving 599 logout errors.
- Fix docs y-provider WebSocket backend port: use Service port 443
(not pod port 4444 which has no DNAT rule) in Pingora config.
- Tighten VSO VaultDynamicSecret rotation sync: add allowStaticCreds:true
and reduce refreshAfter from 1h to 5m across all static-creds paths
(kratos, hydra, gitea, hive, people, docs) so credential rotation is
reflected within 5 minutes instead of up to 1 hour.
- Set Hydra token TTLs: access_token and id_token to 5m; refresh_token
to 720h (30 days). Kratos session carries silent re-auth so the short
access token TTL does not require users to log in manually.
- Set SESSION_COOKIE_AGE=3600 (1h) in docs and people backends. After
1h, apps silently re-auth via the active Kratos session. Disabled
identities (sunbeam user disable) cannot re-auth on next expiry.
Replace the inline gaufre.js/nginx.conf ConfigMap approach with a
purpose-built custom image (sunbeam/integration-service) that builds
the lagaufre.js v2 widget from the suitenumerique/integration source
and serves it via nginx.
Changes:
- Rewrite integration-deployment.yaml: custom image, v2 services.json
format, only actually-deployed services (docs, meet, people)
- Add people-frontend nginx sub_filter overlay to rewrite the hardcoded
production integration URL baked into the Next.js bundle at build time
- Register integration image in local overlay kustomization
Django backends call the OIDC token, userinfo, and JWKS endpoints
server-side. Pointing these at the public auth.DOMAIN_SUFFIX URL caused
an SSLError in pods because mkcert CA certificates are not trusted inside
containers.
Split the configmap entries:
- OIDC_OP_AUTHORIZATION_ENDPOINT and OIDC_OP_LOGOUT_ENDPOINT remain as
public HTTPS URLs -- the browser navigates to these.
- OIDC_OP_TOKEN_ENDPOINT, OIDC_OP_USER_ENDPOINT, OIDC_OP_JWKS_ENDPOINT
now point to http://hydra-public.ory.svc.cluster.local:4444 -- Django
calls these directly, bypassing the proxy and its TLS certificate.
Affects all La Suite apps (docs, people) that use lasuite-oidc-provider.
Deploys the suitenumerique/lasuite-integration app that serves the La
Gaufre app launcher (gaufre.js) and acts as the federation hub for the
La Suite Numérique app switching menu.
The service runs at integration.DOMAIN_SUFFIX and exposes
/api/v1/gaufre.js — referenced by docs, people, and other La Suite
apps via GAUFREJS_URL to render the unified app switcher.
Local Lima VM (12 GB) doesn't need HA replicas. Each extra pod with a
Linkerd sidecar wastes ~64 MB. Scale people-backend, people-celery-worker,
and people-frontend to 1 replica each.
- Switch all user-facing app OAuth2 clients to client_secret_post
(mozilla-django-oidc sends credentials in POST body by default)
- Set LOGIN_REDIRECT_URL=/ so Django redirects to frontend after login
- Add local overlay patch to disable OIDC SSL verification
(mkcert CA not trusted inside pods; production uses real certs)
- Add oauth2_provider.url pointing to hydra-admin so login_challenge
params are accepted (fixes People OIDC login flow)
- Scope session cookie to parent DOMAIN_SUFFIX so admin.* subdomains
share the session (fixes redirect loop on kratos-admin-ui)
- Add allowed_return_urls for admin.*, enable recovery flow, add error
and recovery ui_url entries
- Fix KRATOS_PUBLIC_URL port in login-ui deployment (4433 → 80)
Deploy the custom Kratos admin UI (Deno/Hono + Cunningham React):
- K8s Deployment + Service in ory namespace
- VSO VaultStaticSecret for cookie/csrf/admin-identity-ids secrets
- Pingora route for admin.DOMAIN_SUFFIX
- oidc-clients.yaml: change people redirect URI from /oidc/callback/ to
/api/v1.0/callback/ (the actual path the Django app registers)
- people-values.yaml: set DJANGO_CONFIGURATION=Production so Django trusts
X-Forwarded-Proto from Pingora and generates https:// URLs; add
ALLOWED_HOSTS and DJANGO_CSRF_TRUSTED_ORIGINS for the people subdomain
people.* now routes / to people-frontend (nginx/React SPA).
Path prefixes /api/, /admin/, and /o/ are forwarded to people-backend
(Django/gunicorn), matching the app's URL structure.
Previously all people.* traffic hit people-backend directly, causing
Django to return 404 "Page not found at /" for the root path.
The [[routes.paths]] mechanism already existed in the proxy (used by
the auth route) — only a config update was needed.
pingora-config.yaml: kratos-public and people-backend K8s Services
expose port 80, not 4433/8000. The wrong ports caused Pingora to
return timeouts for /kratos/* and all people.* routes.
ory/kustomization.yaml: remove kustomization-level namespace: ory
transformer. All non-Helm resources already declare namespace: ory
explicitly. The transformer was incorrectly moving hydra-maester's
enabledNamespaces Role (generated for the lasuite namespace) into ory,
producing a duplicate-name conflict during kustomize build.
Remove scripts/sunbeam.py — superseded by the new cli/ package.
Add install/test/sunbeam targets to justfile pointing at ../cli/.
fix(vso): add deletecollection to test-rbac Role — CachingClientFactory
calls deletecollection on secrets during init; the old Role only had
delete, causing vault-secrets-operator-test to CrashLoopBackOff.
fix(ingress): pingora imagePullPolicy IfNotPresent — Always caused
unnecessary pulls on every pod restart in local dev.
build_proxy() authenticates Docker with the Gitea registry, runs
`docker buildx build --platform linux/arm64 --push`, then applies
manifests and rolls the pingora deployment to pull the fresh image.
The Gitea admin password is read from the gitea-admin-credentials K8s
Secret (written by VSO) so --build works independently of --seed.
Image is now built and pushed by `sunbeam.py --build` rather than imported
directly into k3s containerd. imagePullPolicy changes from Never to Always
so every rollout restart pulls the freshly pushed image.
Database secrets engine (_configure_db_engine):
- Creates a dedicated `vault` PostgreSQL user via CNPG peer auth (psql exec).
CNPG enableSuperuserAccess=false blocks remote auth for the postgres
superuser, so we create vault with CREATEROLE and grant ADMIN OPTION on
each service role (required by PG 16+ to rotate passwords).
- Configures the OpenBao postgresql plugin (cnpg-postgres connection) and
creates static roles for all PG_USERS with 24h rotation_period.
- All bao/psql calls now raise RuntimeError on non-zero exit — no more
silent failures.
Credential seeding (_seed_openbao):
- Added secret/login-ui path (cookie-secret, csrf-cookie-secret) so the
login UI no longer needs hardcoded values in its Deployment manifest.
- Removed all DB password fields from KV; passwords are now managed
exclusively by the database secrets engine.
Lifecycle:
- pre_apply_cleanup() prunes stale VaultStaticSecrets that have been
superseded by VaultDynamicSecrets of the same name, preventing the
"not the owner" ownerRef conflict that blocked secret updates.
- status_check() no longer marks Completed/Succeeded pods as unhealthy.
- _vso_sync_status() added to status output: shows sync state (secretMAC
for VSS, lastRenewalTime for VDS) across all managed namespaces.
Verification (--verify):
- New verify_vso() function writes a random sentinel to OpenBao, creates
a VaultAuth + VaultStaticSecret in the ory namespace, waits up to 60s
for VSO to sync, decodes the K8s Secret, and asserts the value matches.
Cleans up all test resources unconditionally. Replaces the unreliable
Helm test pod for integration testing.