Commit Graph

27 Commits

Author SHA1 Message Date
7ffddcafcd fix(ory,lasuite): harden session security and fix logout + WebSocket routing
- Fix Hydra postLogoutRedirectUris for docs and people to match the
  actual URI sent by mozilla_django_oidc v5 (/api/v1.0/logout-callback/)
  instead of the root URL, resolving 599 logout errors.

- Fix docs y-provider WebSocket backend port: use Service port 443
  (not pod port 4444 which has no DNAT rule) in Pingora config.

- Tighten VSO VaultDynamicSecret rotation sync: add allowStaticCreds:true
  and reduce refreshAfter from 1h to 5m across all static-creds paths
  (kratos, hydra, gitea, hive, people, docs) so credential rotation is
  reflected within 5 minutes instead of up to 1 hour.

- Set Hydra token TTLs: access_token and id_token to 5m; refresh_token
  to 720h (30 days). The active Kratos session allows silent re-auth, so
  the short access token TTL does not force users to log in manually.

- Set SESSION_COOKIE_AGE=3600 (1h) in docs and people backends. After
  1h, apps silently re-auth via the active Kratos session. Disabled
  identities (sunbeam user disable) fail that re-auth, so they lose
  access at the next expiry.
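
The durations quoted in this commit can be sanity-checked with a little arithmetic. The sketch below is illustrative only: `parse_duration` is a hypothetical helper that understands just the single-unit forms used above (5m, 1h, 720h), not the full duration syntax accepted by Hydra or VSO.

```python
from datetime import timedelta

# Hypothetical helper: handles only single-unit durations such as "5m" or "720h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}

def parse_duration(text: str) -> timedelta:
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{_UNITS[unit]: value})

# Values quoted in the commit message.
ACCESS_TOKEN_TTL = parse_duration("5m")
REFRESH_TOKEN_TTL = parse_duration("720h")
SESSION_COOKIE_AGE = timedelta(seconds=3600)
OLD_REFRESH_AFTER = parse_duration("1h")
NEW_REFRESH_AFTER = parse_duration("5m")

if __name__ == "__main__":
    # 720h expressed in whole days
    print(REFRESH_TOKEN_TTL.days)
    # SESSION_COOKIE_AGE is given in seconds; compare it with "1h"
    print(SESSION_COOKIE_AGE == parse_duration("1h"))
    # Factor by which the credential sync window shrinks
    print(OLD_REFRESH_AFTER // NEW_REFRESH_AFTER)
```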
2026-03-03 18:07:08 +00:00
897013bcb7 feat(lasuite): migrate integration service to La Gaufre v2
Replace the inline gaufre.js/nginx.conf ConfigMap approach with a
purpose-built custom image (sunbeam/integration-service) that builds
the lagaufre.js v2 widget from the suitenumerique/integration source
and serves it via nginx.

Changes:
- Rewrite integration-deployment.yaml: custom image, v2 services.json
  format, only actually-deployed services (docs, meet, people)
- Add people-frontend nginx sub_filter overlay to rewrite the hardcoded
  production integration URL baked into the Next.js bundle at build time
- Register integration image in local overlay kustomization
2026-03-03 16:08:48 +00:00
8113e504ba fix(lasuite): use internal cluster URLs for OIDC backend endpoints
Django backends call the OIDC token, userinfo, and JWKS endpoints
server-side. Pointing these at the public auth.DOMAIN_SUFFIX URL caused
an SSLError in pods because mkcert CA certificates are not trusted inside
containers.

Split the configmap entries:
- OIDC_OP_AUTHORIZATION_ENDPOINT and OIDC_OP_LOGOUT_ENDPOINT remain as
  public HTTPS URLs -- the browser navigates to these.
- OIDC_OP_TOKEN_ENDPOINT, OIDC_OP_USER_ENDPOINT, OIDC_OP_JWKS_ENDPOINT
  now point to http://hydra-public.ory.svc.cluster.local:4444 -- Django
  calls these directly, bypassing the proxy and its TLS certificate.

Affects all La Suite apps (docs, people) that use lasuite-oidc-provider.
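
A minimal sketch of the split described above, written as a Python function that builds the five settings. The variable names and hosts come from the commit message; the URL paths are Hydra's standard public endpoint paths and are an assumption here, as is the `example.test` suffix used in the usage line.

```python
# Sketch only: the paths below are assumed (Hydra's standard public endpoints);
# the commit message names the variables and hosts, not the paths.
INTERNAL_HYDRA = "http://hydra-public.ory.svc.cluster.local:4444"

def oidc_endpoints(domain_suffix: str) -> dict[str, str]:
    public = f"https://auth.{domain_suffix}"
    return {
        # Browser-facing: the user's browser navigates to these.
        "OIDC_OP_AUTHORIZATION_ENDPOINT": f"{public}/oauth2/auth",
        "OIDC_OP_LOGOUT_ENDPOINT": f"{public}/oauth2/sessions/logout",
        # Server-side: Django calls these directly inside the cluster,
        # bypassing the proxy and its mkcert-signed certificate.
        "OIDC_OP_TOKEN_ENDPOINT": f"{INTERNAL_HYDRA}/oauth2/token",
        "OIDC_OP_USER_ENDPOINT": f"{INTERNAL_HYDRA}/userinfo",
        "OIDC_OP_JWKS_ENDPOINT": f"{INTERNAL_HYDRA}/.well-known/jwks.json",
    }

if __name__ == "__main__":
    # "example.test" is a placeholder for DOMAIN_SUFFIX.
    for name, url in oidc_endpoints("example.test").items():
        print(f"{name}={url}")
```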
2026-03-03 14:31:21 +00:00
2e89854f86 feat(lasuite): deploy La Suite Docs (impress)
Adds the impress Helm chart (suitenumerique/docs, v4.5.0) to the lasuite
namespace with full Pingora routing, VSO secrets, and local overlay
resource tuning.

Routing (pingora-config.yaml):
- docs.* frontend -> docs-frontend:80 (nginx, static Next.js export)
- /api/* and /admin/* -> docs-backend:80 (Django/uvicorn)
- /collaboration/ws/* -> docs-y-provider:4444 (Hocuspocus WebSocket)
- integration.* -> integration:80 (La Gaufre hub, same file)

Secrets (vault-secrets.yaml):
- VaultDynamicSecret docs-db-credentials (DB engine, static role)
- VaultStaticSecret docs-django-secret (DJANGO_SECRET_KEY)
- VaultStaticSecret docs-collaboration-secret (y-provider shared secret)

OIDC client (oidc-clients.yaml):
- Fix redirect_uri from /oidc/callback/ to /api/v1.0/callback/ -- impress
  mounts all OIDC URLs under api/{API_VERSION}/ via lasuite.oidc_login,
  same pattern as people.

Local overlay (values-resources.yaml):
- docs-backend: 512Mi limit, WEB_CONCURRENCY=2 (4 uvicorn workers
  exceeded 384Mi at startup on the arm64 Lima VM)
- docs-celery-worker: 384Mi limit, CELERY_WORKER_CONCURRENCY=2
- docs-y-provider: 256Mi limit
- seaweedfs-filer: raised from 256Mi to 512Mi (OOMKilled during 188MB
  multipart S3 upload of impress-y-provider image layer)

Local overlay (kustomization.yaml):
- Image mirrors for impress-backend, impress-frontend, impress-y-provider
  (amd64-only images retagged to Gitea via cmd_mirror before deploy)
2026-03-03 14:30:45 +00:00
a2f55f38f0 feat(lasuite): add La Gaufre integration service
Deploys the suitenumerique/lasuite-integration app that serves the La
Gaufre app launcher (gaufre.js) and acts as the federation hub for the
La Suite Numérique app switching menu.

The service runs at integration.DOMAIN_SUFFIX and exposes
/api/v1/gaufre.js — referenced by docs, people, and other La Suite
apps via GAUFREJS_URL to render the unified app switcher.
2026-03-03 14:28:23 +00:00
f13beed1c4 fix(lasuite): fix OIDC config for People login
- Switch all user-facing app OAuth2 clients to client_secret_post
  (mozilla-django-oidc sends credentials in POST body by default)
- Set LOGIN_REDIRECT_URL=/ so Django redirects to frontend after login
- Add local overlay patch to disable OIDC SSL verification
  (mkcert CA not trusted inside pods; production uses real certs)
2026-03-03 11:31:28 +00:00
b19e553f54 fix(ory): configure Kratos oauth2 provider, session cookie domain, and flows
- Add oauth2_provider.url pointing to hydra-admin so login_challenge
  params are accepted (fixes People OIDC login flow)
- Scope session cookie to parent DOMAIN_SUFFIX so admin.* subdomains
  share the session (fixes redirect loop on kratos-admin-ui)
- Add allowed_return_urls for admin.*, enable recovery flow, add error
  and recovery ui_url entries
- Fix KRATOS_PUBLIC_URL port in login-ui deployment (4433 → 80)
2026-03-03 11:31:00 +00:00
6cc60c66ff feat(ory): add kratos-admin-ui service
Deploy the custom Kratos admin UI (Deno/Hono + Cunningham React):
- K8s Deployment + Service in ory namespace
- VSO VaultStaticSecret for cookie/csrf/admin-identity-ids secrets
- Pingora route for admin.DOMAIN_SUFFIX
2026-03-03 11:30:52 +00:00
9092e2711b fix(lasuite): configure people for Production Django settings and correct OIDC redirect URI
- oidc-clients.yaml: change people redirect URI from /oidc/callback/ to
  /api/v1.0/callback/ (the actual path the Django app registers)
- people-values.yaml: set DJANGO_CONFIGURATION=Production so Django trusts
  X-Forwarded-Proto from Pingora and generates https:// URLs; add
  ALLOWED_HOSTS and DJANGO_CSRF_TRUSTED_ORIGINS for the people subdomain
2026-03-03 02:01:31 +00:00
419a45b3a7 fix: route people.* to frontend; path-route API/admin/oauth2 to backend
people.* now routes / to people-frontend (nginx/React SPA).
Path prefixes /api/, /admin/, and /o/ are forwarded to people-backend
(Django/gunicorn), matching the app's URL structure.

Previously all people.* traffic hit people-backend directly, causing
Django to return 404 "Page not found at /" for the root path.

The [[routes.paths]] mechanism already existed in the proxy (used by
the auth route) — only a config update was needed.
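
The routing rule above can be modelled in a few lines of Python. This is only an illustration of the decision. The real proxy is configured through [[routes.paths]] entries whose exact schema is not shown in this log, and `people_upstream` is a hypothetical name.

```python
# Illustrative model of the people.* routing decision, not the proxy's code.
BACKEND_PREFIXES = ("/api/", "/admin/", "/o/")

def people_upstream(path: str) -> str:
    """Return the upstream service for a request path on the people.* host."""
    if path.startswith(BACKEND_PREFIXES):
        return "people-backend"
    # Everything else, including "/", is served by the React SPA.
    return "people-frontend"

if __name__ == "__main__":
    for p in ("/", "/api/v1.0/callback/", "/admin/login/", "/o/authorize/"):
        print(p, people_upstream(p))
```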
2026-03-03 01:04:10 +00:00
8621c0dd65 fix: correct Pingora upstream ports and kustomize namespace conflict
pingora-config.yaml: kratos-public and people-backend K8s Services
expose port 80, not 4433/8000. The wrong ports caused Pingora to
return timeouts for /kratos/* and all people.* routes.

ory/kustomization.yaml: remove kustomization-level namespace: ory
transformer. All non-Helm resources already declare namespace: ory
explicitly. The transformer was incorrectly moving hydra-maester's
enabledNamespaces Role (generated for the lasuite namespace) into ory,
producing a duplicate-name conflict during kustomize build.
2026-03-03 00:57:58 +00:00
3ecb42056f chore: replace sunbeam.py with cli package; fix VSO test RBAC
Remove scripts/sunbeam.py — superseded by the new cli/ package.
Add install/test/sunbeam targets to justfile pointing at ../cli/.

fix(vso): add deletecollection to test-rbac Role — CachingClientFactory
calls deletecollection on secrets during init; the old Role only had
delete, causing vault-secrets-operator-test to CrashLoopBackOff.

fix(ingress): pingora imagePullPolicy IfNotPresent — Always caused
unnecessary pulls on every pod restart in local dev.
2026-03-02 21:01:03 +00:00
e0f1803e33 docs(ingress): document disable_secure_redirection and other per-route options
2026-03-02 18:45:19 +00:00
7de6e94a8d fix: resource tuning — LiveKit Recreate strategy, OpenSearch JVM heap, login-ui
LiveKit: switch to Recreate deployment strategy. hostPorts (TURN UDP relay
range) block RollingUpdate because the new pod cannot schedule while the
old one still holds the ports.

OpenSearch: set OPENSEARCH_JAVA_OPTS to -Xms192m -Xmx256m. The upstream
default (-Xms512m -Xmx1g) immediately OOMs the container given our 512Mi
memory limit.
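
The heap arithmetic behind this change can be checked with a short sketch. The helper names are hypothetical and the parsing covers only the forms quoted above. Note that a JVM also uses memory outside the heap, so a maximum heap below the limit is necessary but not sufficient.

```python
import re

_SUFFIX = {"k": 1024, "m": 1024**2, "g": 1024**3}

def jvm_max_heap_bytes(java_opts: str) -> int:
    """Extract the -Xmx value from a JVM options string, in bytes."""
    match = re.search(r"-Xmx(\d+)([kmg])", java_opts, re.IGNORECASE)
    if match is None:
        raise ValueError("no -Xmx flag found")
    return int(match.group(1)) * _SUFFIX[match.group(2).lower()]

def mi_to_bytes(quantity: str) -> int:
    """Convert a Kubernetes 'NNNMi' quantity to bytes (only Mi is handled)."""
    if not quantity.endswith("Mi"):
        raise ValueError("only Mi quantities are supported in this sketch")
    return int(quantity[:-2]) * 1024**2

LIMIT = mi_to_bytes("512Mi")
UPSTREAM_HEAP = jvm_max_heap_bytes("-Xms512m -Xmx1g")
TUNED_HEAP = jvm_max_heap_bytes("-Xms192m -Xmx256m")

if __name__ == "__main__":
    print("upstream default fits:", UPSTREAM_HEAP <= LIMIT)
    print("tuned setting fits:", TUNED_HEAP <= LIMIT)
```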

login-ui: raise memory limit from 64Mi to 192Mi and add a 64Mi request;
the previous limit was too tight and caused OOMKilled restarts under load.
2026-03-02 18:33:42 +00:00
3f516dc4d3 fix(ingress): fix People backend service name; add find route
The People backend service is named people-backend (not people) in the
desk chart. Add a route for find-backend to front the future OpenSearch
Dashboards service.
2026-03-02 18:33:34 +00:00
302b7ba56b feat(lasuite): add People service (desk chart); migrate La Suite secrets to VSO
People (desk chart v0.0.7):
- Add people-values.yaml with all env vars wired to ConfigMaps and Secrets.
  DB password, S3 credentials, OIDC client, and Django secret key all come
  from VSO-managed K8s Secrets via secretKeyRef — nothing hardcoded.
- Add Helm chart entry to kustomization.yaml (repo: suitenumerique/people).

La Suite VSO secrets (vault-secrets.yaml):
- seaweedfs-s3-credentials VSS (shared S3 creds → S3_ACCESS_KEY / S3_SECRET_KEY)
- hive-db-url VDS (database/static-creds/hive → postgresql:// DSN, 24h rotation)
- hive-oidc VSS (secret/hive → client-id / client-secret)
- people-db-credentials VDS (database/static-creds/people → password, 24h rotation)
- people-django-secret VSS (secret/people → DJANGO_SECRET_KEY)
2026-03-02 18:33:28 +00:00
8cb705fecc feat(devtools): migrate Gitea to OpenBao DB static role; sync admin creds via VSO
- gitea-db-credentials is now a VaultDynamicSecret reading from
  database/static-creds/gitea (OpenBao static role, 24h password rotation).
  Replaces the previous KV-based Secret that used a hardcoded localdev password.
- gitea-admin-credentials and gitea-s3-credentials remain VaultStaticSecrets
  synced from secret/gitea and secret/seaweedfs respectively.
- gitea-values.yaml adds gitea.admin.existingSecret so the chart reads the
  admin username/password from the VSO-managed Secret instead of values.
2026-03-02 18:33:16 +00:00
c7b812dde8 feat(ory): replace hardcoded DSN + secrets with OpenBao DB engine + VSO
All Ory service credentials now flow from OpenBao through VSO instead
of being hardcoded in Helm values or Deployment env vars.

Kratos:
- Remove config.dsn; flip secret.enabled=false with nameOverride pointing
  at kratos-app-secrets (a VSO-managed Secret with secretsDefault,
  secretsCookie, smtpConnectionURI).
- Inject DSN at runtime via deployment.extraEnv from kratos-db-creds
  (VaultDynamicSecret backed by OpenBao database static role, 24h rotation).

Hydra:
- Remove config.dsn; inject DSN via deployment.extraEnv from hydra-db-creds
  (VaultDynamicSecret, same rotation scheme).

Login UI:
- Replace hardcoded COOKIE_SECRET/CSRF_COOKIE_SECRET env var values with
  secretKeyRef reads from login-ui-secrets (VaultStaticSecret → secret/login-ui).

vault-secrets.yaml adds: VaultAuth, Hydra VSS, kratos-app-secrets VSS,
login-ui-secrets VSS, kratos-db-creds VDS, hydra-db-creds VDS.
2026-03-02 18:32:33 +00:00
580eb3983e feat(storage): migrate SeaweedFS S3 credentials to VSO; mount s3.json from Secret
Previously s3.json was embedded in the seaweedfs-filer-config ConfigMap
with hardcoded minioadmin credentials, and the config volume was mounted
at /etc/seaweedfs/ (overwriting filer.toml with its own directory mount).

- Remove s3.json from ConfigMap; fix the config volume to mount only
  filer.toml via subPath so both files coexist under /etc/seaweedfs/.
- Add vault-secrets.yaml with VaultStaticSecrets that VSO syncs from
  OpenBao secret/seaweedfs: seaweedfs-s3-credentials (S3_ACCESS_KEY /
  S3_SECRET_KEY) and seaweedfs-s3-json (s3.json as a JSON template).
- Mount seaweedfs-s3-json Secret at /etc/seaweedfs/s3.json via subPath.
2026-03-02 18:32:16 +00:00
361661e965 fix(data): remove empty data field from OpenBao placeholder Secret
kubectl apply --server-side was managing the `data: {}` field, which
caused it to wipe the key/root-token entries written by the seed script
on subsequent applies. Removing the field entirely means server-side
apply never touches data, so seed-written keys survive re-applies.
2026-03-02 18:32:02 +00:00
e3336ff2a9 feat(vso): deploy Vault Secrets Operator; add test RBAC + amd64 image aliases
- Add base/vso/ with Helm chart (v0.9.0 from helm.releases.hashicorp.com),
  namespace, and test-rbac.yaml granting the Helm test pod's default SA
  permission to create/read/delete Secrets, ConfigMaps, and Leases so the
  bundled connectivity test passes.
- Wire ../../base/vso into overlays/local/kustomization.yaml.
- Add image aliases for lasuite/people-backend and lasuite/people-frontend
  so kustomize rewrites those pulls to our Gitea registry (amd64-only images
  that are patched and mirrored by sunbeam.py).
2026-03-02 18:31:50 +00:00
5e36322a3b lasuite: declarative pre-work for La Suite app deployments
- Add find user and find_db to postgres-cluster.yaml (11th database)
- Add sunbeam-messages-imports and sunbeam-people buckets to SeaweedFS
- Configure Hydra Maester with enabledNamespaces: [lasuite] so it can
  create and update OAuth2Client secrets in the lasuite namespace
- Add find to Kratos allowed_return_urls
- Add shared ConfigMaps: lasuite-postgres, lasuite-valkey, lasuite-s3,
  lasuite-oidc-provider — single source of truth for all app env vars
- Add HydraOAuth2Client CRDs for all nine La Suite apps (docs, drive,
  meet, conversations, messages, people, find, gitea, hive); Maester
  will create oidc-<app> secrets with CLIENT_ID and CLIENT_SECRET
2026-03-01 18:03:13 +00:00
cdddc334ff feat: replace nginx placeholder with custom Pingora proxy; add Postfix MTA
Ingress:
- Deploy custom sunbeam-proxy (Pingora/Rust) replacing nginx placeholder
- HTTPS termination with mkcert (local) / rustls-acme (production)
- Host-prefix routing with path-based sub-routing for auth virtual host:
  /oauth2 + /.well-known + /userinfo → Hydra, /kratos → Kratos (prefix stripped), default → login-ui
- HTTP→HTTPS redirect, WebSocket passthrough, JSON audit logging, OTEL stub
- cert-manager HTTP-01 ACME challenge routing via Ingress watcher
- RBAC for Ingress watcher (pingora-watcher ClusterRole)
- local overlay: hostPorts 80/443, LiveKit TURN demoted to ClusterIP to avoid klipper conflict
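
As a rough illustration of the path-based sub-routing on the auth virtual host, the following Python sketch models the decision, including the prefix strip for /kratos. The actual proxy is written in Rust; `route_auth` and the upstream labels are hypothetical, and the sketch uses plain string-prefix matching, which may differ from the proxy's matching rules.

```python
# Illustrative model only; not the sunbeam-proxy implementation.
HYDRA_PREFIXES = ("/oauth2", "/.well-known", "/userinfo")

def route_auth(path: str) -> tuple[str, str]:
    """Return (upstream, forwarded_path) for a request on the auth host."""
    if path.startswith(HYDRA_PREFIXES):
        return ("hydra", path)
    if path == "/kratos" or path.startswith("/kratos/"):
        # Strip the /kratos prefix before forwarding; keep at least "/".
        stripped = path[len("/kratos"):] or "/"
        return ("kratos", stripped)
    return ("login-ui", path)

if __name__ == "__main__":
    for p in ("/oauth2/auth", "/kratos/sessions/whoami", "/login"):
        print(p, route_auth(p))
```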

Infrastructure:
- socket_vmnet shared network for host↔VM reachability (192.168.105.2)
- local-up.sh: cert-manager installation, eth1-based LIMA_IP detection, correct DOMAIN_SUFFIX sed substitution
- Postfix MTA in lasuite namespace: outbound relay via Scaleway TEM, accepts SMTP from cluster pods
- Kratos SMTP courier pointed at postfix.lasuite.svc.cluster.local:25
- Production overlay: cert-manager ClusterIssuer, ACME-enabled Pingora values
2026-03-01 16:25:11 +00:00
a589e6280d feat: bring up local dev stack — all services running
- Ory Hydra + Kratos: fixed secret management, DSN config, DB migrations,
  OAuth2Client CRD (helm template skips crds/ dir), login-ui env vars
- SeaweedFS: added s3.json credentials file via -s3.config CLI flag
- OpenBao: standalone mode with auto-unseal sidecar, keys in K8s secret
- OpenSearch: increased memory to 1.5Gi / JVM 1g heap
- Gitea: SSL_MODE disable, S3 bucket creation fixed
- Hive: automountServiceAccountToken: false (Lima virtiofs read-only rootfs quirk)
- LiveKit: API keys in values, hostPort conflict resolved
- Linkerd: native sidecar (proxy.nativeSidecar=true) to avoid blocking Jobs
- All placeholder images replaced: pingora→nginx:alpine, login-ui→oryd/kratos-selfservice-ui-node

Full stack running: postgres, valkey, openbao, opensearch, seaweedfs,
kratos, hydra, gitea, livekit, hive (placeholder), login-ui
2026-02-28 22:08:38 +00:00
92e80a761c fix(ory): re-enable hydra-maester, fix namespace, add memory limit
2026-02-28 14:02:47 +00:00
886c4221b2 fix(local): kustomize render passes cleanly
- Remove base/mesh from local overlay (Linkerd installed via CLI in local-up.sh)
- Fix LiveKit namespace: chart doesn't set .Release.Namespace, add explicit patches
- Fix release names: livekit-server and cloudnative-pg match chart names (avoid double-prefix)
- Disable hydra-maester (not needed for local dev)
- Add memory limits for cloudnative-pg operator and livekit-server deployments
- Remove non-functional values-ory.yaml patch (DOMAIN_SUFFIX handled by sed in local-up.sh)
- Gitignore **/charts/ (kustomize helm cache, generated artifact)
2026-02-28 14:00:31 +00:00
5d9bd7b067 chore: initial infrastructure scaffold
Kustomize base + overlays for the full Sunbeam k3s stack:
- base/mesh      — Linkerd edge (crds + control-plane + viz)
- base/ingress   — custom Pingora edge proxy
- base/ory       — Kratos 0.60.1 + Hydra 0.60.1 + login-ui
- base/data      — CloudNativePG 0.27.1, Valkey 8, OpenSearch 2
- base/storage   — SeaweedFS master + volume + filer (S3 on :8333)
- base/lasuite   — Hive sync daemon + La Suite app placeholders
- base/media     — LiveKit livekit-server 1.9.0
- base/devtools  — Gitea 12.5.0 (external PG + Valkey)
overlays/local   — sslip.io domain, mkcert TLS, Lima hostPort
overlays/production — stub (TODOs for sunbeam.pt values)
scripts/         — local-up/down/certs/urls helpers
justfile         — up / down / certs / urls targets
2026-02-28 13:42:27 +00:00