Commit Graph

62 Commits

Author SHA1 Message Date
fdcc15080f fix(matrix): use https:// for livekit_url in well-known
Element Call expects livekit_service_url to be an HTTPS endpoint
(lk-jwt-service), not a WebSocket URL. The client connects to LiveKit
via WSS separately after getting a JWT.
2026-03-25 13:24:12 +00:00
84c5548f2e feat(ingress): route lk-jwt-service paths + bare domain well-known
Split livekit.* requests: /sfu/get, /healthz, /get_token → lk-jwt-service,
everything else → livekit-server (WebSocket). Add sunbeam.pt bare domain
route so Element X can discover RTC foci from the server_name.
2026-03-25 13:23:59 +00:00
4837983380 feat(media): deploy lk-jwt-service for Matrix Element Call
Bridges Element Call to LiveKit by exchanging Matrix OpenID tokens for
LiveKit JWTs. Shares API credentials with livekit-server via the
existing VSO secret (removed excludeRaw so raw fields are available).
2026-03-25 13:23:48 +00:00
50a4abf94f fix(ory): harden Kratos and Hydra production security configuration
Kratos: xchacha20-poly1305 cipher for at-rest encryption, 12-char min
password with HaveIBeenPwned + similarity check, recovery/verification
switched to code (not link), anti-enumeration on unknown recipients,
15m privileged session, 24h session extend throttle, JSON structured
logging, WebAuthn passwordless enabled, additionalProperties: false on
all identity schemas, memory limits bumped to 256Mi.

Hydra: expose_internal_errors disabled, PKCE enforced for public
clients, janitor CronJob every 6h, cookie domain set explicitly,
SSRF prevention via disallow_private_ip_ranges, JSON structured
logging, Maester enabledNamespaces includes monitoring.

Also: fixed selfservice URL patch divergence (settings path, missing
allowed_return_urls), removed invalid responseTypes on Hive client.
2026-03-24 19:40:58 +00:00
4c02fe18ed fix: use Kratos session auth for observability endpoints
Observability routes (systemmetrics, systemlogs, systemtracing) use
Kratos /sessions/whoami for auth_request — validates browser session
cookies scoped to the parent domain. Admin API routes (id, hydra,
search, vault) keep Hydra /userinfo for Bearer token auth (CLI access).
2026-03-24 13:58:34 +00:00
0498d1c6b3 fix: gate systemmetrics/systemlogs/systemtracing behind OIDC auth
Prometheus, Loki, and Tempo external endpoints were publicly accessible
with no authentication. Add auth_request to all three routes using
Hydra's userinfo endpoint (same pattern as admin APIs).
2026-03-24 13:48:27 +00:00
1147b1a5aa fix: WOPI registration on restart + Collabora readiness probes
- Add readiness/liveness probes to Collabora (GET /hosting/discovery)
- Add init container to Drive backend that waits for Collabora and runs
  trigger_wopi_configuration on every pod start — fixes WOPI silently
  breaking after server restarts (chart Job only ran on sunbeam apply)
- Add OIDC_RESPONSE_MODE=query to Projects config
2026-03-24 12:22:10 +00:00
5e622ce316 feat: AlertManager Matrix integration with severity routing
Deploy matrix-alertmanager-receiver bridge (pending bot credentials in
OpenBao). Update AlertManager routing: critical → Matrix + email,
warning → Matrix only, Watchdog → null. Reduce repeat interval to 4h.
2026-03-24 12:21:29 +00:00
e8c64e6f18 feat: add ServiceMonitors and enable metrics scraping
- SeaweedFS: enable -metricsPort=9091 on master/volume/filer, add
  service labels, create ServiceMonitor
- Gitea: enable metrics in config, create ServiceMonitor
- Hydra/Kratos: standalone ServiceMonitors (chart templates require
  .Capabilities.APIVersions unavailable in kustomize helm template)
- LiveKit: add prometheus_port=6789, standalone ServiceMonitor
  (disabled in kustomization — host firewall blocks port 6789)
- OpenSearch: revert prometheus-exporter attempt (no plugin for v3.x),
  add service label for future exporter sidecar
2026-03-24 12:21:18 +00:00
3fc54c8851 feat: add PrometheusRule alerts for all services
28 alert rules across 9 PrometheusRule files covering infrastructure
(Longhorn, cert-manager), data (PostgreSQL, OpenBao, OpenSearch),
storage (SeaweedFS), devtools (Gitea), identity (Hydra, Kratos),
media (LiveKit), and mesh (Linkerd golden signals for all services).

Severity routing: critical alerts fire to Matrix + email, warnings
to Matrix only (AlertManager config updated in separate commit).
2026-03-24 12:20:55 +00:00
74bb59cfdc feat: split Grafana dashboards into per-folder ConfigMaps
Replace monolithic dashboards-configmap.yaml with 10 dedicated files,
one per Grafana folder: Ingress, Observability, Infrastructure, Storage,
Identity, DevTools, Search, Media, La Suite, Communications.

New dashboards for Longhorn, PostgreSQL/CNPG, Cert-Manager, SeaweedFS,
Hydra, Kratos, Gitea, OpenSearch, LiveKit, La Suite golden signals
(Linkerd metrics), Matrix, and Email Pipeline.
2026-03-24 12:20:42 +00:00
dc95e1d8ec sol v1.1.0: SearXNG web search, evaluator redesign, research agents
- SearXNG deployment in data namespace (free, no-tracking web search)
- sol-config: SearXNG URL, research config, identity agent, updated
  system prompt (DM search rules, research mode, silence, hard rules)
- sol-deployment: debug logging (RUST_LOG=sol=debug), full image path
- opensearch: tolerate missing prometheus-exporter plugin on OS 3
2026-03-23 09:54:56 +00:00
d7ff1da729 sol: identity agent, research mode, evaluator redesign, DM search
sol-config.yaml:
- added [services.kratos] with admin URL
- added research config (model, max_iterations, max_agents, max_depth)
- tool iterations bumped to 250
- updated system prompt: research mode guidance, DM search rules,
  run_script docs, room overlap explanation, silence mechanic
- time context uses {time_block} with midnight-based boundaries
- evaluator returns response_type (message/thread/react/ignore)
2026-03-23 08:47:40 +00:00
a086049de6 fix: harden SeaweedFS storage and fix Drive presigned uploads
- SeaweedFS filer: Recreate strategy (prevents LevelDB lock contention),
  60s termination grace period, memory 256Mi→2Gi limit
- SeaweedFS volume: 60s termination grace period, memory 256Mi→1Gi limit
- Drive: add AWS_S3_DOMAIN_REPLACE so presigned upload URLs use
  s3.sunbeam.pt instead of internal cluster DNS
- Drive: relax liveness/readiness probes (failureThreshold 1→3,
  period 1s→10s, timeout 1s→5s) to prevent crash loops under load
2026-03-22 19:48:36 +00:00
9af3cd3c49 feat: expose admin APIs behind OIDC auth_request
Adds pingora routes for id, hydra, search, vault subdomains.
Each gated by auth_request to Hydra userinfo — only valid SSO
bearer tokens pass through. Adds new SANs to the TLS certificate.
2026-03-22 18:59:22 +00:00
fb91fcd284 sol: vault auth, gitea integration, search fixes
sol-config: added [vault] and [services.gitea] sections, fetch
allowlist (wttr.in, open-meteo, github), bumped context windows
to 200, updated system prompt with run_script docs and tool rules.

sol-deployment: added gitea admin credential env vars from
sol-secrets, automountServiceAccountToken for vault k8s auth.

vault-secrets: added gitea-admin-username and gitea-admin-password
templates to sol-secrets VSS.
2026-03-22 15:16:22 +00:00
e1e6a6bc31 update sol configmap: multi-agent architecture + conversations API
- Add db_path (/data/sol.db) for SQLite persistence
- Add memory_index, script_*, memory_extraction_enabled fields
- Add [agents] section: orchestrator model, compaction threshold, conversations API enabled
- Rewrite system prompt (687 → 150 lines): dense, few-shot, hard rules
- Add {room_context_rules} placeholder for group vs DM behavior
2026-03-21 22:25:54 +00:00
d3943c9a84 feat(monitoring): wire up full LGTM observability stack
- Prometheus: discover ServiceMonitors/PodMonitors in all namespaces,
  enable remote write receiver for Tempo metrics generator
- Tempo: enable metrics generator (service-graphs + span-metrics)
  with remote write to Prometheus
- Loki: add Grafana Alloy DaemonSet to ship container logs
- Grafana: enable dashboard sidecar, add Pingora/Loki/Tempo/OpenBao
  dashboards, add stable UIDs and cross-linking between datasources
  (Loki↔Tempo derived fields, traces→logs, traces→metrics, service map)
- Linkerd: enable proxy tracing to Alloy OTLP collector, point
  linkerd-viz at existing Prometheus instead of deploying its own
- Pingora: add OTLP rollout plan (endpoint commented out until proxy
  telemetry panic fix is deployed and Alloy is verified healthy)
2026-03-21 17:36:54 +00:00
5f923d14f9 feat(matrix): add Sol virtual librarian deployment manifests
Sol is a Matrix bot with E2EE that archives conversations to OpenSearch
and responds via Mistral AI function calling. Adds deployment, PVC,
ConfigMap (sol.toml + system prompt), VaultStaticSecret for credentials,
and production overlay image entry.
2026-03-20 21:38:48 +00:00
bfe0280732 feat(lasuite): add Projects (Planka Kanban) service
Deploy Planka-based project management at projects.DOMAIN_SUFFIX:
- ConfigMap with OIDC, S3, SMTP, La Gaufre widget config
- Deployment + Service (init container for DB migrations, Sails on 1337)
- OAuth2Client (client_secret_basic, redirect to /oidc-callback)
- VaultDynamicSecret for DATABASE_URL, VaultStaticSecret for SECRET_KEY
- Pingora route with websocket support (Socket.io)
- Image overrides in both local and production overlays
- TLS cert dnsNames updated for projects subdomain
- Integration service.json updated with Projects entry
- seaweedfs-s3-credentials rolloutRestartTargets includes projects
2026-03-20 13:41:54 +00:00
b9d9ad72fe fix(ory): enable MFA methods, fix font URL, clean up login-ui
Enable TOTP, WebAuthn, and lookup secret MFA methods in Kratos config.
Fix Monaspace Neon font CDN URL in Gitea theme ConfigMap. Remove
redundant Google Fonts preconnect from people-frontend nginx config.
Delete unused login-ui-deployment.yaml (login-ui is part of the Ory
Helm chart, not a standalone deployment).
2026-03-18 18:36:15 +00:00
3c7460f4a6 feat(lasuite): add calendars service deployment manifests
Add K8s manifests for calendars backend, frontend (Caddy), CalDAV
server, and Celery worker. Wire Pingora routing for cal.sunbeam.pt
with path-based backend/caldav/static splits. Add OAuth2Client for
OIDC, VaultDynamicSecret for DB credentials, VaultStaticSecret for
Django/CalDAV keys, and TLS cert coverage for the cal subdomain.
Register calendars in the integration service gaufre widget.
2026-03-18 18:36:05 +00:00
ccfe8b877a feat: La Suite email/messages, buildkitd, monitoring, vault and storage updates
- Add Messages (email) service: backend, frontend, MTA in/out, MPA, SOCKS
  proxy, worker, DKIM config, and theme customization
- Add Collabora deployment for document collaboration
- Add Drive frontend nginx config and values
- Add buildkitd namespace for in-cluster container builds
- Add SeaweedFS remote sync and additional S3 buckets
- Update vault secrets across namespaces (devtools, lasuite, media,
  monitoring, ory, storage) with expanded credential management
- Update monitoring: rename grafana→metrics OAuth2Client, add Prometheus
  remote write and additional scrape configs
- Update local/production overlays with resource patches
- Remove stale login-ui resource patch from production overlay
2026-03-10 19:00:57 +00:00
e5741c4df6 feat: integrate tuwunel with Ory SSO, rename chat to messages subdomain
- Add matrix to hydra-maester enabledNamespaces for OAuth2Client CRD
- Update allowed_return_urls and selfservice URLs: chat→messages
- Add Kratos verification flow, employee/external identity schemas
- Extend session lifespan to 30 days with persistent cookies
- Route messages.* to tuwunel via Pingora with WebSocket support
- Replace login-ui with kratos-admin-ui as unified auth frontend
- Update TLS certificate SANs: chat→messages, add monitoring subdomains
- Add tuwunel + La Suite images to production overlay
- Switch DDoS/scanner detection to compiled-in ensemble models (observe_only)
2026-03-10 18:52:47 +00:00
584e98316b feat(data): upgrade OpenSearch to v3 with ML Commons for neural search
- Upgrade from OpenSearch 2 to 3 (required for ML Commons pre-trained models)
- Rename PLUGINS_SECURITY_DISABLED → DISABLE_SECURITY_PLUGIN (OS3 change)
- Enable ML Commons plugin settings for on-data-node inference
- Increase memory limits (2Gi) and JVM heap for neural model inference
- Add fsGroup security context for volume permissions
2026-03-10 18:52:29 +00:00
d2148335de feat(matrix): add tuwunel Matrix homeserver deployment manifests
Kubernetes manifests for tuwunel — a Rust Matrix homeserver using RocksDB
for storage. Includes deployment, service, PVC, ConfigMap (tuwunel.toml),
Hydra OAuth2Client for SSO, and Vault secrets for credentials injection.

Key design decisions:
- enableServiceLinks: false to prevent K8s TUWUNEL_* env var conflicts
- strategy: Recreate for RocksDB exclusive lock (no rolling updates)
- Identity provider configured entirely via env vars (client_id/secret
  from hydra-maester Secret, not hardcoded)
- OpenSearch model_id injected via ConfigMap from CLI post-apply hook
- SSO-only auth (login_with_password=false, single_sso=true)
- OpenSearch hybrid neural+BM25 search (768-dim, all-mpnet-base-v2)
2026-03-10 18:52:21 +00:00
91983ddf29 feat(observability): enable OTLP tracing, fix Prometheus scraping, add proxy ServiceMonitor
- Set otlp_endpoint to Tempo HTTP receiver (port 4318) for request tracing
- Add hostNetwork to prometheusSpec so it can reach kubelet/node-exporter on node public IP
- Add ServiceMonitor for proxy metrics scrape on port 9090
- Add CORS origin and Grafana datasource config for monitoring stack
2026-03-09 08:20:42 +00:00
caefb071a8 fix(ingress): use 10.0.0.0/8 bypass for all cluster-internal traffic
Pod IPs are in 10.0.0.0/24, not 10.42.0.0/16 as assumed. Broadening
to 10.0.0.0/8 covers pods, services, and CNI overlays.
2026-03-09 08:00:46 +00:00
a101ea4b06 fix(ingress): add localhost to rate-limit bypass CIDRs
Adds 127.0.0.0/8 and ::1/128 so host-networked pods (buildkitd) are
not blocked by the detection pipeline.
2026-03-09 01:40:25 +00:00
7c1676d2b9 feat(ingress): add detection pipeline config and metrics port
- Add DDoS, scanner, and rate limiter configuration to pingora-config
- Add kubernetes config section with configurable namespace/resource names
- Expose metrics port 9090 on deployment and service
2026-03-08 20:37:49 +00:00
f3faf31d4b Fix meet: ALLOWED_HOSTS, OIDC callback, and LiveKit connectivity
- meet-config: rename ALLOWED_HOSTS → DJANGO_ALLOWED_HOSTS (django-configurations
  ListValue uses DJANGO_ prefix by default; without it the list was empty and
  every browser request got 400 DisallowedHost)
- meet-config: set LIVEKIT_API_URL to public https://livekit.DOMAIN_SUFFIX so
  the meet frontend can reach LiveKit for WebSocket signaling
- pingora-config: add livekit.DOMAIN_SUFFIX → livekit-server:80 WebSocket route
- cert-manager: add livekit.DOMAIN_SUFFIX to TLS cert dnsNames
- oidc-clients: fix meet redirect URI /oidc/callback/ → /api/v1.0/callback/
  (meet embeds mozilla-django-oidc inside the api/v1.0/ prefix); add
  postLogoutRedirectUri for clean logout
- livekit-values: replace hardcoded devkey:secret-placeholder with key_file
  loaded from a VSO-managed K8s Secret (secret/livekit in OpenBao)
- media/vault-secrets: add VaultAuth + VaultStaticSecret for media namespace
  to sync livekit API credentials from OpenBao
2026-03-06 13:56:29 +00:00
424db43ccf feat(infra): Meet integration, La Suite theming, Pingora SSH + meet routes
Meet: add backend/frontend/celery deployments and services, meet-config
ConfigMap, nginx SPA config, VSO secrets (meet-db-credentials VDS,
meet-django-secret and meet-livekit VSS). Wire oidc-meet OAuth2Client.

La Suite overlay discipline: move people/docs frontend nginx ConfigMaps
and patches from overlays/local to base so both environments share them.
Remove values-ory.yaml (folded into base). Add docs-frontend nginx config
with sub_filter theming. Add local gitea mkcert CA patch.

Pingora: add [ssh] TCP passthrough block (port 22 → Gitea SSH pod) and
split meet route into frontend default + backend paths for /api/, /admin/,
/oidc/, /static/, /__. Remove now-unused values-pingora.yaml from production
overlay (host ports moved to patch-pingora-hostport.yaml).

Update both overlay kustomizations to reference all new resources and
add meet-backend/meet-frontend image entries.
2026-03-06 12:08:21 +00:00
d32d1435f9 feat(infra): data, storage, devtools, and ory layer updates
- data: CNPG cluster tuning, OpenBao values, OpenSearch deployment fixes,
  OpenSearch PVC, barman vault secret for S3 backup credentials
- storage: SeaweedFS filer updates (s3.json via secret subPath), PVC for
  filer persistent storage
- devtools: Gitea values (SSH service, custom theme), gitea-theme-cm ConfigMap
- ory: add kratos-selfservice-urls.yaml for self-service flow URLs
- media: LiveKit values updated (TURN config, STUN, resource limits)
- vso: kustomization cleanup
2026-03-06 12:07:28 +00:00
7ff35d3e0c feat(infra): production bootstrap — cert-manager, longhorn, monitoring
Add new bases for cert-manager (Let's Encrypt + wildcard cert), Longhorn
distributed storage, and monitoring (kube-prometheus-stack + Loki + Tempo
+ Grafana OIDC). Add cloud-init for Scaleway Elastic Metal provisioning.

Production overlay: add patches for postgres sizing, SeaweedFS volume,
OpenSearch storage, LiveKit service, Pingora host ports, resource limits,
and CNPG daily barman backups. Update cert-manager.yaml with full dnsNames
for all *.sunbeam.pt subdomains.
2026-03-06 12:06:27 +00:00
f7774558e9 fix(lasuite): override impress createsuperuser job with no-op command
The impress chart renders this Job unconditionally (no if-enabled guard),
then auto-deletes it after 30s (ttlSecondsAfterFinished). Each sunbeam
apply recreated it and it failed because no superuser credentials are set
(users authenticate via OIDC). Override the command to true so the Job
exits 0 immediately and disappears cleanly.
2026-03-03 18:12:38 +00:00
7ffddcafcd fix(ory,lasuite): harden session security and fix logout + WebSocket routing
- Fix Hydra postLogoutRedirectUris for docs and people to match the
  actual URI sent by mozilla_django_oidc v5 (/api/v1.0/logout-callback/)
  instead of the root URL, resolving 599 logout errors.

- Fix docs y-provider WebSocket backend port: use Service port 443
  (not pod port 4444 which has no DNAT rule) in Pingora config.

- Tighten VSO VaultDynamicSecret rotation sync: add allowStaticCreds:true
  and reduce refreshAfter from 1h to 5m across all static-creds paths
  (kratos, hydra, gitea, hive, people, docs) so credential rotation is
  reflected within 5 minutes instead of up to 1 hour.

- Set Hydra token TTLs: access_token and id_token to 5m; refresh_token
  to 720h (30 days). Kratos session carries silent re-auth so the short
  access token TTL does not require users to log in manually.

- Set SESSION_COOKIE_AGE=3600 (1h) in docs and people backends. After
  1h, apps silently re-auth via the active Kratos session. Disabled
  identities (sunbeam user disable) cannot re-auth on next expiry.
2026-03-03 18:07:08 +00:00
897013bcb7 feat(lasuite): migrate integration service to La Gaufre v2
Replace the inline gaufre.js/nginx.conf ConfigMap approach with a
purpose-built custom image (sunbeam/integration-service) that builds
the lagaufre.js v2 widget from the suitenumerique/integration source
and serves it via nginx.

Changes:
- Rewrite integration-deployment.yaml: custom image, v2 services.json
  format, only actually-deployed services (docs, meet, people)
- Add people-frontend nginx sub_filter overlay to rewrite the hardcoded
  production integration URL baked into the Next.js bundle at build time
- Register integration image in local overlay kustomization
2026-03-03 16:08:48 +00:00
8113e504ba fix(lasuite): use internal cluster URLs for OIDC backend endpoints
Django backends call the OIDC token, userinfo, and JWKS endpoints
server-side. Pointing these at the public auth.DOMAIN_SUFFIX URL caused
an SSLError in pods because mkcert CA certificates are not trusted inside
containers.

Split the configmap entries:
- OIDC_OP_AUTHORIZATION_ENDPOINT and OIDC_OP_LOGOUT_ENDPOINT remain as
  public HTTPS URLs -- the browser navigates to these.
- OIDC_OP_TOKEN_ENDPOINT, OIDC_OP_USER_ENDPOINT, OIDC_OP_JWKS_ENDPOINT
  now point to http://hydra-public.ory.svc.cluster.local:4444 -- Django
  calls these directly, bypassing the proxy and its TLS certificate.

Affects all La Suite apps (docs, people) that use lasuite-oidc-provider.
2026-03-03 14:31:21 +00:00
2e89854f86 feat(lasuite): deploy La Suite Docs (impress)
Adds the impress Helm chart (suitenumerique/docs, v4.5.0) to the lasuite
namespace with full Pingora routing, VSO secrets, and local overlay
resource tuning.

Routing (pingora-config.yaml):
- docs.* frontend -> docs-frontend:80 (nginx, static Next.js export)
- /api/* and /admin/* -> docs-backend:80 (Django/uvicorn)
- /collaboration/ws/* -> docs-y-provider:4444 (Hocuspocus WebSocket)
- integration.* -> integration:80 (La Gaufre hub, same file)

Secrets (vault-secrets.yaml):
- VaultDynamicSecret docs-db-credentials (DB engine, static role)
- VaultStaticSecret docs-django-secret (DJANGO_SECRET_KEY)
- VaultStaticSecret docs-collaboration-secret (y-provider shared secret)

OIDC client (oidc-clients.yaml):
- Fix redirect_uri from /oidc/callback/ to /api/v1.0/callback/ -- impress
  mounts all OIDC URLs under api/{API_VERSION}/ via lasuite.oidc_login,
  same pattern as people.

Local overlay (values-resources.yaml):
- docs-backend: 512Mi limit, WEB_CONCURRENCY=2 (4 uvicorn workers
  exceeded 384Mi at startup on the arm64 Lima VM)
- docs-celery-worker: 384Mi limit, CELERY_WORKER_CONCURRENCY=2
- docs-y-provider: 256Mi limit
- seaweedfs-filer: raised from 256Mi to 512Mi (OOMKilled during 188MB
  multipart S3 upload of impress-y-provider image layer)

Local overlay (kustomization.yaml):
- Image mirrors for impress-backend, impress-frontend, impress-y-provider
  (amd64-only images retagged to Gitea via cmd_mirror before deploy)
2026-03-03 14:30:45 +00:00
a2f55f38f0 feat(lasuite): add La Gaufre integration service
Deploys the suitenumerique/lasuite-integration app that serves the La
Gaufre app launcher (gaufre.js) and acts as the federation hub for the
La Suite Numérique app switching menu.

The service runs at integration.DOMAIN_SUFFIX and exposes
/api/v1/gaufre.js — referenced by docs, people, and other La Suite
apps via GAUFREJS_URL to render the unified app switcher.
2026-03-03 14:28:23 +00:00
f13beed1c4 fix(lasuite): fix OIDC config for People login
- Switch all user-facing app OAuth2 clients to client_secret_post
  (mozilla-django-oidc sends credentials in POST body by default)
- Set LOGIN_REDIRECT_URL=/ so Django redirects to frontend after login
- Add local overlay patch to disable OIDC SSL verification
  (mkcert CA not trusted inside pods; production uses real certs)
2026-03-03 11:31:28 +00:00
b19e553f54 fix(ory): configure Kratos oauth2 provider, session cookie domain, and flows
- Add oauth2_provider.url pointing to hydra-admin so login_challenge
  params are accepted (fixes People OIDC login flow)
- Scope session cookie to parent DOMAIN_SUFFIX so admin.* subdomains
  share the session (fixes redirect loop on kratos-admin-ui)
- Add allowed_return_urls for admin.*, enable recovery flow, add error
  and recovery ui_url entries
- Fix KRATOS_PUBLIC_URL port in login-ui deployment (4433 → 80)
2026-03-03 11:31:00 +00:00
6cc60c66ff feat(ory): add kratos-admin-ui service
Deploy the custom Kratos admin UI (Deno/Hono + Cunningham React):
- K8s Deployment + Service in ory namespace
- VSO VaultStaticSecret for cookie/csrf/admin-identity-ids secrets
- Pingora route for admin.DOMAIN_SUFFIX
2026-03-03 11:30:52 +00:00
9092e2711b fix(lasuite): configure people for Production Django settings and correct OIDC redirect URI
- oidc-clients.yaml: change people redirect URI from /oidc/callback/ to
  /api/v1.0/callback/ (the actual path the Django app registers)
- people-values.yaml: set DJANGO_CONFIGURATION=Production so Django trusts
  X-Forwarded-Proto from Pingora and generates https:// URLs; add
  ALLOWED_HOSTS and DJANGO_CSRF_TRUSTED_ORIGINS for the people subdomain
2026-03-03 02:01:31 +00:00
419a45b3a7 fix: route people.* to frontend; path-route API/admin/oauth2 to backend
people.* now routes / to people-frontend (nginx/React SPA).
Path prefixes /api/, /admin/, and /o/ are forwarded to people-backend
(Django/gunicorn), matching the app's URL structure.

Previously all people.* traffic hit people-backend directly, causing
Django to return 404 "Page not found at /" for the root path.

The [[routes.paths]] mechanism already existed in the proxy (used by
the auth route) — only a config update was needed.
2026-03-03 01:04:10 +00:00
8621c0dd65 fix: correct Pingora upstream ports and kustomize namespace conflict
pingora-config.yaml: kratos-public and people-backend K8s Services
expose port 80, not 4433/8000. The wrong ports caused Pingora to
return timeouts for /kratos/* and all people.* routes.

ory/kustomization.yaml: remove kustomization-level namespace: ory
transformer. All non-Helm resources already declare namespace: ory
explicitly. The transformer was incorrectly moving hydra-maester's
enabledNamespaces Role (generated for the lasuite namespace) into ory,
producing a duplicate-name conflict during kustomize build.
2026-03-03 00:57:58 +00:00
3ecb42056f chore: replace sunbeam.py with cli package; fix VSO test RBAC
Remove scripts/sunbeam.py — superseded by the new cli/ package.
Add install/test/sunbeam targets to justfile pointing at ../cli/.

fix(vso): add deletecollection to test-rbac Role — CachingClientFactory
calls deletecollection on secrets during init; the old Role only had
delete, causing vault-secrets-operator-test to CrashLoopBackOff.

fix(ingress): pingora imagePullPolicy IfNotPresent — Always caused
unnecessary pulls on every pod restart in local dev.
2026-03-02 21:01:03 +00:00
e0f1803e33 docs(ingress): document disable_secure_redirection and other per-route options 2026-03-02 18:45:19 +00:00
7de6e94a8d fix: resource tuning — LiveKit Recreate strategy, OpenSearch JVM heap, login-ui
LiveKit: switch to Recreate deployment strategy. hostPorts (TURN UDP relay
range) block RollingUpdate because the new pod cannot schedule while the
old one still holds the ports.

OpenSearch: set OPENSEARCH_JAVA_OPTS to -Xms192m -Xmx256m. The upstream
default (-Xms512m -Xmx1g) immediately OOMs the container given our 512Mi
memory limit.

login-ui: raise memory limit from 64Mi to 192Mi and add a 64Mi request;
the previous limit was too tight and caused OOMKilled restarts under load.
2026-03-02 18:33:42 +00:00
3f516dc4d3 fix(ingress): fix People backend service name; add find route
The People backend service is named people-backend (not people) in the
desk chart. Add a route for find-backend to front the future OpenSearch
Dashboards service.
2026-03-02 18:33:34 +00:00