19 Commits

Author SHA1 Message Date
a912331f97 feat: CNPG PodMonitor, OpenBao ServiceMonitor, CLI OIDC client CRD
- CNPG PodMonitor for PostgreSQL cluster metrics
- OpenBao ServiceMonitor for vault metrics scraping
- Sunbeam CLI OAuth2Client CRD (moved from seed to declarative)
2026-03-25 18:01:52 +00:00
50a4abf94f fix(ory): harden Kratos and Hydra production security configuration
Kratos: xchacha20-poly1305 cipher for at-rest encryption, 12-char min
password with HaveIBeenPwned + similarity check, recovery/verification
switched to code (not link), anti-enumeration on unknown recipients,
15m privileged session, 24h session extend throttle, JSON structured
logging, WebAuthn passwordless enabled, additionalProperties: false on
all identity schemas, memory limits bumped to 256Mi.

Hydra: expose_internal_errors disabled, PKCE enforced for public
clients, janitor CronJob every 6h, cookie domain set explicitly,
SSRF prevention via disallow_private_ip_ranges, JSON structured
logging, Maester enabledNamespaces includes monitoring.

Also: fixed selfservice URL patch divergence (settings path, missing
allowed_return_urls), removed invalid responseTypes on Hive client.
2026-03-24 19:40:58 +00:00
e8c64e6f18 feat: add ServiceMonitors and enable metrics scraping
- SeaweedFS: enable -metricsPort=9091 on master/volume/filer, add
  service labels, create ServiceMonitor
- Gitea: enable metrics in config, create ServiceMonitor
- Hydra/Kratos: standalone ServiceMonitors (chart templates require
  .Capabilities.APIVersions unavailable in kustomize helm template)
- LiveKit: add prometheus_port=6789, standalone ServiceMonitor
  (disabled in kustomization — host firewall blocks port 6789)
- OpenSearch: revert prometheus-exporter attempt (no plugin for v3.x),
  add service label for future exporter sidecar
2026-03-24 12:21:18 +00:00
3fc54c8851 feat: add PrometheusRule alerts for all services
28 alert rules across 9 PrometheusRule files covering infrastructure
(Longhorn, cert-manager), data (PostgreSQL, OpenBao, OpenSearch),
storage (SeaweedFS), devtools (Gitea), identity (Hydra, Kratos),
media (LiveKit), and mesh (Linkerd golden signals for all services).

Severity routing: critical alerts fire to Matrix + email, warnings
to Matrix only (AlertManager config updated in separate commit).
2026-03-24 12:20:55 +00:00
b9d9ad72fe fix(ory): enable MFA methods, fix font URL, clean up login-ui
Enable TOTP, WebAuthn, and lookup secret MFA methods in Kratos config.
Fix Monaspace Neon font CDN URL in Gitea theme ConfigMap. Remove
redundant Google Fonts preconnect from people-frontend nginx config.
Delete unused login-ui-deployment.yaml (login-ui is part of the Ory
Helm chart, not a standalone deployment).
2026-03-18 18:36:15 +00:00
ccfe8b877a feat: La Suite email/messages, buildkitd, monitoring, vault and storage updates
- Add Messages (email) service: backend, frontend, MTA in/out, MPA, SOCKS
  proxy, worker, DKIM config, and theme customization
- Add Collabora deployment for document collaboration
- Add Drive frontend nginx config and values
- Add buildkitd namespace for in-cluster container builds
- Add SeaweedFS remote sync and additional S3 buckets
- Update vault secrets across namespaces (devtools, lasuite, media,
  monitoring, ory, storage) with expanded credential management
- Update monitoring: rename grafana→metrics OAuth2Client, add Prometheus
  remote write and additional scrape configs
- Update local/production overlays with resource patches
- Remove stale login-ui resource patch from production overlay
2026-03-10 19:00:57 +00:00
e5741c4df6 feat: integrate tuwunel with Ory SSO, rename chat to messages subdomain
- Add matrix to hydra-maester enabledNamespaces for OAuth2Client CRD
- Update allowed_return_urls and selfservice URLs: chat→messages
- Add Kratos verification flow, employee/external identity schemas
- Extend session lifespan to 30 days with persistent cookies
- Route messages.* to tuwunel via Pingora with WebSocket support
- Replace login-ui with kratos-admin-ui as unified auth frontend
- Update TLS certificate SANs: chat→messages, add monitoring subdomains
- Add tuwunel + La Suite images to production overlay
- Switch DDoS/scanner detection to compiled-in ensemble models (observe_only)
2026-03-10 18:52:47 +00:00
d32d1435f9 feat(infra): data, storage, devtools, and ory layer updates
- data: CNPG cluster tuning, OpenBao values, OpenSearch deployment fixes,
  OpenSearch PVC, barman vault secret for S3 backup credentials
- storage: SeaweedFS filer updates (s3.json via secret subPath), PVC for
  filer persistent storage
- devtools: Gitea values (SSH service, custom theme), gitea-theme-cm ConfigMap
- ory: add kratos-selfservice-urls.yaml for self-service flow URLs
- media: LiveKit values updated (TURN config, STUN, resource limits)
- vso: kustomization cleanup
2026-03-06 12:07:28 +00:00
7ffddcafcd fix(ory,lasuite): harden session security and fix logout + WebSocket routing
- Fix Hydra postLogoutRedirectUris for docs and people to match the
  actual URI sent by mozilla_django_oidc v5 (/api/v1.0/logout-callback/)
  instead of the root URL, resolving 599 logout errors.

- Fix docs y-provider WebSocket backend port: use Service port 443
  (not pod port 4444 which has no DNAT rule) in Pingora config.

- Tighten VSO VaultDynamicSecret rotation sync: add allowStaticCreds:true
  and reduce refreshAfter from 1h to 5m across all static-creds paths
  (kratos, hydra, gitea, hive, people, docs) so credential rotation is
  reflected within 5 minutes instead of up to 1 hour.

- Set Hydra token TTLs: access_token and id_token to 5m; refresh_token
  to 720h (30 days). Kratos session carries silent re-auth so the short
  access token TTL does not require users to log in manually.

- Set SESSION_COOKIE_AGE=3600 (1h) in docs and people backends. After
  1h, apps silently re-auth via the active Kratos session. Disabled
  identities (sunbeam user disable) cannot re-auth on next expiry.
2026-03-03 18:07:08 +00:00
b19e553f54 fix(ory): configure Kratos oauth2 provider, session cookie domain, and flows
- Add oauth2_provider.url pointing to hydra-admin so login_challenge
  params are accepted (fixes People OIDC login flow)
- Scope session cookie to parent DOMAIN_SUFFIX so admin.* subdomains
  share the session (fixes redirect loop on kratos-admin-ui)
- Add allowed_return_urls for admin.*, enable recovery flow, add error
  and recovery ui_url entries
- Fix KRATOS_PUBLIC_URL port in login-ui deployment (4433 → 80)
2026-03-03 11:31:00 +00:00
6cc60c66ff feat(ory): add kratos-admin-ui service
Deploy the custom Kratos admin UI (Deno/Hono + Cunningham React):
- K8s Deployment + Service in ory namespace
- VSO VaultStaticSecret for cookie/csrf/admin-identity-ids secrets
- Pingora route for admin.DOMAIN_SUFFIX
2026-03-03 11:30:52 +00:00
8621c0dd65 fix: correct Pingora upstream ports and kustomize namespace conflict
pingora-config.yaml: kratos-public and people-backend K8s Services
expose port 80, not 4433/8000. The wrong ports caused Pingora to
return timeouts for /kratos/* and all people.* routes.

ory/kustomization.yaml: remove kustomization-level namespace: ory
transformer. All non-Helm resources already declare namespace: ory
explicitly. The transformer was incorrectly moving hydra-maester's
enabledNamespaces Role (generated for the lasuite namespace) into ory,
producing a duplicate-name conflict during kustomize build.
2026-03-03 00:57:58 +00:00
c7b812dde8 feat(ory): replace hardcoded DSN + secrets with OpenBao DB engine + VSO
All Ory service credentials now flow from OpenBao through VSO instead
of being hardcoded in Helm values or Deployment env vars.

Kratos:
- Remove config.dsn; flip secret.enabled=false with nameOverride pointing
  at kratos-app-secrets (a VSO-managed Secret with secretsDefault,
  secretsCookie, smtpConnectionURI).
- Inject DSN at runtime via deployment.extraEnv from kratos-db-creds
  (VaultDynamicSecret backed by OpenBao database static role, 24h rotation).

Hydra:
- Remove config.dsn; inject DSN via deployment.extraEnv from hydra-db-creds
  (VaultDynamicSecret, same rotation scheme).

Login UI:
- Replace hardcoded COOKIE_SECRET/CSRF_COOKIE_SECRET env var values with
  secretKeyRef reads from login-ui-secrets (VaultStaticSecret → secret/login-ui).

vault-secrets.yaml adds: VaultAuth, Hydra VSS, kratos-app-secrets VSS,
login-ui-secrets VSS, kratos-db-creds VDS, hydra-db-creds VDS.
2026-03-02 18:32:33 +00:00
5e36322a3b lasuite: declarative pre-work for La Suite app deployments
- Add find user and find_db to postgres-cluster.yaml (11th database)
- Add sunbeam-messages-imports and sunbeam-people buckets to SeaweedFS
- Configure Hydra Maester with enabledNamespaces: [lasuite] so it can
  create and update OAuth2Client secrets in the lasuite namespace
- Add find to Kratos allowed_return_urls
- Add shared ConfigMaps: lasuite-postgres, lasuite-valkey, lasuite-s3,
  lasuite-oidc-provider — single source of truth for all app env vars
- Add HydraOAuth2Client CRDs for all nine La Suite apps (docs, drive,
  meet, conversations, messages, people, find, gitea, hive); Maester
  will create oidc-<app> secrets with CLIENT_ID and CLIENT_SECRET
2026-03-01 18:03:13 +00:00
cdddc334ff feat: replace nginx placeholder with custom Pingora proxy; add Postfix MTA
Ingress:
- Deploy custom sunbeam-proxy (Pingora/Rust) replacing nginx placeholder
- HTTPS termination with mkcert (local) / rustls-acme (production)
- Host-prefix routing with path-based sub-routing for auth virtual host:
  /oauth2 + /.well-known + /userinfo → Hydra, /kratos → Kratos (prefix stripped), default → login-ui
- HTTP→HTTPS redirect, WebSocket passthrough, JSON audit logging, OTEL stub
- cert-manager HTTP-01 ACME challenge routing via Ingress watcher
- RBAC for Ingress watcher (pingora-watcher ClusterRole)
- local overlay: hostPorts 80/443, LiveKit TURN demoted to ClusterIP to avoid klipper conflict

Infrastructure:
- socket_vmnet shared network for host↔VM reachability (192.168.105.2)
- local-up.sh: cert-manager installation, eth1-based LIMA_IP detection, correct DOMAIN_SUFFIX sed substitution
- Postfix MTA in lasuite namespace: outbound relay via Scaleway TEM, accepts SMTP from cluster pods
- Kratos SMTP courier pointed at postfix.lasuite.svc.cluster.local:25
- Production overlay: cert-manager ClusterIssuer, ACME-enabled Pingora values
2026-03-01 16:25:11 +00:00
a589e6280d feat: bring up local dev stack — all services running
- Ory Hydra + Kratos: fixed secret management, DSN config, DB migrations,
  OAuth2Client CRD (helm template skips crds/ dir), login-ui env vars
- SeaweedFS: added s3.json credentials file via -s3.config CLI flag
- OpenBao: standalone mode with auto-unseal sidecar, keys in K8s secret
- OpenSearch: increased memory to 1.5Gi / JVM 1g heap
- Gitea: SSL_MODE disable, S3 bucket creation fixed
- Hive: automountServiceAccountToken: false (Lima virtiofs read-only rootfs quirk)
- LiveKit: API keys in values, hostPort conflict resolved
- Linkerd: native sidecar (proxy.nativeSidecar=true) to avoid blocking Jobs
- All placeholder images replaced: pingora→nginx:alpine, login-ui→oryd/kratos-selfservice-ui-node

Full stack running: postgres, valkey, openbao, opensearch, seaweedfs,
kratos, hydra, gitea, livekit, hive (placeholder), login-ui
2026-02-28 22:08:38 +00:00
92e80a761c fix(ory): re-enable hydra-maester, fix namespace, add memory limit 2026-02-28 14:02:47 +00:00
886c4221b2 fix(local): kustomize render passes cleanly
- Remove base/mesh from local overlay (Linkerd installed via CLI in local-up.sh)
- Fix LiveKit namespace: chart doesn't set .Release.Namespace, add explicit patches
- Fix release names: livekit-server and cloudnative-pg match chart names (avoid double-prefix)
- Disable hydra-maester (not needed for local dev)
- Add memory limits for cloudnative-pg operator and livekit-server deployments
- Remove non-functional values-ory.yaml patch (DOMAIN_SUFFIX handled by sed in local-up.sh)
- Gitignore **/charts/ (kustomize helm cache, generated artifact)
2026-02-28 14:00:31 +00:00
5d9bd7b067 chore: initial infrastructure scaffold
Kustomize base + overlays for the full Sunbeam k3s stack:
- base/mesh      — Linkerd edge (crds + control-plane + viz)
- base/ingress   — custom Pingora edge proxy
- base/ory       — Kratos 0.60.1 + Hydra 0.60.1 + login-ui
- base/data      — CloudNativePG 0.27.1, Valkey 8, OpenSearch 2
- base/storage   — SeaweedFS master + volume + filer (S3 on :8333)
- base/lasuite   — Hive sync daemon + La Suite app placeholders
- base/media     — LiveKit livekit-server 1.9.0
- base/devtools  — Gitea 12.5.0 (external PG + Valkey)
overlays/local   — sslip.io domain, mkcert TLS, Lima hostPort
overlays/production — stub (TODOs for sunbeam.pt values)
scripts/         — local-up/down/certs/urls helpers
justfile         — up / down / certs / urls targets
2026-02-28 13:42:27 +00:00