feat(infra): production bootstrap — cert-manager, longhorn, monitoring
Add new bases for cert-manager (Let's Encrypt + wildcard cert), Longhorn distributed storage, and monitoring (kube-prometheus-stack + Loki + Tempo + Grafana OIDC). Add cloud-init for Scaleway Elastic Metal provisioning. Production overlay: add patches for postgres sizing, SeaweedFS volume, OpenSearch storage, LiveKit service, Pingora host ports, resource limits, and CNPG daily barman backups. Update cert-manager.yaml with full dnsNames for all *.sunbeam.pt subdomains.
@@ -1,18 +1,30 @@
-# cert-manager resources for production TLS.
+# cert-manager issuers and certificate for production TLS.
 #
-# Prerequisites:
-# cert-manager must be installed in the cluster before applying this overlay:
-# kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
+# WORKFLOW: start with letsencrypt-staging to verify the HTTP-01 challenge
+# flow works without burning production rate limits. Once the staging cert
+# is issued successfully, flip the Certificate issuerRef to letsencrypt-production
+# and delete the old Secret so cert-manager re-issues with a trusted cert.
 #
-# DOMAIN_SUFFIX and ACME_EMAIL are substituted by sed at deploy time.
-# See overlays/production/kustomization.yaml for the deploy command.
+# ACME_EMAIL is substituted by sunbeam apply.
 ---
-# ClusterIssuer: Let's Encrypt production via HTTP-01 challenge.
-#
-# cert-manager creates one Ingress per challenged domain. The pingora proxy
-# watches these Ingresses and routes /.well-known/acme-challenge/<token>
-# requests to the per-domain solver Service, so multi-SAN certificates are
-# issued correctly even when all domain challenges run in parallel.
+# Let's Encrypt staging — untrusted cert but no rate limits. Use for initial setup.
+apiVersion: cert-manager.io/v1
+kind: ClusterIssuer
+metadata:
+  name: letsencrypt-staging
+spec:
+  acme:
+    server: https://acme-staging-v02.api.letsencrypt.org/directory
+    email: ACME_EMAIL
+    privateKeySecretRef:
+      name: letsencrypt-staging-account-key
+    solvers:
+      - http01:
+          ingress:
+            serviceType: ClusterIP
+---
+# Let's Encrypt production — trusted cert, strict rate limits.
+# Switch to this once staging confirms challenges resolve correctly.
 apiVersion: cert-manager.io/v1
 kind: ClusterIssuer
 metadata:
@@ -26,16 +38,11 @@ spec:
     solvers:
       - http01:
           ingress:
-            # ingressClassName is intentionally blank: cert-manager still creates
-            # the Ingress object (which the proxy watches), but no ingress
-            # controller needs to act on it — the proxy handles routing itself.
-            ingressClassName: ""
+            serviceType: ClusterIP
 ---
-# Certificate: single multi-SAN cert covering all proxy subdomains.
-# cert-manager issues it via HTTP-01, stores it in pingora-tls Secret, and
-# renews it automatically ~30 days before expiry. The watcher in sunbeam-proxy
-# detects the Secret update and triggers a graceful upgrade so the new cert is
-# loaded without dropping any connections.
+# Certificate covering all proxy subdomains.
+# Start with letsencrypt-staging. Once verified, change issuerRef.name to
+# letsencrypt-production and delete the pingora-tls Secret to force re-issue.
 apiVersion: cert-manager.io/v1
 kind: Certificate
 metadata:
@@ -56,3 +63,6 @@ spec:
     - src.DOMAIN_SUFFIX
     - auth.DOMAIN_SUFFIX
     - s3.DOMAIN_SUFFIX
+    - grafana.DOMAIN_SUFFIX
+    - admin.DOMAIN_SUFFIX
+    - integration.DOMAIN_SUFFIX
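
The staging-first workflow described in the file's header comment ends with a single change to the Certificate's issuerRef. A minimal sketch of what the resource might look like after that switch is shown here. The `metadata.name` and `namespace` values are hypothetical placeholders, since the diff does not show the Certificate's metadata, and only the dnsNames visible in the last hunk are listed.

```yaml
# Sketch only, not the committed manifest.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: pingora-tls          # hypothetical: not visible in the diff
  namespace: ingress         # hypothetical: not visible in the diff
spec:
  secretName: pingora-tls    # Secret named in the file's comments
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-production   # letsencrypt-staging during initial setup
  dnsNames:
    # Entries visible in the last hunk; the full list in the file is longer.
    - src.DOMAIN_SUFFIX
    - auth.DOMAIN_SUFFIX
    - s3.DOMAIN_SUFFIX
    - grafana.DOMAIN_SUFFIX
    - admin.DOMAIN_SUFFIX
    - integration.DOMAIN_SUFFIX
```

After the issuerRef change is applied, deleting the pingora-tls Secret (for example with `kubectl delete secret pingora-tls -n <namespace>`) should prompt cert-manager to re-issue from the production issuer, as the file's comments describe.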