The Longhorn memory leak went undetected for 14 days because alerting was broken (email receiver, missing label selector, no node alerts). This overhaul brings alerting to production grade.

Fixes:
- Alloy Loki URL pointed to deleted loki-gateway, now loki:3100
- seaweedfs-bucket-init crash on unsupported `mc versioning` command
- All PrometheusRules now have `release: kube-prometheus-stack` label
- Removed broken email receiver, Matrix-only alerting

New alert coverage:
- Node: memory, CPU, swap, filesystem, inodes, network, clock skew, OOM
- Kubernetes: deployment down, CronJob failed, pod crash-looping, PVC full
- Backups: Postgres barman stale/failed, WAL archiving, SeaweedFS mirror
- Observability: Prometheus WAL/storage/rules, Loki/Tempo/AlertManager down
- Services: Stalwart, Bulwark, Tuwunel, Sol, Valkey, OpenSearch (smart)
- SLOs: auth stack 99.9% burn rate, Matrix 99.5%, latency p95 < 2s
- Recording rules for Linkerd RED metrics and node aggregates
- Watchdog heartbeat → Matrix every 12h (dead pipeline detection)
- Inhibition: critical suppresses warning for same alert+namespace
- OpenSearchClusterYellow only fires with >1 data node (single-node aware)
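One of the failures named above was a missing label selector. Assuming the chart's default rule selector, a PrometheusRule object without the `release: kube-prometheus-stack` label is never loaded, so its alerts cannot fire. A minimal sketch of a rule carrying that label follows. The object name, the `monitoring` namespace, the alert name, the 90% threshold, and the 10m window are illustrative assumptions, not values taken from this commit; the two memory metrics are standard node_exporter series.

```yaml
# Sketch only: names, namespace, and thresholds are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-memory-alerts        # hypothetical name
  namespace: monitoring           # assumed namespace
  labels:
    # Label named in the commit message; without it the rule is
    # ignored when the operator selects rules by this label.
    release: kube-prometheus-stack
spec:
  groups:
    - name: node.memory
      rules:
        - alert: NodeMemoryHigh   # hypothetical alert name
          # Fraction of memory in use, from node_exporter metrics.
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Memory usage above 90% on {{ $labels.instance }}"
```

A rule like this would have surfaced sustained memory growth on a node well before the 14-day mark, provided the receiver side of the pipeline was also working.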
46 lines
1.6 KiB
YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: seaweedfs-bucket-init
  namespace: lasuite
  annotations:
    # Run once on first deploy; manually delete to re-run if needed.
    helm.sh/hook: post-install
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: mc
          image: minio/mc:latest
          command:
            - /bin/sh
            - -c
            - |
              set -e
              ENDPOINT=http://seaweedfs-filer.storage.svc.cluster.local:8333
              mc alias set weed "$ENDPOINT" "$S3_ACCESS_KEY" "$S3_SECRET_KEY"

              for bucket in \
                sunbeam-meet \
                sunbeam-drive \
                sunbeam-messages \
                sunbeam-messages-imports \
                sunbeam-conversations \
                sunbeam-git-lfs \
                sunbeam-game-assets \
                sunbeam-ml-models \
                sunbeam-stalwart \
                sunbeam-sccache; do
                mc mb --ignore-existing "weed/$bucket"
                echo "Ensured bucket: $bucket"
              done

              # Enable object versioning on buckets that require it.
              # Drive's WOPI GetFile response includes X-WOPI-ItemVersion from S3 VersionId.
              # SeaweedFS doesn't support `mc versioning`, so a failure here is
              # tolerated (|| echo) instead of aborting the Job under `set -e`.
              mc versioning enable weed/sunbeam-drive || echo "Versioning not supported by SeaweedFS mc, skipping (filer handles versioning natively)"
          envFrom:
            - secretRef:
                name: seaweedfs-s3-credentials