feat: add PrometheusRule alerts for all services
28 alert rules across 9 PrometheusRule files covering infrastructure
(Longhorn, cert-manager), data (PostgreSQL, OpenBao, OpenSearch),
storage (SeaweedFS), devtools (Gitea), identity (Hydra, Kratos),
media (LiveKit), and mesh (Linkerd golden signals for all services).
Severity routing: critical alerts fire to Matrix + email, warnings
to Matrix only (AlertManager config updated in separate commit).
2026-03-24 12:20:55 +00:00
# Blame: the expr lines and the OpenSearchHeapHigh description were last changed
# 2026-03-25 17:53:59 +00:00; all other lines date from the commit above.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: opensearch-alerts
  namespace: data
  labels:
    role: alert-rules
spec:
  groups:
    - name: opensearch
      rules:
        - alert: OpenSearchClusterRed
          expr: elasticsearch_cluster_health_status{color="red"} == 1
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "OpenSearch cluster health is RED"
            description: "OpenSearch cluster {{ $labels.cluster }} health status is red."

        - alert: OpenSearchClusterYellow
          expr: elasticsearch_cluster_health_status{color="yellow"} == 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "OpenSearch cluster health is YELLOW"
            description: "OpenSearch cluster {{ $labels.cluster }} health status is yellow."

        - alert: OpenSearchHeapHigh
          expr: (elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) > 0.85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "OpenSearch JVM heap usage is high"
            description: "OpenSearch node {{ $labels.name }} in {{ $labels.namespace }} heap usage is above 85%."
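
The OpenSearchClusterRed rule above could be checked offline with a `promtool test rules` unit test. The following is a minimal sketch, not part of the commit. It assumes the `groups:` list from `spec` has been copied into a plain rule file named `opensearch-rules.yaml` (a hypothetical name), since promtool reads plain rule files rather than PrometheusRule resources. The `cluster="demo"` label value is also made up for the test.

```yaml
# opensearch-rules_test.yaml (hypothetical file name)
rule_files:
  - opensearch-rules.yaml   # assumed: plain rule file holding the groups from spec

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Cluster reports red for six consecutive samples (0m through 5m).
      - series: 'elasticsearch_cluster_health_status{color="red", cluster="demo"}'
        values: '1x5'
    alert_rule_test:
      # With "for: 2m" the alert is pending from 0m and firing from 2m onward.
      - eval_time: 3m
        alertname: OpenSearchClusterRed
        exp_alerts:
          - exp_labels:
              severity: critical
              color: red
              cluster: demo
            exp_annotations:
              summary: "OpenSearch cluster health is RED"
              description: "OpenSearch cluster demo health status is red."
```

Under those assumptions the test would be run with `promtool test rules opensearch-rules_test.yaml`.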
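
The severity routing described in the commit message (critical to Matrix and email, warnings to Matrix only) lives in a separate commit that is not shown here. As an illustration only, an Alertmanager configuration with that behaviour might look like the sketch below. Every receiver name, URL, and address is hypothetical, and it assumes Matrix delivery goes through a webhook bridge; the actual config may be structured differently.

```yaml
# Illustrative Alertmanager config; all names, hosts, and addresses are placeholders.
global:
  smtp_smarthost: smtp.example.org:587   # hypothetical mail relay
  smtp_from: alertmanager@example.org    # hypothetical sender

route:
  receiver: matrix                       # default: anything unmatched goes to Matrix
  group_by: [alertname, namespace]
  routes:
    - matchers:
        - severity="critical"
      receiver: matrix-and-email         # critical: Matrix plus email
    - matchers:
        - severity="warning"
      receiver: matrix                   # warning: Matrix only

receivers:
  - name: matrix
    webhook_configs:
      - url: http://matrix-bridge.example.internal:8080/alerts   # assumed webhook bridge
  - name: matrix-and-email
    webhook_configs:
      - url: http://matrix-bridge.example.internal:8080/alerts   # assumed webhook bridge
    email_configs:
      - to: oncall@example.org           # hypothetical recipient
```

The `severity` values matched here are the ones set under `labels:` in the rules above, so OpenSearchClusterRed would take the first child route and the two warning alerts the second.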