feat(monitoring): wire up full LGTM observability stack

- Prometheus: discover ServiceMonitors/PodMonitors in all namespaces,
  enable remote write receiver for Tempo metrics generator
- Tempo: enable metrics generator (service-graphs + span-metrics)
  with remote write to Prometheus
- Loki: add Grafana Alloy DaemonSet to ship container logs
- Grafana: enable dashboard sidecar, add Pingora/Loki/Tempo/OpenBao
  dashboards, add stable UIDs and cross-linking between datasources
  (Loki↔Tempo derived fields, traces→logs, traces→metrics, service map)
- Linkerd: enable proxy tracing to Alloy OTLP collector, point
  linkerd-viz at existing Prometheus instead of deploying its own
- Pingora: add OTLP rollout plan (endpoint commented out until proxy
  telemetry panic fix is deployed and Alloy is verified healthy)
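
For reference, the Prometheus item above could look roughly like the values
below. This is a sketch only, not copied from this commit: it assumes the
kube-prometheus-stack chart (suggested by the service name in the remote
write URL), and the actual changes may be structured differently.

```yaml
# Illustrative kube-prometheus-stack values (assumed, not taken from this commit).
prometheus:
  prometheusSpec:
    # Accept pushed samples on /api/v1/write, which is the endpoint the
    # Tempo metrics generator writes to.
    enableRemoteWriteReceiver: true
    # When these are false, an unset selector matches every ServiceMonitor /
    # PodMonitor rather than only those labelled with this Helm release.
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
```

The chart's default namespace selectors are empty and therefore already
match all namespaces, so only the label selectors need relaxing under this
assumption.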
2026-03-21 17:36:54 +00:00
parent 5f923d14f9
commit d3943c9a84
9 changed files with 523 additions and 0 deletions

@@ -16,6 +16,18 @@ tempo:
         path: /var/tempo/traces
       wal:
         path: /var/tempo/wal
+  # Generate span-derived RED metrics (rate / errors / duration) and push
+  # them into Prometheus so Grafana can show service-level indicators
+  # even without application-level metrics exporters.
+  metricsGenerator:
+    enabled: true
+    remoteWriteUrl: "http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/write"
+  overrides:
+    defaults:
+      metrics_generator:
+        processors:
+          - service-graphs
+          - span-metrics
 persistence:
   enabled: true