Intro to Observability as Code: Managing Dashboards with GitOps

Observability as code treats dashboards, alerts, data sources, and SLOs like application code: versioned, reviewed, and deployed through automation. For teams running Grafana, Prometheus, and OpenTelemetry, that typically means storing dashboard definitions in Git and using a GitOps controller (Argo CD, Flux) to reconcile those definitions into running Grafana instances. This approach reduces drift, improves reviewability, and makes dashboard changes auditable. (grafana.com)
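Drift is any difference between the dashboard JSON committed to Git and what the running Grafana instance actually serves. The sketch below is a minimal, hypothetical drift check. It assumes both versions are already loaded as Python dicts (the live one could come from Grafana's HTTP API, for example). It ignores the top-level `id` and `version` fields, which Grafana manages per instance, but that choice is this sketch's own assumption. The function names are illustrative and are not part of any Grafana tooling.

```python
import json
from typing import Any

# Fields Grafana assigns per instance (database id, save counter).
# Ignoring them is an assumption of this sketch; adjust for your setup.
VOLATILE_FIELDS = {"id", "version"}


def normalize(dashboard: dict[str, Any]) -> str:
    """Return a canonical JSON string with volatile top-level fields removed."""
    stable = {k: v for k, v in dashboard.items() if k not in VOLATILE_FIELDS}
    # sort_keys makes the comparison insensitive to key order.
    return json.dumps(stable, sort_keys=True)


def has_drift(desired: dict[str, Any], live: dict[str, Any]) -> bool:
    """True if the live dashboard differs from the Git-managed definition."""
    return normalize(desired) != normalize(live)


if __name__ == "__main__":
    desired = {"uid": "api-latency", "title": "API latency", "panels": []}
    live_same = {**desired, "id": 42, "version": 7}
    live_edited = {**desired, "title": "API latency (edited in UI)", "id": 42}
    print(has_drift(desired, live_same))    # False
    print(has_drift(desired, live_edited))  # True
```

A GitOps controller performs this kind of comparison continuously. It then resolves any difference in favor of Git.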

Why manage dashboards with GitOps?

A simple GitOps workflow for dashboards

  1. Author: Dashboards are authored or exported as JSON (or generated from templates). Grafana stores each dashboard as a JSON document, which makes it portable and machine-editable. (codelit.io)
  2. Commit: Dashboard files (or template sources) live in a Git repository structured by team, environment, or service.
  3. Validate: CI runs linters and schema checks against dashboard JSON or generated output.
  4. Reconcile: A GitOps controller (Argo CD or Flux) applies Kubernetes resources (e.g., Grafana Operator custom resources or a sync job) that load the dashboards into Grafana. (grafana.com)
  5. Observe: Dashboards appear in Grafana, and future edits follow the same pull-request cycle.
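For step 4, one option is the Grafana Operator. It watches `GrafanaDashboard` custom resources and pushes their JSON into matching Grafana instances. The sketch below wraps a dashboard JSON document in such a resource, so the resource can be committed and then reconciled by Argo CD or Flux. It assumes the Grafana Operator v5 API (`grafana.integreatly.org/v1beta1`, with `spec.instanceSelector` and `spec.json`). Check the CRD installed in your cluster, because fields differ between operator versions. The helper name and the `dashboards: grafana` label are hypothetical.

```python
import json
from typing import Any


def to_grafana_dashboard_cr(
    name: str,
    dashboard: dict[str, Any],
    instance_labels: dict[str, str],
) -> dict[str, Any]:
    """Wrap dashboard JSON in a GrafanaDashboard custom resource.

    Assumes the Grafana Operator v5 CRD (grafana.integreatly.org/v1beta1).
    """
    return {
        "apiVersion": "grafana.integreatly.org/v1beta1",
        "kind": "GrafanaDashboard",
        "metadata": {"name": name},
        "spec": {
            # Selects which Grafana instances should receive this dashboard.
            "instanceSelector": {"matchLabels": instance_labels},
            # The dashboard model is embedded as a JSON string.
            "json": json.dumps(dashboard, indent=2),
        },
    }


if __name__ == "__main__":
    dashboard = {"uid": "api-latency", "title": "API latency", "panels": []}
    cr = to_grafana_dashboard_cr(
        "api-latency", dashboard, {"dashboards": "grafana"}  # hypothetical label
    )
    # Kubernetes accepts JSON manifests, so this output can be committed as-is.
    print(json.dumps(cr, indent=2))
```

The `name` argument must be a valid Kubernetes resource name (lowercase, DNS-1123 style). Reusing the dashboard's `uid` often satisfies that requirement, but this sketch does not enforce it.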

Tooling and patterns that make this practical
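Templating helps GitOps scale to many services. You generate near-identical dashboards from one parameterized source instead of hand-editing many copies of the same JSON. Grafana users commonly do this in Jsonnet, for example with the Grafonnet library. The sketch below swaps in plain Python so it stays self-contained. The panel structure is deliberately minimal and is not a complete Grafana panel model. The metric name `http_request_duration_seconds_bucket`, the `service` label, and the `svc-` uid prefix are all assumptions.

```python
from typing import Any


def latency_panel(service: str, panel_id: int) -> dict[str, Any]:
    """One time-series panel plotting p95 latency for a service (illustrative)."""
    expr = (
        "histogram_quantile(0.95, sum by (le) ("
        f'rate(http_request_duration_seconds_bucket{{service="{service}"}}[5m])))'
    )
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": f"{service} p95 latency",
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        "targets": [{"refId": "A", "expr": expr}],
    }


def service_dashboard(service: str) -> dict[str, Any]:
    """Generate a dashboard for one service from the shared template."""
    return {
        "uid": f"svc-{service}",  # hypothetical uid convention
        "title": f"Service: {service}",
        "tags": ["generated", service],
        "panels": [latency_panel(service, panel_id=1)],
    }


if __name__ == "__main__":
    for svc in ["checkout", "payments"]:
        dash = service_dashboard(svc)
        print(dash["uid"], dash["panels"][0]["targets"][0]["expr"])
```

In this setup, the template source is what reviewers read in a pull request. CI then renders it to JSON and validates the generated output.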

Validation and testing: keep changes safe
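The Validate step can start small. The sketch below is a hypothetical CI check; it is not Grafana's official schema validation. It enforces four illustrative conventions:

- Each dashboard has a stable `uid`, so reconciliation updates it in place instead of creating a duplicate.
- Each dashboard has a `title`.
- The top-level `id` is null or absent, so the file is not tied to one instance.
- Panel `id`s are unique within a dashboard.

Teams typically replace these rules with their own and run the script on every changed file in a pull request.

```python
import json
import sys
from pathlib import Path
from typing import Any


def check_dashboard(dashboard: dict[str, Any]) -> list[str]:
    """Return rule violations; an empty list means the dashboard passes."""
    errors: list[str] = []
    # Assumed convention: a stable uid lets GitOps update the dashboard in place.
    if not dashboard.get("uid"):
        errors.append("missing 'uid'")
    if not dashboard.get("title"):
        errors.append("missing 'title'")
    # Assumed convention: instance-specific database ids are not committed.
    if dashboard.get("id") is not None:
        errors.append("'id' should be null or absent in committed dashboards")
    seen: set[Any] = set()
    for panel in dashboard.get("panels", []):
        pid = panel.get("id")
        if pid is None:
            continue  # panels without ids are not checked here
        if pid in seen:
            errors.append(f"duplicate panel id {pid}")
        seen.add(pid)
    return errors


def main(paths: list[str]) -> int:
    """Check each file; return a process exit code (0 = all passed)."""
    failed = False
    for path in paths:
        try:
            dashboard = json.loads(Path(path).read_text())
        except json.JSONDecodeError as exc:
            print(f"{path}: invalid JSON: {exc}")
            failed = True
            continue
        for err in check_dashboard(dashboard):
            print(f"{path}: {err}")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    # Pass the changed dashboard files as arguments in CI.
    exit_code = main(sys.argv[1:])
    if exit_code:
        sys.exit(exit_code)  # a non-zero status fails the CI job
```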

Practical repo layout (example)
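As an illustration only, assume a hypothetical layout in which each dashboard lives at `dashboards/<team>/<environment>/<name>.json`, following the team and environment structure from the Commit step. A small helper can then derive each file's destination from its path alone. For example, the path can determine which Grafana folder the dashboard belongs in and which environment's instance receives it. This layout and the `DashboardTarget` type are assumptions made for this sketch. They are not a Grafana, Argo CD, or Flux convention.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DashboardTarget:
    team: str         # used as the Grafana folder in this sketch
    environment: str  # selects which Grafana instance receives the dashboard
    name: str         # file stem, reused as the resource name


def parse_dashboard_path(path: str) -> DashboardTarget:
    """Map dashboards/<team>/<environment>/<name>.json to a target (hypothetical layout)."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4 or parts[0] != "dashboards" or not parts[3].endswith(".json"):
        raise ValueError(f"unexpected dashboard path: {path}")
    _, team, environment, filename = parts
    return DashboardTarget(team, environment, PurePosixPath(filename).stem)


if __name__ == "__main__":
    for p in [
        "dashboards/payments/prod/api-latency.json",
        "dashboards/search/staging/indexer.json",
    ]:
        print(parse_dashboard_path(p))
```

Because the helper rejects files outside the expected layout, CI can use it to catch misplaced dashboards before they reach the reconciler.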

Why templating helps

Templates let you:

Common pitfalls and how teams avoid them

When GitOps isn’t just deployment automation

Treating dashboards as code is also a platform-engineering lever. Platform teams can:

Summary

Managing dashboards with GitOps transforms monitoring from manual configuration into a trackable, testable, and automatable part of your delivery pipeline. For Grafana users, the ecosystem already supports several practical flows: raw JSON provisioning, templating with Jsonnet, Kubernetes operators, and Git-backed features in Grafana itself. Together, these components form a workflow that reduces configuration drift, improves reviewability, and scales observability across the platform. (grafana.com)

References (selected)