Intro to Observability as Code: Managing Dashboards with GitOps
Observability as code treats dashboards, alerts, data sources, and SLOs like application code: versioned, reviewed, and deployed through automation. For teams running Grafana, Prometheus, and OpenTelemetry, that typically means storing dashboard definitions in Git and using a GitOps controller (Argo CD, Flux) to reconcile those definitions into running Grafana instances. This approach reduces drift, improves reviewability, and makes dashboard changes auditable. (grafana.com)
Why manage dashboards with GitOps?
- Repeatability: A repo is a single source of truth for dashboard structure across environments. (grafana.com)
- Review and audit: Pull requests let engineers discuss queries, panel design, and thresholds before changes reach on-call screens. (grafana.com)
- Automated deployment: A GitOps controller can reconcile validated dashboard files into many Grafana instances without manual clicks. (grafana.com)
A simple GitOps workflow for dashboards
- Author: Dashboards are authored or exported as JSON (or generated from templates). Grafana stores each dashboard as a JSON document, which makes it portable and machine-editable. (codelit.io)
- Commit: Dashboard files (or template sources) live in a Git repository structured by team, environment, or service.
- Validate: CI runs linters and schema checks against dashboard JSON or generated output.
- Reconcile: A GitOps controller (Argo CD / Flux) applies a Kubernetes resource (e.g., Grafana Operator custom resources or a sync job) that injects the dashboards into Grafana. (grafana.com)
- Observe: Dashboards appear in Grafana; future edits follow the same PR cycle.
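The Validate step above can be sketched as a small check that CI runs on every dashboard file. This is a minimal sketch rather than a full schema validator: the set of required fields (`title`, `uid`, `panels`) and the duplicate-panel-id rule are assumptions chosen for illustration, not rules Grafana itself enforces.

```python
import json

# Hypothetical team policy: fields this sketch requires on every dashboard.
REQUIRED_FIELDS = ("title", "uid", "panels")


def validate_dashboard(text: str) -> list[str]:
    """Return a list of problems found in one dashboard JSON document."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(dashboard, dict):
        return ["top-level value must be a JSON object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in dashboard
    ]
    panels = dashboard.get("panels", [])
    if not isinstance(panels, list):
        return problems + ["'panels' must be a list"]

    seen_ids = set()
    for index, panel in enumerate(panels):
        if not isinstance(panel, dict):
            problems.append(f"panel {index}: must be a JSON object")
            continue
        if "type" not in panel:
            problems.append(f"panel {index}: missing 'type'")
        panel_id = panel.get("id")
        if panel_id is not None and panel_id in seen_ids:
            problems.append(f"panel {index}: duplicate id {panel_id}")
        seen_ids.add(panel_id)
    return problems
```

A CI job could call `validate_dashboard` on each file under the dashboards directory and fail the build when any list of problems is non-empty.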
Tooling and patterns that make this practical
- Plain JSON + provisioning: Grafana supports provisioning dashboards from files (for example, files mounted from ConfigMaps) or through its HTTP API, which lets operators apply dashboards automatically. Teams often keep exported JSON files in Git for simple setups. (grafana.com)
- Jsonnet / grafonnet: When JSON becomes unwieldy, templating languages like Jsonnet (and libraries such as grafonnet) let you generate consistent dashboards with variables and shared panel definitions. This is especially useful for “golden” or service-class dashboards. (datko.pl)
- Grafana Operator + Argo CD: The Grafana Operator provides Kubernetes CRDs to manage dashboards, data sources, and more. Argo CD (or Flux) can watch the Git repo and ensure the operator creates/updates dashboards inside clusters automatically. Grafana’s docs show examples of Argo CD managing dashboard lifecycles. (grafana.com)
- Git Sync and cloud features: Grafana has been evolving “observability as code” features including Git Sync and tighter Git integration in Grafana Cloud and OSS, enabling workflows where dashboards are linked directly to Git repositories. These features lower friction for teams that want branching and PRs inside Grafana workflows. (grafana.com)
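To make the operator pattern above concrete, here is a rough sketch that wraps a dashboard JSON document in a `GrafanaDashboard` custom resource that a GitOps controller could sync. The `apiVersion`, `instanceSelector`, and `json` field names reflect the Grafana Operator v5 CRD as I understand it and should be checked against the operator version you run; the resource name, namespace, and labels are hypothetical.

```python
import json


def dashboard_manifest(
    name: str, namespace: str, dashboard: dict, instance_labels: dict
) -> dict:
    """Wrap a dashboard in a GrafanaDashboard custom resource (assumed v5 schema)."""
    return {
        # Assumption: API group/version used by Grafana Operator v5.
        "apiVersion": "grafana.integreatly.org/v1beta1",
        "kind": "GrafanaDashboard",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects which Grafana instances should receive the dashboard.
            "instanceSelector": {"matchLabels": instance_labels},
            # The dashboard itself is embedded as a JSON string.
            "json": json.dumps(dashboard, sort_keys=True),
        },
    }


if __name__ == "__main__":
    manifest = dashboard_manifest(
        name="service-x",                      # hypothetical resource name
        namespace="monitoring",                # hypothetical namespace
        dashboard={"title": "Service X", "uid": "service-x", "panels": []},
        instance_labels={"dashboards": "grafana"},  # hypothetical label
    )
    # Kubernetes accepts manifests written as JSON, so no YAML library is needed.
    print(json.dumps(manifest, indent=2))
```

Committing the printed manifest to the watched repository would let the controller apply it like any other Kubernetes resource.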
Validation and testing: keep changes safe
- Lint generated JSON: Use schema validators to catch malformed JSON or deprecated panel fields before they touch production screens.
- Visual diffs: Because a JSON diff can be noisy, generate a lightweight metadata summary (title, panel count, queries) and surface that in PRs so reviewers can reason about impact.
- Staging environment: Deploy dashboards to a staging Grafana first and use human review plus synthetic checks (panel rendering success) before promoting.
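The metadata summary mentioned above might look like the following sketch. It assumes Prometheus-style targets, where the query text lives in an `expr` field; other data sources store queries under different keys, and panels nested inside collapsed rows are not walked here.

```python
def summarize(dashboard: dict) -> dict:
    """Collect the title, panel count, and distinct queries of a dashboard."""
    panels = dashboard.get("panels", [])
    queries = set()
    for panel in panels:
        for target in panel.get("targets", []):
            # Assumption: Prometheus targets keep the query in "expr".
            expr = target.get("expr")
            if expr:
                queries.add(expr)
    return {
        "title": dashboard.get("title", "(untitled)"),
        "panel_count": len(panels),
        "queries": sorted(queries),
    }


def format_summary(summary: dict) -> str:
    """Render the summary as plain text suitable for a PR comment."""
    lines = [
        f"Dashboard: {summary['title']}",
        f"Panels: {summary['panel_count']}",
        "Queries:",
    ]
    lines += [f"  - {query}" for query in summary["queries"]]
    return "\n".join(lines)
```

Running this for both the base and head revisions of a changed file and posting the two summaries side by side gives reviewers a quick view of what actually changed.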
Practical repo layout (example)
- /dashboards/
  - /team-a/
    - service-x-dashboard.jsonnet
    - service-y-dashboard.json
  - /golden/
    - golden-service-template.jsonnet
- /ci/
  - validate-dashboard-schema.yml
Why templating helps
Templates let you:
- Reuse panel definitions across services (same latency panel, different queries).
- Inject environment-level variables (prod vs. staging data source endpoints).
- Generate hundreds of dashboards from a small set of templates when operating many services. Jsonnet-based approaches and community tooling are widely used for this reason. (datko.pl)
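The same idea can be illustrated in plain Python, used here as a stand-in for Jsonnet and not as a replacement for grafonnet. One shared panel function is reused across services, and the data source is chosen per environment. The metric name, data source UIDs, and service names are hypothetical.

```python
# Hypothetical mapping from environment to Prometheus data source UID.
DATASOURCE_UIDS = {"prod": "prometheus-prod", "staging": "prometheus-staging"}


def latency_panel(service: str, datasource_uid: str, panel_id: int = 1) -> dict:
    """Shared p99 latency panel; only the service label and data source vary."""
    expr = (
        "histogram_quantile(0.99, sum by (le) ("
        f'rate(http_request_duration_seconds_bucket{{service="{service}"}}[5m])))'
    )
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": f"{service} p99 latency",
        "datasource": {"type": "prometheus", "uid": datasource_uid},
        "targets": [{"expr": expr}],
    }


def service_dashboard(service: str, env: str) -> dict:
    """Build one dashboard for a service in a given environment."""
    return {
        "uid": f"{service}-{env}",
        "title": f"{service} ({env})",
        "tags": ["generated", env],
        "panels": [latency_panel(service, DATASOURCE_UIDS[env])],
    }


def generate_all(services: list[str], envs: list[str]) -> dict[str, dict]:
    """Map output file names to generated dashboards."""
    return {
        f"{service}-{env}.json": service_dashboard(service, env)
        for service in services
        for env in envs
    }
```

Calling `generate_all(["checkout", "cart"], ["prod", "staging"])` yields four dashboards from one panel definition, which is the scaling property the list above describes.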
Common pitfalls and how teams avoid them
- Manual edits in the UI: If people edit dashboards directly, the repo will drift. Enforce “no direct edits” or enable Git Sync so UI changes produce PRs. Grafana has features to connect dashboards to Git to reduce this friction. (grafana.com)
- Large JSON churn: Avoid hand-editing exported JSON; prefer templates or small editable metadata files to keep diffs meaningful. (codelit.io)
- Too many dashboards: Apply governance. Keep a golden-signals template per service and encourage teams to extend it instead of proliferating near-duplicate dashboards.
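One way to tame both drift and noisy diffs is to normalize dashboard JSON before committing or comparing it. The sketch below strips the top-level `id` and `version` fields, which vary per Grafana instance and per save, and serializes with sorted keys. Treating only those two fields as volatile is an assumption; a team may need to ignore more.

```python
import json

# Top-level fields that change per instance or per save, not per design change.
VOLATILE_FIELDS = ("id", "version")


def normalize(dashboard: dict) -> str:
    """Return a stable text form of a dashboard for committing and diffing."""
    stable = {
        key: value for key, value in dashboard.items() if key not in VOLATILE_FIELDS
    }
    return json.dumps(stable, indent=2, sort_keys=True) + "\n"


def has_drifted(repo_copy: dict, live_copy: dict) -> bool:
    """True when the live dashboard differs from the repo beyond volatile fields."""
    return normalize(repo_copy) != normalize(live_copy)
```

A scheduled job could fetch each live dashboard, compare it with the repository copy using `has_drifted`, and flag dashboards that were edited in the UI.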
When GitOps isn’t just deployment automation
Treating dashboards as code is also a platform-engineering lever. Platform teams can:
- Provide curated templates so service teams get good “default” views.
- Automatically generate and onboard dashboards when a new service is created.
- Enforce company-wide SLO/alerting patterns by templating panels and alert thresholds.
Summary
Managing dashboards with GitOps transforms monitoring from manual configuration into a trackable, testable, and automatable part of your delivery pipeline. For Grafana users, the ecosystem already supports several practical flows: raw JSON provisioning, templating with Jsonnet, Kubernetes operators, and Git-backed features in Grafana itself. Those components combine into a workflow that reduces configuration drift, improves reviewability, and scales platform-wide observability. (grafana.com)
References (selected)
- Grafana documentation: Observability as Code. (grafana.com)
- Grafana Cloud: Manage dashboards with GitOps using Argo CD. (grafana.com)
- Grafana Labs blog: Observability as code — automate observability workflows. (grafana.com)
- Jsonnet / dashboards-as-code examples and community slides. (datko.pl)