Intro to Observability as Code: Managing Dashboards with GitOps
Observability as code brings the same benefits teams already enjoy for application code—versioning, review, traceability, and reproducible deployments—into monitoring and dashboards. For teams running Grafana and other visualization tools, treating dashboards as code and using GitOps to deploy them reduces manual drift, improves collaboration, and makes configuration changes auditable. This article gives an approachable introduction to the concept, common tooling, a practical GitOps pattern for dashboards, and the trade-offs to keep in mind.
Why treat dashboards as code?
- Reproducibility: Dashboards defined in text files (JSON, Jsonnet, YAML, etc.) can be reproduced across environments (dev/stage/prod) without manual clicks.
- Collaboration: Git PRs let engineers review visualization changes, annotate intent, and link discussions to issues.
- Auditability and rollbacks: Every change is a commit—revert with confidence if a dashboard update breaks interpretation or performance.
- Automation: CI can validate dashboard schemas, run linters, or push changes via a reconciler so human error is reduced.
Grafana and the broader ecosystem now offer multiple ways to adopt this approach—from built-in Git synchronization to operators and dedicated CLI tools—so you can pick a workflow that fits your platform and team skills. (grafana.com)
Common patterns and tools
- Built-in Git sync (Grafana): Newer Grafana releases include “Git Sync” or “Git integration” features that let you link dashboards and folders to a repository, saving edits directly to Git without leaving the UI. This is great for teams that want a tight UI + Git loop. (grafana.com)
- Grafana Operator + GitOps controllers (Argo CD / Flux): In Kubernetes-native environments you can manage dashboards via Custom Resources (GrafanaDashboard CRDs) and let Argo CD or Flux reconcile those resources into running Grafana instances. That enables a declarative GitOps loop across clusters and environments. (grafana.com)
- Jsonnet + Grafonnet: Jsonnet is a templating language for JSON; the grafonnet library helps programmatically generate Grafana dashboard JSON. This is helpful where you want reusable panel templates or parameterized dashboards.
- Grizzly: A CLI and server tool designed to validate, preview, and publish Grafana resources from code. It can be used from CI to catch issues before pushing to production. (grafana.github.io)
- Terraform / SDKs: For teams that prefer Terraform or language-specific SDKs, there are providers and SDKs that manage dashboards, datasources, and alerting rules as code.
A simple GitOps workflow for dashboards (Kubernetes + Grafana Operator)
Below is a compact, practical pattern used by many teams running Grafana in Kubernetes. It balances declarative configuration, review via Git, and automated reconciliation.
1) Author dashboards as files
- Store dashboard JSON (or Jsonnet that compiles to JSON) under a repo path like infra/observability/dashboards/.
- Use a clear folder structure per product or service, and include metadata (README, dashboard purpose).
2) Continuous validation (CI)
- Lint or validate JSON schema (or compile Jsonnet and validate).
- Run Grizzly or a lightweight JSON schema check to ensure panels reference known datasources and required fields exist.
3) Reconcile via GitOps controller
- Merge changes to the main branch and let Argo CD or Flux sync the Kubernetes manifests that contain GrafanaDashboard custom resources (CRs). The Grafana Operator watches these resources and applies the dashboards to Grafana.
- Because the operator continuously reconciles the CRs into the Grafana instance, the dashboard state in Grafana matches the repository. (grafana.com)
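The validation step (2) can be sketched as a small Python check that CI runs before anything is merged. This is a minimal sketch under stated assumptions, not a replacement for Grizzly or a full schema validator: the required-field list, the set of known datasource names, and the rule that only "row" and "text" panels may lack queries are all choices made for illustration.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed minimum set of top-level fields for this sketch.
REQUIRED_FIELDS = ("uid", "title", "panels")
# Assumption: these panel types legitimately carry no queries.
QUERYLESS_PANEL_TYPES = ("row", "text")


def load_dashboards(root: str) -> list[dict]:
    """Parse every *.json file under root; invalid JSON raises immediately."""
    paths = sorted(Path(root).rglob("*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


def validate_dashboard(dashboard: dict, known_datasources: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not dashboard.get(field):
            problems.append(f"missing or empty field: {field}")
    for index, panel in enumerate(dashboard.get("panels") or []):
        label = panel.get("title") or f"panel #{index}"
        datasource = panel.get("datasource")
        # A datasource may be a plain name or an object with a "uid" key.
        if isinstance(datasource, dict):
            datasource = datasource.get("uid")
        if datasource is not None and datasource not in known_datasources:
            problems.append(f"{label}: unknown datasource {datasource!r}")
        if panel.get("type") not in QUERYLESS_PANEL_TYPES and not panel.get("targets"):
            problems.append(f"{label}: no query targets")
    return problems


def find_duplicate_uids(dashboards: list[dict]) -> list[str]:
    """UIDs that appear more than once would overwrite each other on deploy."""
    counts = Counter(d.get("uid") for d in dashboards if d.get("uid"))
    return sorted(uid for uid, n in counts.items() if n > 1)
```

A CI job could call `load_dashboards("infra/observability/dashboards")`, run both checks, print the problems, and exit non-zero if any list is non-empty.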
Example: minimal GrafanaDashboard custom resource
This YAML shows the shape of a GrafanaDashboard custom resource (an instance of the operator's CRD) that a GitOps controller would apply. The operator converts it into a deployed dashboard inside Grafana.
# API group used by Grafana Operator v4; v5 uses grafana.integreatly.org/v1beta1.
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: my-service-overview
  namespace: observability
spec:
  # The dashboard model is embedded as a JSON string.
  json: |
    {
      "uid": "my-service-overview",
      "title": "My Service - Overview",
      "panels": [
        {
          "type": "graph",
          "title": "HTTP 5xx rate",
          "targets": [{ "expr": "sum(rate(http_requests_total{job=\"myservice\",status=~\"5..\"}[5m]))" }]
        }
      ]
    }
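If dashboards live in the repository as plain JSON files, the wrapping into a manifest can be generated rather than written by hand. The following is a minimal sketch, assuming the same `integreatly.org/v1alpha1` resource shape as the example above; the function name and the choice to reuse the dashboard `uid` as the resource name are hypothetical conventions, not something the operator requires.

```python
import json
import re
import textwrap

# Kubernetes-style name check (lowercase letters, digits, hyphens).
DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def _check_name(value: str) -> str:
    if len(value) > 63 or not DNS_LABEL.match(value):
        raise ValueError(f"not a valid Kubernetes name: {value!r}")
    return value


def dashboard_to_manifest(dashboard: dict, namespace: str = "observability") -> str:
    """Render a GrafanaDashboard manifest with the dashboard embedded as JSON."""
    # Assumption for this sketch: the dashboard uid doubles as the resource name.
    name = _check_name(dashboard["uid"])
    namespace = _check_name(namespace)
    # Indent the JSON by four spaces so it sits inside the YAML block scalar.
    body = textwrap.indent(json.dumps(dashboard, indent=2), " " * 4)
    return (
        "apiVersion: integreatly.org/v1alpha1\n"
        "kind: GrafanaDashboard\n"
        "metadata:\n"
        f'  name: "{name}"\n'
        f'  namespace: "{namespace}"\n'
        "spec:\n"
        "  json: |\n"
        f"{body}\n"
    )
```

Quoting the name keeps YAML from reading values such as `123` or `true` as non-strings, and rejecting invalid names in CI surfaces the problem before the GitOps controller tries to apply the manifest.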
Example: small Jsonnet/grafonnet snippet
If you prefer templates and reuse, Jsonnet can reduce repetition when you generate many dashboards programmatically.
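// Assumes the older grafonnet-lib package, which matches this import path.
local grafana = import 'grafonnet/grafana.libsonnet';

grafana.dashboard.new(
  'My Service Overview',
  uid='my-service-overview',  // fixed UID so updates happen in place
)
.addPanel(
  // grafonnet-lib builds panels with constructor functions plus addTarget().
  grafana.graphPanel.new(
    'HTTP 5xx rate',
    datasource='Prometheus',
  )
  .addTarget(
    grafana.prometheus.target(
      'sum(rate(http_requests_total{job="myservice",status=~"5.."}[5m]))'
    )
  ),
  gridPos={ x: 0, y: 0, w: 12, h: 8 },
)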
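For teams that would rather not adopt Jsonnet, the same reuse idea can be sketched with plain Python functions that return dashboard dictionaries. This is an illustration only: the helper names, the panel layout values, and the service names are hypothetical, and the output is a bare dashboard model rather than anything a specific SDK would produce.

```python
import json


def error_rate_panel(service: str, datasource: str = "Prometheus") -> dict:
    """One reusable panel definition, parameterized by service name."""
    # Doubled braces produce literal { and } in the PromQL selector.
    expr = f'sum(rate(http_requests_total{{job="{service}",status=~"5.."}}[5m]))'
    return {
        "type": "timeseries",
        "title": "HTTP 5xx rate",
        "datasource": datasource,
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        "targets": [{"expr": expr}],
    }


def service_dashboard(service: str) -> dict:
    """Build a dashboard whose uid is derived from the service name, so it stays stable."""
    return {
        "uid": f"{service}-overview",
        "title": f"{service} overview",
        "panels": [error_rate_panel(service)],
    }


if __name__ == "__main__":
    # Hypothetical service names; each one yields a dashboard of the same shape.
    for service in ("myservice", "checkout"):
        print(json.dumps(service_dashboard(service), indent=2, sort_keys=True))
```

The generated JSON can be committed or produced in CI, in the same way compiled Jsonnet output would be.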
Why this pattern works
- Declarative state: Using custom resources or file provisioning makes the desired dashboard state explicit in Git instead of leaving it locked in the Grafana database.
- Automated deployment: A GitOps controller ensures the runtime state converges with the repo without manual pushes.
- Safe collaboration: Pull requests become the standard place for design review and commenting on visualizations or queries.
Practical tips and best practices
- Use stable UIDs: Assign a fixed dashboard uid (not random) so the operator/provisioner can update dashboards in-place rather than creating duplicates. (grafana.github.io)
- Separate infra vs. product dashboards: Keep dashboards that belong to platform teams (cluster health, exporters) separate from service-owned dashboards to make ownership clear.
- Datasource mapping: Avoid hard-coding datasource names that differ across environments. Use datasource UIDs or templating to map to environment-specific datasources at deploy time. (grafana.com)
- Lint and validate in CI: Fail builds on invalid JSON, missing queries, or references to non-existent datasources. Tools like Grizzly can assist with validation and previewing in a test Grafana instance. (grafana.github.io)
- Keep panels small and focused: Reusable library panels (or panel templates) reduce duplication and make dashboards easier to maintain.
- Document intent: Treat dashboards like software: explain the question the dashboard answers in a README or a top-level text panel.
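The datasource-mapping tip can be sketched as a deploy-time substitution step. This is a minimal sketch, assuming dashboards in Git refer to datasources through a placeholder such as `${DS_PROMETHEUS}`; the placeholder convention, the environment names, and the UID values below are hypothetical.

```python
import copy

# Hypothetical per-environment mapping from placeholder to datasource UID.
DATASOURCE_UIDS = {
    "dev": {"${DS_PROMETHEUS}": "prom-dev"},
    "prod": {"${DS_PROMETHEUS}": "prom-prod"},
}


def bind_datasources(dashboard: dict, env: str) -> dict:
    """Return a copy of the dashboard with placeholders replaced for one environment."""
    mapping = DATASOURCE_UIDS[env]

    def lookup(value: str) -> str:
        if value in mapping:
            return mapping[value]
        if value.startswith("${"):
            # Fail loudly rather than deploy a dashboard with a dangling reference.
            raise ValueError(f"no datasource mapping for {value} in {env}")
        return value

    def resolve(value):
        # A datasource may be a plain string or an object with a "uid" key.
        if isinstance(value, str):
            return lookup(value)
        if isinstance(value, dict) and isinstance(value.get("uid"), str):
            return {**value, "uid": lookup(value["uid"])}
        return value

    def visit(node):
        if isinstance(node, dict):
            for key, value in list(node.items()):
                if key == "datasource":
                    node[key] = resolve(value)
                else:
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    bound = copy.deepcopy(dashboard)  # leave the caller's object untouched
    visit(bound)
    return bound
```

Running this once per environment keeps a single dashboard definition in Git while each Grafana instance receives references that exist there.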
Trade-offs and common pitfalls
- UI vs code ergonomics: Building and tweaking complex visualizations in code can be slower than using the GUI. Many teams adopt a hybrid loop: design in the UI, export a baseline JSON, then iterate in code. (grafana.com)
- Large JSON diffs: Raw dashboard JSON can produce noisy diffs from reordering or generated IDs. Jsonnet or templating can reduce noise; strict formatting and tests help too.
- Drift management: If users edit dashboards directly in Grafana, you must decide whether Git or Grafana's UI is the source of truth. If Git should always win, restrict edit permissions or make the managed dashboards read-only in the UI, and let the reconciler overwrite manual changes.
- Permissions and secrets: Credentials for datasources or API tokens should never be committed. Use a secrets manager or Kubernetes Secret, and link datasources by secure reference.
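Both the noisy-diff and the drift problems get easier once dashboards are compared in a canonical form. The sketch below assumes that the top-level `id` and `version` fields are assigned by the Grafana server and can be ignored for comparison; the function names are hypothetical, and fetching the live dashboard from Grafana is left out.

```python
import json

# Top-level fields assumed to be server-managed and therefore ignored.
VOLATILE_TOP_LEVEL_KEYS = {"id", "version"}


def canonical_json(dashboard: dict) -> str:
    """Serialize with sorted keys and fixed indentation so diffs stay small."""
    stable = {k: v for k, v in dashboard.items() if k not in VOLATILE_TOP_LEVEL_KEYS}
    # sort_keys orders keys at every nesting level; list order is preserved
    # because panel order is meaningful.
    return json.dumps(stable, indent=2, sort_keys=True) + "\n"


def has_drift(repo_dashboard: dict, live_dashboard: dict) -> bool:
    """True when the live dashboard differs from Git in anything but volatile fields."""
    return canonical_json(repo_dashboard) != canonical_json(live_dashboard)
```

Writing `canonical_json` output back to the repository (for example from a pre-commit hook) keeps key reordering out of pull requests, and `has_drift` gives a scheduled job a simple signal that someone edited a dashboard in the UI.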
When to lean on an operator vs. built-in Git sync
- Use the Grafana Operator + Argo CD/Flux when you run Grafana inside Kubernetes and need cluster-level, multi-instance, or multi-namespace GitOps control. This is a natural fit for platform teams managing many Grafana instances. (grafana.com)
- Use built-in Git sync features when you want a simpler workflow that keeps the editing experience in Grafana but still records changes to a repo—particularly useful for smaller teams or when you prefer the UI-first authoring model. (grafana.com)
Closing thoughts
Managing dashboards as code with GitOps replaces brittle manual processes with changes that are auditable and repeatable. The ecosystem provides several viable approaches, from in-Grafana Git integrations to operator-based GitOps and templating with Jsonnet, so pick the pattern that matches your platform, team habits, and tolerance for tooling complexity. Whichever path you choose, focus on small, testable dashboards, stable identifiers, CI validation, and clear ownership to get the most benefit from observability as code. (grafana.github.io)