Managing dashboards with GitOps: an intro to observability as code
Observability as Code (OaC) applies the same engineering practices we use for infrastructure and application code—source control, code review, and automated pipelines—to monitoring assets like dashboards, alerts, and data sources. The goal is reproducible, auditable observability that travels with the rest of your system code. Large engineering teams (and platform groups) often treat dashboards as first-class artifacts that must be reviewed, tested, and promoted between environments—exactly the use case GitOps was built for. (microsoft.github.io)
This article explains practical approaches to managing dashboards with GitOps, shows common patterns and small examples, and lists pragmatic best practices you can adapt to your stack.
Why manage dashboards with GitOps?
- Reproducibility: a declarative dashboard file + provisioning pipeline means you can recreate an instance from Git alone. (microsoft.github.io)
- Review and audit: pull requests give you history, discussion, and an ability to gate changes. (splunk.com)
- Automatic reconciliation: GitOps controllers (Argo CD, Flux) can apply the desired state and self-heal drift. (grafana.com)
- Separation of concerns: platform teams own bootstrap provisioning (immutable files), while service teams can work via reviewed changes (Terraform/CRs). (kloudvin.com)
Three practical GitOps patterns for dashboards
1) File provisioning (container/instance bootstrap)
- What it is: dashboard JSON files and provisioning YAML placed on disk or bundled with a Grafana image. Grafana reads and imports them on startup (and on a periodic interval). This is great for immutable, bootstrap configuration (data sources, core dashboards). (kloudvin.com)
- Pros: simple, idempotent, dashboards are read-only in the UI (prevents accidental edits). (kloudvin.com)
- Cons: limited lifecycle features (no drift detection), harder to parameterize across environments.
2) Kubernetes + Operator CRDs (declarative resources)
- What it is: run the Grafana Operator in Kubernetes and create GrafanaDashboard / GrafanaFolder / Grafana CRs that the operator reconciles into a Grafana instance. This fits naturally into a GitOps repo of Kubernetes manifests and works well for multi-tenant or multi-instance setups. (grafana.github.io)
- Small example (minimal GrafanaDashboard CR, adapted from the operator docs):
    apiVersion: grafana.integreatly.org/v1beta1
    kind: GrafanaDashboard
    metadata:
      name: service-overview
      namespace: observability
    spec:
      instanceSelector:
        matchLabels:
          dashboards: "grafana"
      json: |
        {
          "title": "Service Overview",
          "uid": "service-overview",
          "panels": []
        }

The operator supports several dashboard sources (raw JSON, gzipped JSON, URL, ConfigMap, OCI) and provides folder and uid management options. (grafana.github.io)
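For the gzipped JSON source mentioned above, the compressed payload has to be produced somewhere, typically in CI. The following is a minimal sketch of that step, assuming the operator expects gzip-compressed JSON that is base64-encoded when written into YAML (believed to be the `spec.gzipJson` field; check the CRD reference for your operator version). The function names are hypothetical.

```python
import base64
import gzip
import json


def gzip_dashboard(dashboard: dict) -> str:
    """Return the dashboard as gzip-compressed, base64-encoded text."""
    # Compact separators and sorted keys keep the payload small and stable.
    raw = json.dumps(dashboard, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # mtime=0 fixes the timestamp in the gzip header, so the output only
    # changes when the dashboard itself changes (quieter Git diffs).
    compressed = gzip.compress(raw, mtime=0)
    return base64.b64encode(compressed).decode("ascii")


def gunzip_dashboard(encoded: str) -> dict:
    """Inverse of gzip_dashboard, handy for reviewing what was committed."""
    return json.loads(gzip.decompress(base64.b64decode(encoded)))


dashboard = {"title": "Service Overview", "uid": "service-overview", "panels": []}
encoded = gzip_dashboard(dashboard)
print(encoded)
```

Because the encoded form is unreadable in a pull request, it is probably worth keeping the plain JSON (or its template) as the reviewed source and treating the compressed string as a build artifact.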
3) Sidecar + ConfigMaps with Flux (or similar)
- What it is: deploy Grafana with a sidecar (common in Helm charts) that watches ConfigMaps for labeled dashboard JSON. Reconcile those ConfigMaps from Git with Flux or another GitOps controller. This is a lightweight path for teams using Flux or HelmRelease flows. (oneuptime.com)
- Small Flux-aligned pattern:
- Put exported dashboard JSON into a ConfigMap manifest in Git, labeled grafana_dashboard: "1".
- Use a Flux Kustomization to apply that directory (with prune: true, so deleting a file from Git removes the dashboard).
- Grafana sidecar loads the dashboards automatically.
Example of the essential ConfigMap form:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kubernetes-overview-dashboard
      namespace: monitoring
      labels:
        grafana_dashboard: "1"
    data:
      kubernetes-overview.json: |
        {
          "title": "Kubernetes Overview",
          "uid": "k8s-overview",
          "panels": []
        }

Flux Kustomizations can then reconcile these configs and ensure Grafana reflects Git. (oneuptime.com)
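Writing the ConfigMap wrapper by hand for every dashboard gets repetitive. One option is to generate it from the dashboard JSON, sketched below. The naming scheme (`<uid>-dashboard` and `<uid>.json`) and the default namespace are assumptions for illustration, not something the sidecar requires. The output is a JSON manifest, which `kubectl apply -f` accepts just like YAML.

```python
import json
import re

# ConfigMap names must be lowercase alphanumerics and '-'; this sketch
# assumes the dashboard uid already follows that shape.
_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def dashboard_configmap(dashboard: dict, namespace: str = "monitoring") -> dict:
    """Wrap a dashboard in a ConfigMap carrying the sidecar label."""
    uid = dashboard["uid"]
    if not _NAME_RE.match(uid):
        raise ValueError(f"uid {uid!r} cannot be used in a ConfigMap name")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"{uid}-dashboard",
            "namespace": namespace,
            # Same label the sidecar is configured to watch in the example above.
            "labels": {"grafana_dashboard": "1"},
        },
        # ConfigMap values are strings, so the dashboard is serialized here.
        "data": {f"{uid}.json": json.dumps(dashboard, indent=2, sort_keys=True)},
    }


dashboard = {"title": "Kubernetes Overview", "uid": "k8s-overview", "panels": []}
print(json.dumps(dashboard_configmap(dashboard), indent=2))
```

Generating the manifest in CI keeps the plain dashboard JSON as the file people review, with the Kubernetes wrapper derived from it.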
When to use each pattern
- Use file provisioning for bootstrap resources that must exist before Grafana can do anything (core data sources, initial folders, immutable dashboards). It’s simple and reliable. (kloudvin.com)
- Use operator CRDs when you run Kubernetes and want native Kubernetes ownership, multi-instance targets, or richer features (folders CR, datasource CRs). (grafana.github.io)
- Use sidecar+ConfigMap when you want to avoid the complexity of running an operator and already use Flux or Helm-based GitOps. It’s a pragmatic middle ground. (oneuptime.com)
Tooling and ways to generate dashboard JSON
- Manual export/import: OK for one-offs, but JSON is verbose and brittle. Always remove instance-local fields (id, version) before committing. (kloudvin.com)
- Template/render (Jsonnet / Grafonnet): write composable dashboard templates and render to JSON in CI. Grafonnet is a maintained Jsonnet library for generating Grafana dashboards. This reduces repeated JSON edits and makes panels reusable. (grafana.github.io)
- Terraform provider: use Terraform for resources that have lifecycle, state, and drift detection (folders, permissions, alerting policies). Terraform gives plan/apply semantics and is good for promotion workflows. (kloudvin.com)
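The advice above to strip instance-local fields from a manual export is easy to automate. This is a minimal sketch, assuming only the two top-level fields named above (`id`, `version`) need to go; the function names are hypothetical.

```python
import json

# Top-level fields that belong to one Grafana instance, per the list above.
INSTANCE_LOCAL_FIELDS = ("id", "version")


def sanitize_dashboard(dashboard: dict) -> dict:
    """Return a copy of the dashboard without instance-local fields.

    Only top-level keys are removed. Panels carry their own "id" values,
    which are part of the dashboard definition and are left untouched.
    """
    return {k: v for k, v in dashboard.items() if k not in INSTANCE_LOCAL_FIELDS}


def to_canonical_json(dashboard: dict) -> str:
    """Serialize with sorted keys and fixed indentation for stable Git diffs."""
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


exported = {
    "id": 42,
    "version": 7,
    "uid": "service-overview",
    "title": "Service Overview",
    "panels": [{"id": 1, "title": "Requests"}],
}
print(to_canonical_json(sanitize_dashboard(exported)))
```

Running every export through the same serializer means a one-panel change shows up as a one-panel diff rather than a reordering of the whole file.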
Practical CI checks and pipeline ideas
- Lint and validate dashboard JSON schema in your PR pipeline (simple jsonlint or schema validator). Many teams run a quick jq-based check or a JSON schema validation step. (oneuptime.com)
- Render templates and diff output in CI: if you use Jsonnet/Grafonnet, render dashboards as part of CI to show the actual JSON diff in PRs. (grafana.github.io)
- Gate changes for Terraform-managed resources: require terraform plan and an approved merge to apply critical resources like alerting policies and folders. Terraform’s plan is the canonical review artifact for lifecycle-managed objects. (kloudvin.com)
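A structural check in the PR pipeline can be quite small. The sketch below is not a full schema validation: the required keys and the rule about instance-local fields are assumptions drawn from the examples in this article, not Grafana's official schema, and the function names are hypothetical.

```python
import json

REQUIRED_KEYS = ("title", "uid", "panels")
INSTANCE_LOCAL_KEYS = ("id", "version")


def validate_dashboard(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the file passed."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(dashboard, dict):
        return ["top level must be a JSON object"]

    errors = []
    for key in REQUIRED_KEYS:
        if key not in dashboard:
            errors.append(f"missing required key: {key}")
    uid = dashboard.get("uid")
    if "uid" in dashboard and (not isinstance(uid, str) or not uid):
        errors.append("uid must be a non-empty string")
    if "panels" in dashboard and not isinstance(dashboard["panels"], list):
        errors.append("panels must be a list")
    for key in INSTANCE_LOCAL_KEYS:
        if key in dashboard:
            errors.append(f"instance-local key should be removed: {key}")
    return errors


def find_duplicate_uids(dashboards: dict[str, dict]) -> dict[str, list[str]]:
    """Map each uid used by more than one file to the sorted file names."""
    files_by_uid: dict[str, list[str]] = {}
    for filename, dashboard in dashboards.items():
        uid = dashboard.get("uid")
        if isinstance(uid, str):
            files_by_uid.setdefault(uid, []).append(filename)
    return {u: sorted(f) for u, f in files_by_uid.items() if len(f) > 1}


good = '{"title": "Service Overview", "uid": "service-overview", "panels": []}'
bad = '{"id": 12, "title": "Exported", "panels": {}}'
print(validate_dashboard(good))
print(validate_dashboard(bad))
```

In a pipeline, a wrapper would read each changed dashboard file, print the problems, and exit non-zero if any list is non-empty, which is enough to block the merge.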
Common pitfalls and how to avoid them
- UID and data source mismatch: dashboards reference datasources by UID. Pin UIDs for datasources and folders, and keep them consistent across environments to avoid “datasource not found” problems. (kloudvin.com)
- Editing in the UI vs. code: don’t treat UI edits as the source of truth; either export and commit changes or make the dashboard read-only via provisioning so the Git repo stays canonical. (kloudvin.com)
- Secret handling: API tokens and service-account credentials must be kept out of Git and injected via secret management (Kubernetes Secrets, Vault, CI secret store). Grafana Operator and Terraform provider both integrate with secrets—configure them securely. (grafana.com)
- Large dashboards in CRDs: operator CRDs can hit etcd size limits; the Grafana Operator supports gzipped JSON and other sources (URL, ConfigMap, OCI) to handle big dashboards. (grafana.github.io)
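The UID mismatch pitfall above can be caught before deployment by comparing the data source UIDs a dashboard references against the UIDs you have pinned. The sketch below assumes references are stored as objects of the form `{"datasource": {"uid": ...}}` anywhere in the dashboard JSON, and that values starting with `$` (template variables) or `-- ` (built-in sources such as `-- Grafana --`) should be ignored. Both are assumptions about the export format, and older dashboards that reference data sources by name are not covered.

```python
def referenced_datasource_uids(node) -> set[str]:
    """Collect every datasource uid found anywhere in a dashboard."""
    found: set[str] = set()
    if isinstance(node, dict):
        ds = node.get("datasource")
        if isinstance(ds, dict) and isinstance(ds.get("uid"), str):
            found.add(ds["uid"])
        # Recurse so panels, targets, and template variables are all covered.
        for value in node.values():
            found |= referenced_datasource_uids(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_datasource_uids(item)
    return found


def unknown_datasource_uids(dashboard: dict, pinned: set[str]) -> list[str]:
    """Return referenced uids that are not in the pinned set, sorted."""
    return sorted(
        uid
        for uid in referenced_datasource_uids(dashboard)
        # Skip template variables and built-in sources (assumed prefixes).
        if uid not in pinned and not uid.startswith(("$", "-- "))
    )


dashboard = {
    "uid": "service-overview",
    "panels": [
        {
            "datasource": {"type": "prometheus", "uid": "prom-main"},
            "targets": [{"datasource": {"type": "loki", "uid": "loki-main"}}],
        },
        {"datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"}},
    ],
}
print(unknown_datasource_uids(dashboard, pinned={"prom-main"}))
```

The pinned set would normally come from wherever data sources are declared (provisioning files, CRs, or Terraform), so the check fails when a dashboard points at a UID that no environment defines.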
A short, pragmatic checklist
- Start small: choose one dashboard or alerting rule and manage it from Git. Convert it to the chosen pattern (file/CRD/ConfigMap) and run it through your normal PR workflow. (microsoft.github.io)
- Pin UIDs for data sources and folders before you promote between environments. (kloudvin.com)
- Add CI validation: JSON schema checks, template rendering, and for Terraform-managed items, require a successful plan. (oneuptime.com)
- Document ownership: which team owns bootstrap provisioning, who reviews dashboard changes, and how to respond to incidents caused by dashboard errors. (microsoft.github.io)
Closing notes (practical perspective)
Managing dashboards with GitOps is less about tooling and more about consistent habits: treat observability assets as code, protect secrets, and make changes visible through the normal engineering review loop. The ecosystem gives you options (file provisioning for bootstrap, operator CRDs for Kubernetes-native control, sidecar+ConfigMap for Flux-style workflows, and Terraform for lifecycle-managed resources), so pick the pattern that matches your operational needs and scale. The official Grafana operator and cloud docs, Flux/Argo CD guides, and Terraform provider writeups are good references as you adopt a specific flow. (grafana.github.io)
References
- Observability as Code guidance and benefits: Microsoft Engineering Playbook and Splunk overview. (microsoft.github.io)
- Grafana Operator examples and GrafanaDashboard CR details. (grafana.github.io)
- Grafana Cloud guide: managing dashboards with Argo CD using the Grafana Operator. (grafana.com)
- Flux CD pattern for dashboards with Grafana sidecar and ConfigMaps. (oneuptime.com)
- Terraform-centric Grafana-as-code patterns (folders, datasource UIDs, Terraform provider examples). (kloudvin.com)