Intro to observability as code: managing dashboards with GitOps
Observability as code treats monitoring, dashboards, and alerting the same way you treat application code: stored in Git, reviewed, tested, and deployed via automated pipelines. For teams adopting GitOps, dashboards are a natural next step to bring under source control. This article explains the why and how of managing dashboards with GitOps, compares common approaches, and gives practical examples you can adapt to your environment.
Why treat dashboards as code?
- Reproducibility: a repository captures the exact JSON or resource definition that creates a dashboard, so you can recreate or clone dashboards across environments.
- Traceability: Git history provides audit trails for who changed what and when.
- Review and collaboration: pull requests let engineers discuss layout, queries, and thresholds before changes reach production.
- Automation and safety: CI checks (syntax, query validation, data source reachability) reduce the chance of broken dashboards reaching the UI.
Grafana and related tooling increasingly provide first-class support for “observability as code”: new APIs, SDKs, and UI-driven Git sync features make it easier to integrate dashboards into GitOps workflows.(grafana.com)
Two common GitOps patterns for dashboards
- Operator / Kubernetes CRD approach
  - Store dashboards as custom resource (CR) YAML in your repo. A GitOps tool (Argo CD, Flux) applies those CRs to the cluster, and a Kubernetes operator (e.g., Grafana Operator) watches them and reconciles them into Grafana so that Grafana reflects the desired state. This fits teams that already use GitOps for cluster config and prefer Kubernetes-native management.(github.com)
- API / CLI / SDK + CI approach
  - Generate dashboard JSON or code with a language SDK (such as Grafana’s Foundation SDK), validate it in CI, and push it to Grafana via a CLI (grafanactl), the Terraform provider, or direct API calls from CI/CD pipelines. This pattern works well when you don’t want cluster-level operators or when dashboards live alongside application repos and are deployed via existing CI pipelines. Grafana’s Foundation SDK, new CLI tooling, and Terraform resources are aimed at making this workflow smoother.(grafana.com)
What’s changed recently (short overview)
- Grafana has been shipping an “observability as code” story with new resource-oriented APIs, a revised dashboard schema, a Foundation SDK, and a command-line tool to help with automation and multi-instance sync. There’s also a Git Sync feature (experimental/private preview) to connect a Grafana instance directly to a GitHub repo and create pull requests from the UI. These moves make both operator-based and API/CI approaches more robust.(grafana.com)
Example: dashboards-as-CRD (operator + Argo CD)
This is the simplest operator-managed dashboard flow: your repo contains a Kubernetes manifest for a GrafanaDashboard CR that the Grafana Operator reconciles.
Example GrafanaDashboard CR (shortened):
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: example-dashboard
  namespace: monitoring
spec:
  resyncPeriod: 30s
  instanceSelector:
    matchLabels:
      dashboards: "grafana"
  json: >
    {
      "id": null,
      "title": "Example Dashboard",
      "panels": [
        {
          "type": "graph",
          "title": "CPU Usage",
          "targets": [
            { "expr": "sum(rate(container_cpu_usage_seconds_total[5m]))", "format": "time_series" }
          ]
        }
      ]
    }
- Commit this file to a Git repo that Argo CD monitors. When Argo CD syncs, the operator sees the new CR and creates/updates the dashboard in Grafana. The operator supports multiple ways to provide dashboard content (JSON, gzipped JSON, URL, Jsonnet) and handles UID management to avoid conflicts across instances.(grafana.github.io)
Pros and cons of the operator approach
- Pros:
  - Native GitOps: dashboards are just another Kubernetes resource.
  - Centralized reconciliation and multi-instance support (one operator can target multiple Grafana instances).
  - Works well in clusters already managed by Argo CD/Flux.
- Cons:
  - Requires a Kubernetes cluster and operator lifecycle management.
  - CRs can be verbose (embedded JSON), which can make diffs noisy unless you adopt a generation step (Jsonnet, templating).
  - Operator reconcilers can sometimes add complexity during upgrades, so keep an eye on operator versioning and CR schema compatibility.(github.com)
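One way to keep the embedded JSON out of hand-edited manifests is a small generation step that wraps a plain dashboard definition in the CR shown earlier. The sketch below is a minimal illustration, not the operator's own tooling: the function name `dashboard_to_cr` is hypothetical, the field values are copied from the example manifest above, and it emits the manifest as JSON (which Kubernetes accepts alongside YAML) to avoid a third-party YAML dependency.

```python
import json


def dashboard_to_cr(dashboard: dict, name: str, namespace: str = "monitoring") -> dict:
    """Wrap a dashboard definition in a GrafanaDashboard custom resource.

    Hypothetical helper: the field values mirror the example manifest in
    this article and should be adjusted to your own operator setup.
    """
    return {
        "apiVersion": "grafana.integreatly.org/v1beta1",
        "kind": "GrafanaDashboard",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "resyncPeriod": "30s",
            "instanceSelector": {"matchLabels": {"dashboards": "grafana"}},
            # The dashboard is embedded as a JSON string, not a nested object.
            # sort_keys keeps the generated output stable between runs.
            "json": json.dumps(dashboard, indent=2, sort_keys=True),
        },
    }


if __name__ == "__main__":
    dashboard = {
        "id": None,
        "title": "Example Dashboard",
        "panels": [
            {
                "type": "graph",
                "title": "CPU Usage",
                "targets": [
                    {
                        "expr": "sum(rate(container_cpu_usage_seconds_total[5m]))",
                        "format": "time_series",
                    }
                ],
            }
        ],
    }
    # Print the manifest so a CI step can redirect it into the GitOps repo.
    print(json.dumps(dashboard_to_cr(dashboard, "example-dashboard"), indent=2))
```

With this split, reviewers diff the small dashboard source file while the generated manifest is treated as a build artifact.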
Example: CI-driven API/CLI approach
When you prefer pipelines to handle the push, generate dashboards using an SDK or template and deploy them with a CI job.
Sample GitHub Actions job (conceptual):
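name: Deploy Grafana dashboards
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate JSON
        # jq exits non-zero if any file fails to parse
        run: jq empty dashboards/*.json
      - name: Push dashboards to Grafana
        env:
          # Assumes a repository secret named GRAFANA_API_KEY
          GRAFANA_API_KEY: ${{ secrets.GRAFANA_API_KEY }}
        run: |
          # Conceptual command: check the grafanactl docs for the exact subcommand and flags
          grafanactl push --instance https://grafana.example.com dashboards/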
- The Foundation SDK can produce dashboard JSON from strongly typed constructs, and grafanactl helps synchronize local resource files with Grafana instances. Both reduce brittle hand-editing and enable local previews before deployment. Grafana’s docs include a full CI example using GitHub Actions to create or update dashboards automatically.(grafana.com)
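To illustrate the idea of generating dashboards from structured code, here is a minimal sketch that composes dashboard JSON from small reusable functions. It deliberately uses plain Python dictionaries rather than the Foundation SDK's own API, so `prometheus_panel`, `service_dashboard`, and the `service` label are all hypothetical names chosen for this example; the panel fields follow the JSON shown in the CR example earlier, plus Grafana's `gridPos` layout field.

```python
import json


def prometheus_panel(title: str, expr: str, x: int, y: int,
                     panel_type: str = "timeseries") -> dict:
    """Build one panel with a single Prometheus-style query target."""
    return {
        "type": panel_type,
        "title": title,
        # Half-width panels on Grafana's 24-column grid.
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "targets": [{"expr": expr, "format": "time_series"}],
    }


def service_dashboard(service: str) -> dict:
    """Compose a small per-service dashboard from reusable panels.

    The `service` label is an assumption about how your metrics are labeled.
    """
    selector = f'{{service="{service}"}}'
    panels = [
        prometheus_panel(
            "CPU Usage",
            f"sum(rate(container_cpu_usage_seconds_total{selector}[5m]))",
            x=0, y=0,
        ),
        prometheus_panel(
            "Memory Working Set",
            f"sum(container_memory_working_set_bytes{selector})",
            x=12, y=0,
        ),
    ]
    return {
        "id": None,
        # A stable uid lets later pushes update the dashboard in place.
        "uid": f"{service}-overview",
        "title": f"{service} overview",
        "panels": panels,
    }


if __name__ == "__main__":
    # Write the result to dashboards/<uid>.json in a real pipeline.
    print(json.dumps(service_dashboard("checkout"), indent=2))
```

Because every service shares the same panel builders, a change to a query or layout is made once and shows up as a small, reviewable diff.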
Pros and cons of the CI/API approach
- Pros:
  - No Kubernetes operator required, which suits self-hosted or managed Grafana where you control CI.
  - Easier to keep dashboards close to application code or SDK-generated artifacts.
  - Often simpler to implement for small teams.
- Cons:
  - You need to manage credentials and API rate limits carefully.
  - It’s another pipeline to maintain; multi-instance promotion (dev → staging → prod) can become custom unless you standardize the workflow.
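For teams that take the direct API route, the push step can be a short script. The sketch below is one possible shape, assuming Grafana's long-standing `POST /api/dashboards/db` endpoint with a bearer token; the newer resource-oriented APIs mentioned earlier may be a better fit where they are available. The `GRAFANA_URL` variable and both function names are hypothetical, and the token is read from the environment so it never lands in the repo.

```python
import json
import os
import urllib.request


def build_push_request(base_url: str, token: str, dashboard: dict,
                       message: str = "") -> urllib.request.Request:
    """Build (but do not send) a request that creates or updates a dashboard."""
    payload = {
        "dashboard": dashboard,
        "overwrite": True,   # replace an existing dashboard with the same uid
        "message": message,  # stored in the dashboard's version history
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def push_dashboard(base_url: str, token: str, dashboard: dict) -> int:
    """Send the request and return the HTTP status code."""
    request = build_push_request(base_url, token, dashboard, "Deployed from CI")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    url = os.environ.get("GRAFANA_URL")        # hypothetical variable name
    token = os.environ.get("GRAFANA_API_KEY")  # injected from the CI secret store
    if url and token:
        status = push_dashboard(url, token, {"uid": "example", "title": "Example Dashboard"})
        print(f"Grafana responded with HTTP {status}")
    else:
        print("GRAFANA_URL or GRAFANA_API_KEY not set; skipping push")
```

Keeping request construction separate from sending makes the script easy to unit test without a live Grafana instance.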
Choosing between operator and CI approaches
- Use operator + Argo CD when:
  - You already use Kubernetes + GitOps for infra and want dashboards to follow the same lifecycle.
  - You need multi-instance reconciliation or want to manage Grafana instances from the cluster.
- Use SDK/CLI + CI when:
  - You don’t run Kubernetes or prefer to deploy dashboards from application repos.
  - You want stronger control over generation and per-repo pipelines.
Best practices and patterns
- Keep dashboard code small and modular:
  - Use templates or the Foundation SDK to compose panels and queries rather than copying large JSON blocks.
- Validate in CI:
  - Syntax checks (jq), schema linting, and connection tests to datasources help catch problems before deployment. Grafana docs show examples of validating and automating dashboard provisioning with CI/CD.(grafana.com)
- Manage secrets safely:
  - Store Grafana API keys or instance tokens in your secret manager (GitHub secrets, Vault) and never commit them.
- Use promotion branches or directories:
  - Represent environments with branches or folder structures and control promotion with pull requests or Argo CD application promotions.
- Document ownership and SLOs:
  - Keep dashboard metadata (owner, intent, data source) in YAML front matter or annotations so reviewers know who to contact for changes.
- Plan for drift and reconciliation:
  - Regularly audit the difference between the live UI and Git. Operator-based reconciliation reduces drift by design; CI-driven flows can add a reconciliation step to enforce the Git source of truth.
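The "Validate in CI" practice can go a step beyond a `jq` syntax check. The following sketch shows what a lightweight lint script might look like; the rules it enforces (a title, a stable `uid`, no empty `expr` queries, no duplicate uids across files) are assumptions about a reasonable team policy rather than an official Grafana schema check, and the `dashboards/` directory matches the layout used in the workflow example.

```python
import json
import sys
from pathlib import Path


def validate_dashboard(dashboard: dict) -> list[str]:
    """Return a list of policy violations for one dashboard definition."""
    errors = []
    if not str(dashboard.get("title") or "").strip():
        errors.append("missing title")
    if not dashboard.get("uid"):
        errors.append("missing uid")
    for index, panel in enumerate(dashboard.get("panels", [])):
        label = panel.get("title") or f"panel #{index}"
        for target in panel.get("targets", []):
            # Only targets that carry an expr field are checked.
            if "expr" in target and not str(target["expr"]).strip():
                errors.append(f"{label}: empty query expression")
    return errors


def validate_files(paths) -> list[str]:
    """Validate each file and flag uids that appear in more than one file."""
    errors, seen_uids = [], {}
    for path in paths:
        try:
            dashboard = json.loads(Path(path).read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path}: invalid JSON ({exc})")
            continue
        if not isinstance(dashboard, dict):
            errors.append(f"{path}: top level must be a JSON object")
            continue
        errors.extend(f"{path}: {problem}" for problem in validate_dashboard(dashboard))
        uid = dashboard.get("uid")
        if uid and uid in seen_uids:
            errors.append(f"{path}: uid {uid!r} already used by {seen_uids[uid]}")
        elif uid:
            seen_uids[uid] = path
    return errors


if __name__ == "__main__":
    problems = validate_files(sorted(Path("dashboards").glob("*.json")))
    for problem in problems:
        print(problem, file=sys.stderr)
    if problems:
        sys.exit(1)  # a non-zero exit fails the CI job
```

Run as a CI step before the push, a script like this blocks a broken dashboard at review time instead of after deployment.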
Notable tooling and features to know about
- Grafana’s “Observability as Code” work includes new versioned resource APIs, a revised dashboard schema, a Foundation SDK for generating dashboards programmatically, and grafanactl (CLI) to manage resources. The Grafana team also introduced Git Sync to connect Grafana to GitHub for UI-driven commits and PRs (experimental/private preview). These developments make multiple GitOps patterns easier to implement.(grafana.com)
- The Grafana Operator remains a popular choice for Kubernetes-native dashboard management and provides CRDs like GrafanaDashboard and GrafanaDataSource to keep dashboards declarative. It supports JSON/URL/Jsonnet sources and handles UID concerns when importing dashboards.(grafana.github.io)
A short checklist before you go live
- Are dashboards stored in Git and reviewed?
- Do CI checks validate JSON/queries and test datasource reachability?
- Are API keys and tokens stored securely outside the repo?
- Is there a promotion path for environments (dev → staging → prod)?
- Do you have a plan to monitor the health of your GitOps sync (Argo CD app status, CI job logs, operator logs)?
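For CI-driven flows, the drift audit recommended under best practices can start as a simple comparison between the dashboards in Git and those fetched from Grafana (for example via `GET /api/dashboards/uid/<uid>`). The sketch below covers only the comparison and takes both sides as dictionaries keyed by uid. It assumes that `id` and `version` are the only server-managed fields worth ignoring; real instances may add defaults on save, so the normalization will likely need tuning. The function names are hypothetical.

```python
import json

# Fields that Grafana assigns or bumps on save; ignored when comparing.
SERVER_MANAGED_FIELDS = ("id", "version")


def normalize(dashboard: dict) -> str:
    """Serialize a dashboard without server-managed fields, with stable key order."""
    trimmed = {k: v for k, v in dashboard.items() if k not in SERVER_MANAGED_FIELDS}
    return json.dumps(trimmed, sort_keys=True)


def find_drift(in_git: dict, live: dict) -> dict:
    """Compare dashboards keyed by uid and report where Git and Grafana disagree."""
    git_uids, live_uids = set(in_git), set(live)
    return {
        # Committed to Git but never deployed (or deleted in the UI).
        "missing_in_grafana": sorted(git_uids - live_uids),
        # Created in the UI without a corresponding file in Git.
        "unmanaged_in_grafana": sorted(live_uids - git_uids),
        # Present on both sides but with different content.
        "modified": sorted(
            uid for uid in git_uids & live_uids
            if normalize(in_git[uid]) != normalize(live[uid])
        ),
    }


if __name__ == "__main__":
    in_git = {"cpu": {"uid": "cpu", "title": "CPU", "id": None}}
    live = {
        "cpu": {"uid": "cpu", "title": "CPU (edited in UI)", "id": 12, "version": 4},
        "scratch": {"uid": "scratch", "title": "Scratch", "id": 13, "version": 1},
    }
    print(json.dumps(find_drift(in_git, live), indent=2))
```

A scheduled job could run this report and either alert on any non-empty list or re-push the Git version to restore the source of truth.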
Conclusion
Bringing dashboards under GitOps brings the same benefits you expect from modern infrastructure workflows: reproducibility, traceability, and safer change management. You can choose a Kubernetes-native operator path (perfect when you already use Argo CD/Flux) or a CI-driven SDK/CLI path (handy when you prefer to keep dashboards with application code). Recent additions from Grafana (versioned APIs, SDKs, grafanactl, and Git Sync) make both patterns more practical and production-ready. Start small, add CI validations, and standardize how dashboards are generated. Over time you’ll get more reliable, auditable observability that scales with your team.(grafana.com)