Moving Prometheus off the single-node island: remote_write to Grafana Mimir
Prometheus is great for scraping and answering questions about what your application is doing right now, but when teams want reliable, long-term, or global views across clusters they often add a different storage layer. One modern pattern is using Prometheus’s remote_write to push scraped samples into a horizontally scalable backend such as Grafana Mimir, then surfacing those metrics in Grafana dashboards. This article explains the idea, shows a compact config example, and outlines the trade-offs and practical considerations you’ll see when stitching Prometheus and Mimir together.
Why remote_write exists (an analogy)
Think of a Prometheus instance as a talented radio DJ who records local shows (metrics) onto a cassette deck (the local TSDB). That's fast and simple. But if the station network wants every DJ in every city to send their shows to a central archive, a lone cassette deck doesn't scale. Prometheus's remote_write is the courier between deck and archive: it ships metric samples off-node to a remote receiver that can store, index, and serve queries at scale. Remote-write is a formal Prometheus specification that defines how senders must retry and back off when a receiver fails, so senders and receivers from different projects can interoperate predictably. (prometheus.io)
What Grafana Mimir brings to the table
Grafana Mimir is one of the popular horizontally scalable backends for Prometheus remote_write. It accepts remote_write traffic, stores large volumes of metrics in object storage, and provides global querying and long-term retention. That lets you keep high-cardinality application metrics for months, consolidate many Prometheus scrapers into a single logical store, and point Grafana at that centralized view. The Mimir docs and Helm charts show how to configure Prometheus instances to write to a Mimir endpoint and how Mimir's query frontend exposes a PromQL-compatible query API. (grafana.com)
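To make the object-storage and retention points concrete, here is a hedged sketch of the Mimir side. It is a fragment, not a complete or production-ready Mimir configuration; the S3 endpoint, bucket name, local path, and one-year retention are assumptions chosen for illustration.

```yaml
# Mimir config fragment (sketch only, not a complete configuration).
multitenancy_enabled: true          # tenants are identified by the X-Scope-OrgID header

server:
  http_listen_port: 8080            # Mimir's default HTTP port

blocks_storage:
  backend: s3                       # long-term blocks live in object storage
  s3:
    endpoint: s3.us-east-1.amazonaws.com
    bucket_name: mimir-blocks       # hypothetical bucket name
  tsdb:
    dir: /data/tsdb                 # ingesters' local TSDB before blocks are uploaded

limits:
  compactor_blocks_retention_period: 1y   # hypothetical retention; 0 (the default) keeps blocks forever
```

Credentials are omitted here on the assumption that the pods obtain them from the environment (for example an IAM role); check the Mimir configuration reference for the options your storage backend needs.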
A practical benefit: teams running many short-lived clusters or many Prometheus servers can centralize metrics without rewriting how services expose metrics locally. Prometheus keeps doing what it does best (scraping and local rule evaluation), while Mimir focuses on scale and durability.
A compact Prometheus remote_write example
Here's a minimal remote_write block for prometheus.yml. Treat it as an illustration: the URL, credentials, queue values, and relabel rule are placeholders.
remote_write:
  - url: "https://mimir.example.com/api/v1/push"
    basic_auth:
      username: "prometheus"
      password: "REDACTED"
    queue_config:
      max_samples_per_send: 500
      batch_send_deadline: 5s
    send_exemplars: true
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "kube_.*"
        action: keep
Prometheus supports multiple remote_write targets and options like exemplar forwarding, write relabel rules, and queue tuning. Grafana’s agent tooling and Cloud documentation provide concrete configuration patterns when sending to a managed Mimir or a self-hosted cluster. (grafana.com)
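As a sketch of the "multiple targets" point, the fragment below sends everything to a central Mimir and only a filtered subset to a second receiver. The target names, the second endpoint, the tenant ID `team-a`, and the `slo:` metric-name prefix are hypothetical. The `X-Scope-OrgID` header is how a self-hosted Mimir with multi-tenancy enabled tells tenants apart.

```yaml
remote_write:
  # Primary target: the central Mimir cluster (hypothetical URL and tenant).
  - name: mimir-central
    url: "https://mimir.example.com/api/v1/push"
    headers:
      X-Scope-OrgID: team-a
    # Only useful if Prometheus runs with --enable-feature=exemplar-storage.
    send_exemplars: true

  # Secondary target: a hypothetical archive that only receives SLO recording rules.
  - name: slo-archive
    url: "https://archive.example.com/api/v1/push"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "slo:.*"
        action: keep
```

Each entry has its own queue_config and write_relabel_configs, so you can tune and filter targets independently; the `name` values only need to be unique and show up in the sender's remote-storage metrics.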
How dashboards fit in (Grafana as the stage)
Once metrics live in Mimir, Grafana connects to it as a Prometheus-compatible data source. That lets existing PromQL queries and dashboards work against a global store instead of a single local Prometheus. Charts that once showed a single node's CPU can be expanded to show cross-cluster aggregates, retention-aware historical trends, or long-term SLO reporting. Because Mimir's query API follows Prometheus semantics, the PromQL behind Grafana panels and alert rules generally needs little or no rewriting to point at the new endpoint. (grafana.com)
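One way to wire this up is a Grafana data source provisioning file. The sketch below assumes a self-hosted Mimir reachable at a hypothetical hostname and the same hypothetical tenant `team-a`; the data source type is plain `prometheus`, and the URL ends in `/prometheus` because that is the default prefix of Mimir's Prometheus-compatible API.

```yaml
# Grafana provisioning file, e.g. provisioning/datasources/mimir.yaml (sketch)
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus                  # Mimir is queried as a Prometheus data source
    access: proxy
    url: http://mimir-query-frontend.example.com:8080/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID  # tenant header for a multi-tenant Mimir
    secureJsonData:
      httpHeaderValue1: team-a        # hypothetical tenant ID
```

Existing dashboards can then switch data sources without touching their PromQL.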
Trade-offs you should expect
- Complexity and cost: adding a horizontally scalable backend is additional infrastructure — think object storage, ingestion components, and networking. That changes operating footprint and cost profile compared with standalone Prometheus. (grafana.com)
- Query latency and query-sharding considerations: at very large scale, a global query may touch several services in the Mimir stack (query frontends, queriers, ingesters, store gateways), so its latency profile differs from a query against a local Prometheus. (usenix.org)
- Duplication and deduplication: if multiple Prometheus servers scrape the same targets and all remote_write the same series, the backend receives duplicate samples. Some backends, Mimir included, can deduplicate samples from high-availability Prometheus pairs, but only when the senders are labeled for it, so deduplication has to be designed in. (deepwiki.com)
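For the high-availability case, Mimir's distributor has an HA tracker that accepts samples from one elected replica per cluster and drops the rest. The sketch below assumes Consul as the tracker's key-value store and uses hypothetical cluster, replica, and host names; the two documents belong in two different files.

```yaml
# prometheus.yml on each member of an HA pair (hypothetical names)
global:
  external_labels:
    cluster: prod-eu-1        # identical on both replicas
    __replica__: replica-a    # "replica-b" on the second server
---
# Mimir config fragment enabling HA deduplication
distributor:
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      store: consul
      consul:
        host: consul.example.com:8500   # hypothetical Consul address
limits:
  accept_ha_samples: true
  ha_cluster_label: cluster        # label that groups replicas
  ha_replica_label: __replica__    # label that distinguishes them
```

If the elected replica stops sending, Mimir fails over to the other one after a timeout, so a short window of missing or duplicated samples around failover is still possible.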
These trade-offs are why many teams use a hybrid approach: keep a local Prometheus for short-term alerting and fast local queries, while remote_write streams a copy of the samples for long-term analytics, historical SLOs, and centralized dashboards.
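A hybrid setup can look like the following prometheus.yml sketch: recording and alerting rules are evaluated locally and notify a local Alertmanager, while remote_write streams a copy of the samples to Mimir. The rule path, Alertmanager address, Mimir URL, and tenant are hypothetical.

```yaml
# prometheus.yml (sketch). Local retention is set on the command line,
# e.g. --storage.tsdb.retention.time=15d, not in this file.
global:
  scrape_interval: 30s

rule_files:
  - /etc/prometheus/rules/*.yml      # evaluated locally for fast alerting

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager.example.com:9093"]

remote_write:
  - url: "https://mimir.example.com/api/v1/push"
    headers:
      X-Scope-OrgID: team-a          # hypothetical tenant
```

Alerting keeps working if the link to Mimir degrades, because rule evaluation never leaves the local server.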
Practical considerations and patterns
- Agent vs. Prometheus as sender: Grafana Agent can act as a lightweight sender that scrapes and forwards metrics, avoiding the footprint of a full Prometheus server when you only need collection and forwarding. Its prometheus.remote_write component can send to a self-hosted or managed Mimir. (Grafana has since succeeded Agent with Grafana Alloy, which keeps a prometheus.remote_write component.) (grafana.com)
- Backoff, retry, and reliability: the remote_write spec requires senders to implement retry and backoff, so temporary network blips don’t cause systemic loss. Monitoring the sender queue sizes and “samples dropped” metrics is important for understanding when ingestion is strained. (prometheus.io)
- Cost-aware pruning and relabeling: write_relabel_configs let you drop high-cardinality labels or select only important metric families before they leave the scrape host, which helps control storage and egress costs. (grafana.com)
- Single pane for multi-cluster: sending many clusters’ Prometheus data into a central Mimir gives a “global” view and simpler alert correlation, but teams should adopt naming and label strategies so dashboards stay readable and queryable. (grafana.com)
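The reliability, pruning, and labeling points above map onto a handful of prometheus.yml settings. In this sketch the numbers are illustrative rather than recommendations, and the cluster name and the `request_id` label are hypothetical.

```yaml
global:
  external_labels:
    cluster: prod-eu-1        # attached to every sample sent, so central dashboards can split by cluster

remote_write:
  - url: "https://mimir.example.com/api/v1/push"
    queue_config:
      capacity: 10000         # samples buffered per shard
      max_shards: 50          # upper bound on parallel senders
      min_backoff: 30ms       # first retry delay after a recoverable error
      max_backoff: 5s         # the delay doubles on each retry up to this cap
    write_relabel_configs:
      # Drop a high-cardinality label before samples leave the host.
      - regex: "request_id"
        action: labeldrop
```

To see when the queue is under strain, watch the sender's own metrics, such as prometheus_remote_storage_samples_failed_total, prometheus_remote_storage_samples_pending, and prometheus_remote_storage_shards_desired.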
The balanced verdict
Prometheus + remote_write + Mimir is a pragmatic architectural pattern when scale, retention, and multi-cluster visibility matter. It offloads long-term retention and global querying to a system built for distributed scale, while letting Prometheus remain the reliable local scraper. That said, the added operational complexity, cost profile, and query characteristics mean it’s not always the right choice for very small deployments or single-node setups where the local TSDB suffices. Grafana’s ecosystem (Agent, Mimir, Cloud) and the Prometheus remote_write specification have matured a lot, making the pattern reliable and interoperable for production use. (prometheus.io)
Closing note
If your monitoring needs include long-term trend analysis, multi-cluster aggregation, or centralized historical SLOs, the remote_write → Mimir pattern is a widely adopted way to achieve those goals while preserving the familiar Prometheus + Grafana query and dashboard workflow. The ecosystem provides multiple sender/receiver options and configuration knobs so teams can tune the balance of performance, cost, and operational complexity. (grafana.com)