Bridging Prometheus and OpenTelemetry: practical patterns for scalable metrics and Grafana dashboards
Prometheus and Grafana are often the heart of application monitoring, while OpenTelemetry is becoming the lingua franca for instrumenting services. Think of the combination as a band: Prometheus keeps the beat (time-series storage and query), OpenTelemetry writes the sheet music (rich, language-neutral telemetry), and Grafana conducts the ensemble (dashboards and SLOs). This article walks through a common and increasingly popular pattern: using the OpenTelemetry Collector as a bridge between application instrumentation and Prometheus + Grafana, with a focus on scaling and long-term storage through Prometheus remote_write (for example, to Grafana Mimir). The goal is to explain why the pattern exists and how the pieces map together, with practical config snippets that illustrate the flow.
Why bridge OpenTelemetry and Prometheus?
- OpenTelemetry SDKs instrument code with a standard API and can emit OTLP, which works across languages and vendors. The Collector can accept that telemetry and translate/export it in ways that fit existing ecosystems. (uptrace.dev)
- Prometheus is optimized for pull-based, near-real-time scraping and expressive PromQL, but many modern services already speak OTLP rather than exposing Prometheus-format endpoints. Making the two work together avoids duplicate instrumentation and centralizes telemetry pipelines. (opentelemetry.io)
Two common bridging patterns
- Collector exposes a Prometheus scrape endpoint (pull model).
  - The OpenTelemetry Collector can host a Prometheus exporter endpoint that exposes translated metrics in the Prometheus text format; Prometheus scrapes that endpoint like any other target. This pattern fits teams that want to keep Prometheus as the collector of record for short-term, high-resolution storage while centralizing instrumentation in OTLP. (opentelemetry.io)
- Collector (or Prometheus) pushes to long-term storage via remote_write (push model).
  - Prometheus has a native remote_write mechanism that forwards samples to compatible backends. Grafana Mimir and other long-term stores implement the same receiver API so Prometheus (or agents) can push metrics for long retention, multi-tenancy, and horizontal scale. The remote-write protocol continues to evolve (Remote-Write 2.0 adds richer metadata and semantics). (prometheus.io)
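To make the pull model concrete, here is a minimal sketch of the Prometheus text exposition format, roughly what a scrape endpoint such as the Collector's Prometheus exporter returns for a counter. It assumes a single counter family; `render_counter`, `escape_label_value`, and the metric names are hypothetical illustrations, not part of any Collector API.

```python
# Sketch of the Prometheus text exposition format served by a scrape endpoint.
# Hypothetical helpers and metric names; only counters are handled.


def escape_label_value(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter family.

    `samples` maps a tuple of (label, value) pairs to a cumulative count.
    """
    # HELP text escapes backslash and newline.
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_escaped}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        if labels:
            rendered = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels)
            )
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(
    render_counter(
        "http_server_requests_total",
        "Total HTTP requests.",
        {
            (("method", "GET"), ("status", "200")): 42,
            (("method", "POST"), ("status", "500")): 3,
        },
    ),
    end="",
)
```

Running it prints one `# HELP` line, one `# TYPE` line, and one sample line per label set, for example `http_server_requests_total{method="GET",status="200"} 42`. Prometheus parses exactly this kind of text on every scrape, which is why a Collector exposing it looks like any other target.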
How the Collector fits in (practical roles)
- Prometheus-exporter role: Collector converts OTLP to Prometheus-format metrics and exposes an endpoint for Prometheus to scrape. This makes services instrumented with OpenTelemetry appear as standard Prometheus targets.
- Remote-write forwarder: Collector or Prometheus itself can push samples to a remote_write backend (e.g., Grafana Mimir) for long-term storage and multi-tenant querying.
- Transformation and aggregation: The Collector can filter, normalize names, or roll up high-cardinality labels before metrics are persisted downstream—useful for managing cardinality and disk usage. (uptrace.dev)
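The rollup idea in the last bullet can be sketched in a few lines: drop a high-cardinality label and sum the values of series that collapse together. This is an illustration under stated assumptions, not the Collector's processor implementation. Summing is only valid for counters (cumulative totals), and the `user_id` and `route` labels are hypothetical.

```python
# Sketch of a cardinality-reducing rollup: drop a label, then sum the values of
# series that now share the same label set. Valid for counters only; gauges
# would need a different aggregate. Label names are hypothetical.
from collections import defaultdict


def series_count(points: list[tuple[dict, float]]) -> int:
    """Each distinct label set is one time series."""
    return len({frozenset(labels.items()) for labels, _ in points})


def drop_and_sum(
    points: list[tuple[dict, float]], drop: set[str]
) -> list[tuple[dict, float]]:
    """Remove the labels in `drop`, then sum values that share a label set."""
    totals = defaultdict(float)
    for labels, value in points:
        kept = frozenset((k, v) for k, v in labels.items() if k not in drop)
        totals[kept] += value
    return [(dict(key), total) for key, total in totals.items()]


points = [
    ({"route": "/checkout", "user_id": "u1"}, 3.0),
    ({"route": "/checkout", "user_id": "u2"}, 5.0),
    ({"route": "/home", "user_id": "u1"}, 2.0),
]
rolled = drop_and_sum(points, {"user_id"})
print(series_count(points), "->", series_count(rolled))  # 3 -> 2
```

With three users the saving is one series; with thousands of users per route, the same rollup keeps the series count proportional to the number of routes instead of routes times users.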
Concrete config examples (illustrative)
- OpenTelemetry Collector: expose Prometheus-format metrics for scraping
```yaml
receivers:
  otlp:
    protocols:
      grpc:   # OTLP/gRPC, default port 4317
      http:   # OTLP/HTTP, default port 4318

exporters:
  prometheus:
    endpoint: "0.0.0.0:9464"  # Prometheus will scrape this

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```
This makes the Collector the scrape target at port 9464; Prometheus scrape jobs point at the Collector pods/services. The OpenTelemetry Prometheus exporter follows the text-format conventions and can be configured to translate OpenTelemetry names to Prometheus naming when needed. ([opentelemetry.io](https://opentelemetry.io/docs/specs/otel/metrics/sdk_exporters/prometheus/))
- Prometheus: remote_write to Grafana Mimir (long-term store)
```yaml
remote_write:
  - url: "http://mimir:9009/api/v1/push"
    # optional: bearer_token_file, queue_config, send_exemplars, send_native_histograms
```
Grafana Mimir exposes a push receiver at POST /api/v1/push, making it compatible with Prometheus remote_write clients such as Prometheus itself, the Grafana Agent, or other shippers. (grafana.com)
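The name translation that the Collector's Prometheus exporter performs (mentioned with the first config example) can be approximated as follows. This hypothetical `to_prometheus_name` helper covers only a subset of the OpenTelemetry-to-Prometheus compatibility rules: sanitizing characters, appending a unit suffix from a small assumed unit table, and adding `_total` to monotonic counters. The real exporter handles many more units and edge cases, so treat this as an approximation.

```python
# Simplified sketch of OpenTelemetry -> Prometheus metric name translation.
# Only a subset of the rules; UNIT_SUFFIXES is a small, assumed excerpt.
import re

# "1" (dimensionless) gets no suffix in this sketch.
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes", "1": ""}


def sanitize(text: str) -> str:
    # Characters outside [a-zA-Z0-9_:] become "_"; runs of "_" are collapsed.
    return re.sub(r"_+", "_", re.sub(r"[^a-zA-Z0-9_:]", "_", text))


def to_prometheus_name(
    otel_name: str, unit: str = "", monotonic_counter: bool = False
) -> str:
    name = sanitize(otel_name)
    # Curly-brace annotations such as "{request}" are descriptive, not units.
    unit = re.sub(r"\{[^}]*\}", "", unit).strip()
    suffix = sanitize(UNIT_SUFFIXES.get(unit, unit))
    if suffix and not name.endswith("_" + suffix):
        name = f"{name}_{suffix}"
    if monotonic_counter and not name.endswith("_total"):
        name += "_total"
    if name and name[0].isdigit():
        name = "_" + name  # Prometheus names may not start with a digit
    return name


print(to_prometheus_name("http.server.request.duration", "s"))
print(to_prometheus_name("http.server.requests", "{request}", monotonic_counter=True))
```

The two calls print `http_server_request_duration_seconds` and `http_server_requests_total`. Dashboards and alert rules written against the Prometheus names therefore depend on these rules, which is one reason to pin exporter settings once queries exist.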
Notes on the important trade-offs
- Scrape vs push:
  - Scraping keeps the Prometheus model intact (service-discovery driven). It is simple for many Kubernetes setups, but a single Prometheus server has finite capacity, so wide coverage means sharding scrape targets across multiple Prometheus instances.
  - Pushing (remote_write) centralizes metrics and enables long-term storage and multi-tenant querying, but introduces transient network and queueing concerns. The remote_write protocol and its clients implement retries and backoff policies to handle spikes. (prometheus.io)
- Metric semantics and naming:
  - OpenTelemetry and Prometheus differ in naming conventions and histogram semantics. The OpenTelemetry Prometheus exporter implements translations, but be mindful of how histograms and exemplars are represented when they cross boundaries; this affects the accuracy of latency SLOs and aggregated metrics. (opentelemetry.io)
- Cardinality and cost:
  - Translating telemetry into Prometheus series can explode series counts if service labels or high-cardinality attributes aren't normalized. Downstream stores like Mimir scale horizontally to handle many series, but storage and query costs rise accordingly. Use label filtering, relabeling, and aggregation early in the pipeline to keep series counts manageable. (ewere.tech)
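On the histogram point above: an OpenTelemetry explicit-bucket histogram stores per-bucket counts plus an overflow bucket, while Prometheus exposes cumulative `le` buckets. The sketch below converts one data point, assuming cumulative temporality (delta points would first have to be accumulated) and simplified `le` formatting. It also shows why a latency SLO threshold should line up with a bucket boundary: otherwise the ratio can only be estimated. The helper names and numbers are illustrative.

```python
# Sketch: OpenTelemetry explicit-bucket histogram point -> Prometheus-style
# cumulative `le` buckets. Assumes cumulative temporality; values illustrative.
from itertools import accumulate


def to_prometheus_buckets(
    explicit_bounds: list[float], bucket_counts: list[int]
) -> list[tuple[str, int]]:
    # OpenTelemetry stores len(bounds) + 1 per-bucket (non-cumulative) counts;
    # the final count is the overflow bucket above the largest bound.
    if len(bucket_counts) != len(explicit_bounds) + 1:
        raise ValueError("need len(bucket_counts) == len(explicit_bounds) + 1")
    les = [f"{bound:g}" for bound in explicit_bounds] + ["+Inf"]
    # Prometheus buckets are cumulative: each `le` counts everything at or below it.
    return list(zip(les, accumulate(bucket_counts)))


def fraction_at_or_below(buckets: list[tuple[str, int]], le: str) -> float:
    # Exact only when the threshold coincides with a bucket boundary.
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("empty histogram")
    for bound, cumulative in buckets:
        if bound == le:
            return cumulative / total
    raise KeyError(f"no bucket boundary at {le}; the ratio would have to be estimated")


buckets = to_prometheus_buckets([0.1, 0.5, 1.0], [4, 3, 2, 1])
print(buckets)  # [('0.1', 4), ('0.5', 7), ('1', 9), ('+Inf', 10)]
print(fraction_at_or_below(buckets, "0.5"))  # 0.7
```

If the SLO target were "requests under 300 ms" with these bounds, there is no `0.3` bucket, and any answer would be an interpolation between the 0.1 and 0.5 buckets.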
SLOs and dashboards in Grafana: why long-term metrics matter
- Service Level Objectives usually require longer windows (weeks to months) for meaningful measurement. Short-term Prometheus retention (days) can make historical SLO analysis impossible unless metrics are archived to long-term storage. Grafana’s SLO tooling integrates with Prometheus-compatible backends, enabling SLO calculation and alerting over long windows when metrics are persisted in stores like Grafana Mimir. (grafana.com)
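A little arithmetic shows why retention matters for SLOs. The sketch computes the error budget for a target over a window, a simple burn rate, and whether local retention even covers the window. The 99.9% target, 30-day window, and event counts are example values; 15 days is assumed here as Prometheus's default local retention (`--storage.tsdb.retention.time`), which your deployment may override.

```python
# Back-of-the-envelope SLO arithmetic with example values.


def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of total unavailability the target tolerates over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(bad_events: int, total_events: int, slo: float) -> float:
    """Observed error ratio divided by the allowed error ratio.

    1.0 spends the budget exactly over the window; 5.0 spends it five times faster.
    """
    return (bad_events / total_events) / (1.0 - slo)


def retention_covers(window_days: int, retention_days: int) -> bool:
    """Can the SLO window be evaluated from data the server still holds?"""
    return retention_days >= window_days


print(round(error_budget_minutes(0.999, 30), 1))  # 43.2
print(round(burn_rate(50, 10_000, 0.999), 2))  # 5.0
print(retention_covers(window_days=30, retention_days=15))  # False
```

A 99.9% target over 30 days leaves about 43 minutes of budget, and a 30-day window cannot be evaluated from 15 days of local data, which is the gap a long-term store like Mimir fills.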
Operational realities and recent developments
- The OpenTelemetry ecosystem keeps tightening integration points: Collector components for Prometheus formats, exporters, and remote-write adapters have improved stability and feature parity, making the bridge pattern broadly practical for many teams migrating toward OTLP-first instrumentation. (uptrace.dev)
- The Prometheus remote_write spec has seen updates to add richer semantics for remote receivers and to clarify retry and backoff behavior. Receivers like Mimir implement these patterns to offer reliable ingestion at scale. The community continues to iterate on partial-write handling and headers that communicate ingestion status. (prometheus.io)
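The retry behavior mentioned above can be sketched as two small functions. As we read the remote-write specification, senders retry 5xx responses with backoff, may optionally retry 429, and do not retry other 4xx responses. The 30 ms / 5 s bounds are assumed to mirror Prometheus's `queue_config` `min_backoff` / `max_backoff` defaults; check the documentation for your version. The function names are hypothetical.

```python
# Sketch of a remote_write sender's retry decision and capped exponential
# backoff. Default bounds are assumptions; verify against your Prometheus version.


def should_retry(status: int, retry_on_429: bool = True) -> bool:
    if 500 <= status <= 599:
        return True  # server-side failure: the same batch may succeed later
    if status == 429:
        return retry_on_429  # rate limited: retrying is optional
    return False  # 2xx succeeded; other 4xx will never be accepted as sent


def backoff_delays(
    attempts: int, min_backoff: float = 0.03, max_backoff: float = 5.0
) -> list[float]:
    """Seconds to wait before each retry: doubling from min_backoff, capped at max_backoff."""
    delays = []
    delay = min_backoff
    for _ in range(attempts):
        delays.append(min(delay, max_backoff))
        delay *= 2
    return delays


print([should_retry(code) for code in (200, 400, 429, 503)])  # [False, False, True, True]
print(backoff_delays(10)[-3:])  # [3.84, 5.0, 5.0]
```

The cap matters during a long receiver outage: without it, delays would keep doubling and a sender could wait far longer than necessary once the receiver recovers.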
Practical debugging pointers (conceptual)
- Exemplar visibility: When histograms and exemplars cross exporters, check whether exemplars are preserved end-to-end (important for distributed traces linking to Prometheus samples).
- Missing series: If expected metrics aren’t appearing in Prometheus or long-term storage, inspect Collector logs for export errors, Prometheus scrape logs for 4xx/5xx responses, and remote_write queue metrics (they expose useful internal counters).
- Start with short local retention in Prometheus and centralize long retention in a remote store; this hybrid approach is a common way to balance fast queries against cost.
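For the missing-series and exemplar checks, a small script over the scraped text can help. This sketch lists which expected metric names are absent and which sample lines carry an exemplar. Exemplars only appear in the OpenMetrics variant of the text format, which a scraper typically requests via an `Accept: application/openmetrics-text` header. The parser is deliberately simplified: it ignores escaping edge cases such as a label value containing ` # {`. The metric names and sample values are hypothetical.

```python
# Debugging sketch: scan exposition text for metric names and exemplars.
import re

SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def scan_exposition(text: str) -> tuple[set[str], set[str]]:
    names, with_exemplars = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE/EOF comment lines
        match = SAMPLE_NAME.match(line)
        if not match:
            continue
        names.add(match.group(1))
        # OpenMetrics exemplars follow the sample as " # {labels} value".
        if " # {" in line:
            with_exemplars.add(match.group(1))
    return names, with_exemplars


def missing_metrics(text: str, expected: set[str]) -> set[str]:
    names, _ = scan_exposition(text)
    return expected - names


scrape = """# TYPE http_server_request_duration_seconds histogram
http_server_request_duration_seconds_bucket{le="0.5"} 7 # {trace_id="4bf92f3577b34da6"} 0.31
http_server_request_duration_seconds_bucket{le="+Inf"} 10
http_server_request_duration_seconds_sum 4.2
http_server_request_duration_seconds_count 10
# EOF
"""
expected = {"http_server_request_duration_seconds_count", "http_server_requests_total"}
print(missing_metrics(scrape, expected))  # {'http_server_requests_total'}
print(scan_exposition(scrape)[1])  # {'http_server_request_duration_seconds_bucket'}
```

Expected names must be series names (for histograms, the `_bucket`, `_sum`, and `_count` forms), not the OpenTelemetry instrument name. Against a live Collector, the text could be fetched with `urllib.request.urlopen` from the endpoint configured in the exporter.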
Closing thoughts
Instrumenting with OpenTelemetry and keeping Prometheus + Grafana for querying and dashboards can combine the best of both worlds: standardized, language-agnostic instrumentation with a powerful query engine and visualization layer. The Collector-as-bridge pattern allows teams to centralize telemetry pipelines, handle transformations, and forward metrics to scalable long-term stores (e.g., Grafana Mimir) via the well-established Prometheus remote_write path. The design choices center on scrape versus push, label management, histogram semantics, and the desired retention/query performance trade-offs, each of which affects dashboard fidelity and SLO calculations. (uptrace.dev)
References
- OpenTelemetry Metrics Prometheus exporter documentation. (opentelemetry.io)
- Guide for integrating Prometheus with OpenTelemetry Collector. (uptrace.dev)
- Prometheus guide for sending OpenTelemetry metrics to Prometheus. (prometheus.io)
- Grafana Mimir documentation (remote_write / push receiver). (grafana.com)
- Grafana SLO and best-practices docs. (grafana.com)