Linking metrics to traces with exemplars: faster latency debugging in Prometheus and Grafana

Aggregated metrics are great for spotting trends, but they’re lousy at telling you which single request caused a spike. Exemplars bridge that gap: they attach a tiny breadcrumb (usually a trace ID and a value) to an aggregated metric point so you can pivot from “latency jumped” straight to the exact trace that produced the outlier. That single link can shrink an hours-long, needle-in-a-haystack investigation to minutes. (opentelemetry.io)

Why exemplars matter (and when they don’t)

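But exemplars aren’t a silver bullet: they come with operational details to manage, such as trace sampling, exemplar retention, supported metric types, and exposition formats.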

What an exemplar looks like (conceptually)

An exemplar is a tiny annotation attached to a metric sample: a timestamp, the raw observed value, and a set of labels (most importantly a trace ID). In the OpenMetrics text exposition format you’ll see an exemplar attached to a histogram bucket or counter sample; in Grafana it appears as a star/diamond you can click to open the trace. (prometheus.io)
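To make those parts concrete, here is a minimal Python sketch that splits one OpenMetrics histogram-bucket line into the regular sample and its exemplar (labels, raw value, optional timestamp). The sample line and the `parse_exemplar_line` helper are hypothetical, and the regex only handles simple label values without escapes, so treat it as an illustration of the `value # {labels} exemplar_value [timestamp]` shape rather than a real OpenMetrics parser.

```python
import re
from typing import Optional

# Hypothetical helper: handles only simple label values (no escaped quotes, commas or braces).
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric sample name
    r'(?:\{(?P<labels>[^}]*)\})?'            # optional sample labels
    r' (?P<value>\S+)'                       # aggregated sample value
    r'(?: # \{(?P<ex_labels>[^}]*)\}'        # exemplar labels follow " # "
    r' (?P<ex_value>\S+)'                    # the single raw observed value
    r'(?: (?P<ex_ts>\S+))?)?$'               # optional exemplar timestamp
)


def parse_labels(text: Optional[str]) -> dict:
    """Turn 'a="1",b="2"' into {'a': '1', 'b': '2'} (no escape handling)."""
    if not text:
        return {}
    return dict(re.findall(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"', text))


def parse_exemplar_line(line: str) -> dict:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized sample line: {line!r}")
    result = {
        "name": m["name"],
        "labels": parse_labels(m["labels"]),
        "value": float(m["value"]),
        "exemplar": None,
    }
    if m["ex_labels"] is not None:
        result["exemplar"] = {
            "labels": parse_labels(m["ex_labels"]),
            "value": float(m["ex_value"]),
            "timestamp": float(m["ex_ts"]) if m["ex_ts"] else None,
        }
    return result


line = 'request_latency_seconds_bucket{le="0.5"} 17 # {trace_id="abc123"} 0.42 1700000000.0'
parsed = parse_exemplar_line(line)
print(parsed["exemplar"])
```

Everything after ` # ` belongs to the exemplar: the bucket count `17` is the aggregated value, while `0.42` is the single observation whose trace you can open.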

End-to-end: how exemplars flow (high level)

  1. Instrumentation: your app records a measurement while a trace/span is active — the SDK or client library grabs the current trace context and adds it to the metric measurement. OpenTelemetry SDKs can do this automatically when configured. (opentelemetry.io)
  2. Exposition: the library exposes metrics in the OpenMetrics format (not the classic Prometheus text format, which has no syntax for exemplars) so exemplars can be represented. Prometheus v2.26.0 and later can scrape OpenMetrics text and keep exemplars when exemplar storage is enabled with `--enable-feature=exemplar-storage`. (prometheus.io)
  3. Storage/forwarding: the metrics backend (Prometheus, Grafana Mimir, or Thanos/Cortex variants that implement exemplars) stores exemplars and/or forwards them to a remote store. When exemplars pass through an intermediate collector such as Grafana Alloy or travel over remote write, you may need to enable exemplar forwarding explicitly; Prometheus’s own remote_write configuration, for example, only sends exemplars when `send_exemplars` is turned on. (grafana.com)
  4. UI: Grafana (Prometheus data source) renders exemplars alongside charts and links them to trace backends such as Tempo or Jaeger. Click the exemplar and jump to the trace. (grafana.com)

Concrete configuration notes and examples

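- Prometheus client libraries: the Python client (prometheus_client) accepts an exemplar dict when you observe a histogram or increment a counter. The trace ID below is a placeholder; in practice you would read it from the active span:
```python
from prometheus_client import Counter, Histogram

h = Histogram('request_latency_seconds', 'Request latency')
h.observe(0.42, exemplar={'trace_id': 'abc123'})  # observed value + trace ID exemplar

c = Counter('requests_total', 'Total requests', ['method'])
c.labels('GET').inc(exemplar={'trace_id': 'abc123'})  # exemplar on a labeled counter
```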

Client library docs note that exemplars are only rendered in the OpenMetrics exposition format and that you must enable exemplar storage in Prometheus to make them visible. ([prometheus.github.io](https://prometheus.github.io/client_python/instrumenting/exemplars/))
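Because exemplars exist only in the OpenMetrics format, whether you see them depends on which format the scraper asks for. The Python sketch below shows the idea of Accept-header negotiation with a hypothetical `choose_exposition` helper; it ignores q-values and is not the client library’s actual code.

```python
# Hypothetical helper illustrating Accept-header negotiation; ignores q-values.
OPENMETRICS_CT = "application/openmetrics-text; version=1.0.0; charset=utf-8"
CLASSIC_CT = "text/plain; version=0.0.4; charset=utf-8"


def choose_exposition(accept_header: str) -> tuple:
    """Return (content_type, exemplars_included) for one scrape request."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type == "application/openmetrics-text":
            return OPENMETRICS_CT, True   # exemplars can be rendered
    return CLASSIC_CT, False              # classic text format has no exemplar syntax


print(choose_exposition("application/openmetrics-text;version=1.0.0,text/plain;version=0.0.4;q=0.5"))
print(choose_exposition("*/*"))
```

This is also why a plain curl of a metrics endpoint, which sends no OpenMetrics Accept header, typically shows no exemplars even when the application is recording them.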

- OpenTelemetry approach: many OpenTelemetry SDKs can attach exemplar information automatically if you enable trace-based exemplar filtering in the meter/tracing setup. The .NET example shows setting an exemplar filter so histogram.Record calls include exemplar context when an activity/span is active:
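```csharp
using OpenTelemetry;
using OpenTelemetry.Metrics;

// TraceBased: only measurements recorded inside a sampled activity/span become exemplar candidates
var meterProvider = Sdk.CreateMeterProviderBuilder()
    .SetExemplarFilter(ExemplarFilterType.TraceBased)
    .AddOtlpExporter(...) // exporter options elided in the original
    .Build();
```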

This lets histogram.Record(…) capture the active trace ID as an exemplar. The OpenTelemetry docs include a hands-on end-to-end example with Prometheus, Jaeger, and Grafana. (opentelemetry.io)
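The OpenTelemetry metrics specification defines three built-in exemplar filters (AlwaysOn, AlwaysOff, TraceBased), and `ExemplarFilterType.TraceBased` in the .NET snippet selects the last one. The Python sketch below models only that eligibility decision; the enum, the `SpanContext` dataclass and both helper functions are hypothetical stand-ins, not the OpenTelemetry Python SDK API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExemplarFilter(Enum):
    ALWAYS_ON = "always_on"      # every measurement is an exemplar candidate
    ALWAYS_OFF = "always_off"    # never record exemplars
    TRACE_BASED = "trace_based"  # only measurements made inside a sampled span


@dataclass
class SpanContext:  # hypothetical minimal stand-in for a real span context
    trace_id: str
    span_id: str
    sampled: bool


def is_exemplar_candidate(filt: ExemplarFilter, span: Optional[SpanContext]) -> bool:
    if filt is ExemplarFilter.ALWAYS_ON:
        return True
    if filt is ExemplarFilter.ALWAYS_OFF:
        return False
    # TRACE_BASED: require an active span whose trace was sampled, so the
    # exemplar's trace ID points at a trace that was actually recorded.
    return span is not None and span.sampled


def maybe_exemplar(filt: ExemplarFilter, value: float, span: Optional[SpanContext]) -> Optional[dict]:
    """Build a hypothetical exemplar record, or None if the filter rejects it."""
    if not is_exemplar_candidate(filt, span):
        return None
    exemplar = {"value": value}
    if span is not None:
        exemplar.update(trace_id=span.trace_id, span_id=span.span_id)
    return exemplar
```

Under TraceBased, measurements outside a span or in an unsampled trace never become exemplars, so exemplar coverage follows your trace sampling rate.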

Operational gotchas

Real-world analogy

Think of metrics as satellite imagery: you can see there’s a storm (a spike), but you can’t see the individual car that slid off the road. Exemplars are like helicopter footage zooming into that one car: you get the focused context needed to understand what happened and why.

When to add exemplars to your stack

Closing note: maturity and momentum

Exemplars are now a practical tool in modern observability stacks: OpenTelemetry provides SDK support, Prometheus and OpenMetrics expose exemplars, and Grafana surfaces them in UIs so you can jump to traces quickly. There are still operational details to manage (sampling, retention, supported metric types and exposition formats), but the payoff, much faster and more precise debugging, is real for teams that instrument thoughtfully. (opentelemetry.io)

Further reading (official docs referenced above)

If your dashboards show a mysterious spike, exemplars are the instrument that lets you listen for the single stray note in the orchestra and follow it straight to the musician — no more guessing, just targeted investigation.