Observability in 2025: Tracing, Telemetry, and Reliability Metrics that Actually Help
If you’re standing up (or modernizing) observability this year, you’re doing it in a world where OpenTelemetry (OTel) is the lingua franca, cloud providers accept OTLP directly, and AI/LLM features are shipping into production. That’s great news—and a lot to take in. This article gives you a clear, practical path: how to wire tracing and telemetry you can trust, how to connect metrics to traces (so you can jump from a red chart to a single problematic request), and how to define reliability metrics that match user experience—including for AI workloads.
What’s changed recently that makes this worth revisiting now?
- OTLP (the OpenTelemetry Protocol) is stable for traces, metrics, and logs; vendors and platforms now accept it directly, removing piles of custom agents. (opentelemetry.netlify.app)
- Google Cloud’s Ops Agent can ingest OTLP traces and metrics (over gRPC) out of the box, simplifying deployment on Compute Engine VMs. (cloud.google.com)
- OpenTelemetry’s Generative AI semantic conventions are maturing; they define common names for spans, metrics, and events like token usage and time-per-output-token. As of September 2025 these remain in “Development” status but are actively used. (opentelemetry.io)
- Frameworks for LLM apps are meeting OTel halfway: for example, LangSmith (from LangChain) added end-to-end OTel support in March 2025, so you can standardize tracing across “classic” services and agentic workflows. (changelog.langchain.com)
Below, we’ll combine these threads into a crisp workflow you can adopt in weeks, not months.
The new baseline: OpenTelemetry + OTLP
OpenTelemetry gives you SDKs, semantic conventions, and a vendor-neutral collector. The crucial piece is OTLP—one protocol for traces, metrics, and logs. The protocol’s 1.7.0 spec documents stable signals and transport over gRPC and HTTP, so you can standardize instrumentation and switch backends without rewriting exporters. (opentelemetry.io)
On managed platforms, you no longer need bespoke agents per signal. For example, Google Cloud’s Ops Agent can receive OTLP and route your traces to Cloud Trace and metrics to Cloud Monitoring or Managed Service for Prometheus, with a single config. That reduces friction and mistakes when you ship. (cloud.google.com)
Tip: name things before you ship. Set OTEL_SERVICE_NAME (or explicit resources) so telemetry is grouped logically; otherwise you’ll end up with “unknown_service” everywhere. (opentelemetry.io)
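For example, in Python you can attach the service name (and other resource attributes) when constructing the tracer provider. A minimal sketch; the attribute values are placeholders for your own service:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Placeholder values; service.name is the one you really don't want to omit.
resource = Resource.create({
    "service.name": "checkout-api",
    "service.version": "1.4.2",
    "deployment.environment": "prod",
})
trace.set_tracer_provider(TracerProvider(resource=resource))

Setting OTEL_SERVICE_NAME in the environment achieves the same thing without touching code.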
Tracing that matters: head vs. tail sampling (and why you need both)
- Head-based sampling is efficient and simple. You make a probabilistic decision at the start of a trace, so you don’t pay to record every span. The downside: you can’t guarantee “keep all errors” because you don’t know the outcome yet. (opentelemetry.netlify.app)
- Tail-based sampling decides after a trace finishes, letting you keep “the interesting ones” (errors, slow requests, specific attributes) while dropping the rest. In OTel, you usually implement this in the Collector with the tail_sampling processor. Policies include status_code, latency thresholds, attribute matches, and more. (github.com)
A simple collector snippet (YAML) illustrates the idea:
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep_errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow_traces
        type: latency
        latency:
          threshold_ms: 5000

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]  # to your backend
Start with head-based sampling (e.g., 5–10%) in the SDK to cap volume, then add tail sampling in the Collector to elevate the “must keep” traces. That hybrid pattern balances cost and fidelity. (opentelemetry.netlify.app)
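On the SDK side, the head decision is just a sampler. A minimal Python sketch using the standard OTel SDK samplers (the 0.05 ratio is an example, not a recommendation):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample ~5% of new root traces; child spans follow their parent's decision,
# so a given trace is either recorded end-to-end or not at all.
sampler = ParentBased(root=TraceIdRatioBased(0.05))
trace.set_tracer_provider(TracerProvider(sampler=sampler))

The same policy can be set without code via the standard environment variables OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.05.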
Link charts to traces with exemplars
If you’ve ever stared at a latency heatmap and thought, “I just want to see one of those spikes,” exemplars are for you. An exemplar attaches trace context (trace_id, span_id) to a metric data point. With exemplars enabled in your metrics backend, you can click from a bucket to the exact trace that generated it. (opentelemetry.io)
- Prometheus supports exemplars in the OpenMetrics exposition format. In client libraries like Python, you can attach a trace_id when observing. Don’t forget to start Prometheus with exemplar storage enabled (the --enable-feature=exemplar-storage flag). (prometheus.github.io)
- On Google Cloud’s Managed Service for Prometheus, exemplars on histograms integrate with Cloud Trace; retention is currently up to 24 months. (cloud.google.com)
Here’s a minimal Python example that records a histogram with a trace-linked exemplar using Prometheus client and OpenTelemetry:
from time import sleep, perf_counter

from prometheus_client import Histogram, start_http_server
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# The OTLP exporter lives in its own package (opentelemetry-exporter-otlp-proto-http)
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Set up tracing (OTLP over HTTP; 4318 is the Collector's default HTTP port)
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
tracer = trace.get_tracer(__name__)

# Metrics (exemplars are only exposed in the OpenMetrics exposition format)
latency = Histogram("request_latency_seconds", "Request latency (s)")
start_http_server(8000)  # expose /metrics

def trace_id_hex():
    ctx = trace.get_current_span().get_span_context()
    return f"{ctx.trace_id:032x}" if ctx.trace_id else None

while True:
    with tracer.start_as_current_span("handle_request"):
        t0 = perf_counter()
        sleep(0.05)  # do work
        dt = perf_counter() - t0
        tid = trace_id_hex()
        if tid:
            latency.observe(dt, exemplar={"trace_id": tid})  # exemplar carrying trace_id
        else:
            latency.observe(dt)
This lets you click from a latency chart bucket straight into the corresponding trace when your backend supports exemplars. (prometheus.github.io)
Telemetry for LLMs and agents (yes, it’s different)
Traditional web SLIs (latency, error rate) still matter for AI features, but you also care about:
- Token usage and cost per request/feature path.
- Time-to-first-token and time-per-output-token (decode throughput).
- Safety/guardrail outcomes and tool/agent steps.
OpenTelemetry’s Generative AI semantic conventions define standard names for these signals. For example:
- gen_ai.client.token.usage to record input/output token counts on the client side.
- gen_ai.server.request.duration and gen_ai.server.time_per_output_token on model servers.
As of September 2025, these conventions are in “Development” status with an opt-in mechanism while they stabilize, but they already provide a common shape for AI telemetry. (opentelemetry.io)
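As a concrete sketch, recording client-side token usage with the OTel metrics API might look like the following. The metric and attribute names follow the GenAI conventions as published today, but since they are still in Development you should verify them against the semantic-convention version you pin:

from opentelemetry import metrics

# Assumes a MeterProvider with an OTLP exporter is already configured globally.
meter = metrics.get_meter("genai-instrumentation-example")

token_usage = meter.create_histogram(
    "gen_ai.client.token.usage",
    unit="{token}",
    description="Input/output tokens consumed per model call",
)

def record_token_usage(model: str, input_tokens: int, output_tokens: int) -> None:
    # Attribute names follow the current GenAI conventions; treat them as provisional.
    common = {"gen_ai.operation.name": "chat", "gen_ai.request.model": model}
    token_usage.record(input_tokens, {**common, "gen_ai.token.type": "input"})
    token_usage.record(output_tokens, {**common, "gen_ai.token.type": "output"})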
Ecosystem support is arriving: LangSmith added end-to-end OTel support in March 2025 for LangChain/LangGraph apps, so you can emit standardized traces and correlate them with system telemetry. If you’re instrumenting agents and tool calls, this makes end-to-end trace context realistic instead of wishful. (changelog.langchain.com)
Reliability metrics that align with users
SRE’s reliability model is still the best starting point:
- SLIs describe what users experience: success rate, latency percentiles, freshness, or “good user minutes.”
- SLOs set the target.
- Error budgets (1 – SLO) buy you change velocity until you exceed them; when you do, you slow or halt risky changes. (sre.google)
When picking SLIs, start with the Four Golden Signals—latency, traffic, errors, saturation—and tailor them to your service and dependency calls. These are simple to explain, easy to measure, and catch most regressions before customers do. (sre.google)
For AI features, extend SLIs to include:
- Quality or safety pass rate (e.g., percentage of responses that pass your evaluator or policy checks).
- Token-per-request and cost-per-request distributions (p50/p95).
- Time-to-first-token and time-per-output-token (user-perceived responsiveness).
You can implement these with the GenAI metrics conventions (plus your own evaluations), then define realistic SLO targets and track budgets weekly.
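To make the error-budget arithmetic concrete, here is a minimal sketch; the event counts are invented, so plug in whatever your metrics backend reports for the SLO window:

def error_budget_report(good: int, total: int, slo: float) -> dict:
    """Summarize an SLI against its SLO for one window."""
    sli = good / total                 # measured success ratio
    budget = 1.0 - slo                 # allowed failure fraction (the error budget)
    burned = (1.0 - sli) / budget      # fraction of the budget consumed
    return {"sli": sli, "budget_burned": burned,
            "budget_remaining": max(0.0, 1.0 - burned)}

# Example: a 99.9% success SLO over a four-week window. 28 days is about 40,320
# minutes, so the budget is roughly 40 "bad" minutes (or 0.1% of requests).
print(error_budget_report(good=9_985_000, total=10_000_000, slo=0.999))
# sli=0.9985 -> 150% of the budget burned: time to slow or freeze risky changes.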
A 30-day rollout plan
Week 1: Name and ship the basics
- Set OTEL_SERVICE_NAME via env or resource config everywhere.
- Enable OTLP exporters/receivers (SDKs → Collector → backend). On Google Cloud, consider using the Ops Agent’s OTLP receiver to route to Cloud Trace and Monitoring. (opentelemetry.io)
Week 2: Correlate metrics and traces
- Add one latency histogram per key API and attach exemplars with trace_id.
- Enable exemplar storage in Prometheus (or use Managed Service for Prometheus + Cloud Trace). (prometheus.github.io)
Week 3: Control volume without losing the “spicy” traces
- Set head-based sampling in SDKs (e.g., traceidratio ~0.05).
- Add tail_sampling in the Collector to always keep errors and very slow traces. Start simple: status_code + latency policies. (opentelemetry.netlify.app)
Week 4: Define SLOs and protect privacy
- Choose two SLIs to start: success ratio and 99th percentile latency over a four-week window. Write an error budget policy (“freeze risky changes if we burn the quarter’s budget”). (sre.google)
- If you capture prompts/responses or user identifiers, use the Collector’s transform/filter processors (OTTL) to mask or drop sensitive fields before export. Document transformations and test them. (opentelemetry.io)
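For the scrubbing step, here is a sketch of a Collector transform processor using OTTL, in the same YAML style as the sampling snippet above; the attribute names (gen_ai.prompt, enduser.id) are placeholders for whatever sensitive fields your spans actually carry:

processors:
  transform/redact:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Drop raw prompt text entirely (placeholder attribute name).
          - delete_key(attributes, "gen_ai.prompt")
          # Mask a user identifier instead of exporting it verbatim.
          - set(attributes["enduser.id"], "REDACTED") where attributes["enduser.id"] != nil

Add transform/redact to the traces pipeline’s processors list ahead of the exporter so nothing sensitive leaves the Collector.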
Optional (AI features):
- Emit gen_ai.client.token.usage and (if you run models) gen_ai.server.* latency metrics.
- If you use LangChain/LangGraph, turn on LangSmith’s OTel integration to unify traces across app logic and infrastructure. (opentelemetry.io)
Common pitfalls (and how to avoid them)
- Missing service names. Everything becomes “unknown_service,” and you can’t aggregate or alert cleanly. Set OTEL_SERVICE_NAME early. (opentelemetry.io)
- Sampling without a plan. If you only use head-based sampling, you’ll miss rare failures. Add tail-sampling policies for errors and high latency; keep the policy list short to start. (opentelemetry.netlify.app)
- Unclickable charts. Without exemplars, you’ll context-switch from charts to trace search. Add exemplars to histograms where you care about “click-through to trace.” (opentelemetry.io)
- PII in telemetry. Scrub at the edge using the transform/filter processors; don’t depend on downstream redaction. (opentelemetry.io)
- AI telemetry soup. Use the GenAI semantic conventions so token usage, latency, and agent spans look the same across services and teams, even if the model/provider changes. Track the stability status and pin versions during rollout. (opentelemetry.io)
The bottom line
Modern observability is less about buying another tool and more about getting the workflow right: name your services, standardize on OTLP, sample smartly, connect metrics to traces with exemplars, and hold yourself to SLOs that reflect real user experience. If you’re shipping AI features, add token/cost/throughput metrics and instrument agent steps with the emerging GenAI conventions. The good news: you don’t have to guess anymore—there’s a clear path, and most of the heavy lifting is built into OpenTelemetry and today’s cloud platforms. (opentelemetry.io)
Happy instrumenting—and may your error budget stay unspent this quarter.