Zero‑Code Tracing with OpenTelemetry eBPF: From First Trace to RED Metrics You Can Trust

Observability teams have spent years wrestling with “agent spaghetti,” manual code changes, and uneven trace coverage. Over the last few months, the OpenTelemetry (OTel) community has quietly unlocked a different path: zero‑code, kernel‑level auto‑instrumentation for distributed tracing and metrics, powered by eBPF. With OpenTelemetry eBPF Instrumentation (OBI) maturing, a beta of Go auto‑instrumentation via eBPF, and semantic conventions stabilization work landing for core protocols, it’s a great time to revisit how we get from first trace to reliable RED metrics without rewiring every service. (opentelemetry.io)

This article walks a pragmatic path: stand up OBI for zero‑code traces, route them through the OpenTelemetry Collector, turn spans into RED metrics with the spanmetrics connector and exemplars, keep costs sane with tail sampling, and plan for the semantic‑conventions migration.

Why OBI is a big deal

OBI attaches safe, purpose‑built eBPF programs to your Linux nodes and processes to observe HTTP and gRPC calls, database activity, and network flows—without touching application code. Highlights:

OBI runs on modern Linux kernels (5.8+, or 4.18 for RHEL) and requires either privileged mode or a set of fine‑grained capabilities. Check kernels and privileges early in your rollout plan; a hedged securityContext sketch follows these highlights. (opentelemetry.io)

Momentum is real: recent contributions extended OBI’s automatic trace generation and eased deployment via a DaemonSet or Helm, further lowering the barrier to entry for orgs that haven’t instrumented legacy services. (globenewswire.com)

In parallel, the OpenTelemetry project announced a beta for Go eBPF auto‑instrumentation—another signal that zero‑code tracing is moving into the mainstream across ecosystems. (opentelemetry.io)
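Back to the privileges point from the first highlight: when privileged mode is off the table, the usual alternative is a capability‑scoped securityContext. The sketch below is an assumption‑laden starting point rather than the authoritative list; the exact capabilities depend on your kernel version and the OBI features you enable, so confirm them against the OBI security documentation.

securityContext:
  capabilities:
    drop: [ALL]
    add:
      # Illustrative set only; confirm the exact list in the OBI security docs
      # for your kernel and enabled features.
      - BPF        # on kernels older than 5.8, BPF/PERFMON fold back into SYS_ADMIN
      - PERFMON
      - SYS_PTRACE
      - NET_RAW
      - DAC_READ_SEARCH
      - CHECKPOINT_RESTORE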

A minimal Kubernetes path: OBI → Collector → your backend

There are two common patterns to deploy OBI in Kubernetes: as a sidecar that shares its pod’s process namespace with the workload it instruments, or as a node‑level DaemonSet (often installed via Helm) that discovers and instruments processes across the whole node.

Below is a trimmed sidecar example to get traces and metrics into the OpenTelemetry Collector. It also turns on Kubernetes metadata decoration so your telemetry lines up with your service maps.

YAML (excerpt):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels: { app: demo }
  template:
    metadata:
      labels: { app: demo }
    spec:
      shareProcessNamespace: true   # let the OBI sidecar see the app container's process
      serviceAccountName: obi
      containers:
        - name: app
          image: yourorg/demo:latest
          ports: [{ containerPort: 8080, name: http }]
        - name: obi
          image: otel/ebpf-instrument:latest
          securityContext:
            privileged: true   # or grant fine-grained capabilities instead (see the sketch above)
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT     # Collector OTLP/HTTP endpoint
              value: http://otelcol:4318
            - name: OTEL_EBPF_OPEN_PORT             # instrument the process listening on this port
              value: "8080"
            - name: OTEL_EBPF_KUBE_METADATA_ENABLE  # decorate telemetry with Kubernetes metadata
              value: "true"

The OBI docs include the RBAC setup and a complete example manifest; use those as your baseline. (opentelemetry.io)
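The other common pattern is a node‑level DaemonSet, which instruments matching workloads across every node without editing application manifests. Below is a minimal sketch under a few assumptions: host PID access is enabled, discovery reuses the port‑based selection variable from the sidecar example (recent releases accept comma‑separated ports, but verify for yours), and the namespace and Collector address are placeholders for your own.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: obi
  namespace: observability
spec:
  selector:
    matchLabels: { app: obi }
  template:
    metadata:
      labels: { app: obi }
    spec:
      hostPID: true              # see processes from every pod on the node
      serviceAccountName: obi
      containers:
        - name: obi
          image: otel/ebpf-instrument:latest
          securityContext:
            privileged: true     # or the capability set sketched earlier
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otelcol.observability:4318
            - name: OTEL_EBPF_OPEN_PORT            # discover services by listening port
              value: "8080,8443"
            - name: OTEL_EBPF_KUBE_METADATA_ENABLE
              value: "true"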

Turn traces into RED metrics with the spanmetrics connector

You can derive the RED trio—Rate, Errors, Duration—from your traces using the spanmetrics connector in the Collector. It turns spans into request counters and latency histograms, dimensioned by attributes you choose (HTTP method, route, status code), so error rates fall out of the status dimensions, and it can attach exemplars that link directly to traces.

Collector config (core parts only):

receivers:
  otlp:
    protocols: { grpc: {}, http: {} }

connectors:
  spanmetrics:
    dimensions:   # service.name, span.name, span.kind, and status.code are added by default
      - name: http.method
      - name: http.route
      - name: http.status_code
    histogram:
      explicit:
        buckets: [100us, 1ms, 5ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s]
    exemplars:
      enabled: true   # attach trace/span IDs to the duration histogram samples

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    enable_open_metrics: true  # required for exemplars
  # optionally also export traces:
  # otlp: { endpoint: tempo:4317, tls: { insecure: true } }

processors:
  batch: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [spanmetrics]  # feed metrics generation
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]

Note: If you’re scraping /metrics and don’t see exemplars, check that OpenMetrics is enabled on the exporter and that your backend supports exemplars at all. Support has had gaps across exporter and backend versions; Prometheus needs the exemplar-storage feature enabled, and Prometheus Remote Write or managed Prometheus backends (e.g., Google Cloud’s Managed Service for Prometheus) handle exemplars well. (docs.openshift.com)
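As a quick sanity check, request the exporter’s /metrics endpoint with the OpenMetrics content type and look for the exemplar suffix after histogram bucket samples. The metric and label names below are illustrative (they vary with connector version and namespace settings); the shape of an exemplar‑bearing line is what matters:

duration_milliseconds_bucket{service_name="demo",http_route="/checkout",le="250"} 1027 # {trace_id="4bf92f3577b34da6a3ce929d0e0e4736",span_id="00f067aa0ba902b7"} 187.5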

Exemplars are individual sample points stored alongside aggregate metrics (typically histograms) that carry trace and span IDs. In practice, that means you can click from a latency spike directly to a representative trace. SDKs support exemplar filters like AlwaysOn and TraceBased; the latter only adds exemplars when a sampled span is active. (opentelemetry.io)
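For services that also run an OTel SDK alongside OBI, the filter can be selected with the SDK’s standard environment variable; a minimal Kubernetes‑style snippet:

env:
  - name: OTEL_METRICS_EXEMPLAR_FILTER
    value: "trace_based"   # alternatives: always_on, always_off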

Tip for Java users: the Prometheus Java client integrates with the OTel Java agent and marks spans used as exemplars with the span attribute exemplar="true". You can then write a tail‑sampling rule to always keep those traces so your exemplar links don’t point to a trace that was dropped. (prometheus.github.io)

Keep costs in check with tail sampling

Head‑based sampling (in the SDK) is cheap but blind to outcomes. Tail‑based sampling (in the Collector) decides after a trace completes, so you can keep the good stuff—errors, slow requests, or exemplar‑flagged traces—while downsampling the rest. (opentelemetry.io)

Example policies:

processors:
  tail_sampling:
    decision_wait: 10s   # spans are buffered in memory for this window before policies run
    policies:
      - name: keep-exemplars
        type: string_attribute
        string_attribute: { key: "exemplar", values: ["true"] }
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-traces
        type: latency
        latency: { threshold_ms: 1000 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }

This pattern keeps exemplar traces and all errors, adds slow traces over one second, and retains a 5% baseline for discovery. Tweak thresholds per service tier. (prometheus.github.io)
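One wiring detail matters here: if tail sampling runs in the same pipeline that feeds the spanmetrics connector, your RED metrics are computed only from the spans you kept, which skews rates and error ratios. A common fix is to fan the OTLP receiver out to two trace pipelines, one feeding the connector unsampled and one sampling before export to your trace backend. A sketch, assuming you have enabled the otlp trace exporter from the earlier config:

service:
  pipelines:
    traces/metrics-gen:            # unsampled spans feed RED metric generation
      receivers: [otlp]
      processors: [batch]
      exporters: [spanmetrics]
    traces/backend:                # tail sampling applies only to what you store
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]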

Semantic conventions are settling—plan your migration

Dashboards and alerts depend on stable attribute names. HTTP semantic conventions are already stable (http.request.method and http.response.status_code replace the older http.method and http.status_code used in the spanmetrics dimensions above), and stabilization work for database, RPC, and messaging conventions has been landing, so the ecosystem can standardize on durable names that map well to RED/SLO workflows.
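If your telemetry arrives with the newer names while dashboards still expect the older ones (or the reverse), the Collector’s transform processor can bridge the gap during migration. A hedged sketch that mirrors the stable HTTP names onto the legacy keys used in the spanmetrics dimensions above; adjust the attribute pairs to whatever your spans actually carry:

processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Copy stable HTTP attributes onto legacy keys while dashboards migrate.
          - set(attributes["http.method"], attributes["http.request.method"]) where attributes["http.request.method"] != nil
          - set(attributes["http.status_code"], attributes["http.response.status_code"]) where attributes["http.response.status_code"] != nil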

What OBI won’t do (and how to fill the gaps)

Because OBI watches protocol traffic from outside the process, it generally won’t create custom spans, attach business-specific attributes, or see work that never crosses a socket, and it only runs on Linux hosts that meet the kernel and privilege requirements above. Where you need that extra depth, add the OpenTelemetry SDK to the handful of services that warrant it and let OBI cover the rest; both streams land in the same Collector pipelines.

A simple rollout checklist

Verify kernel versions (5.8+, or 4.18 on RHEL) and decide between privileged mode and fine‑grained capabilities. Pick sidecar or DaemonSet deployment, set up the RBAC from the OBI docs, and enable Kubernetes metadata decoration. Point OBI at a Collector, turn on the spanmetrics connector with exemplars, and confirm exemplars appear on /metrics. Add tail‑sampling policies for errors, latency, and exemplar‑flagged traces, plus a probabilistic baseline. Finally, pin the attribute names your dashboards rely on and plan the semantic‑conventions migration.

Bottom line

eBPF‑powered, zero‑code tracing means you can light up meaningful coverage across polyglot and legacy services in hours, not weeks. Pair OBI with the spanmetrics connector and exemplars, and you have a clean path from first trace to actionable RED metrics and SLOs—while tail sampling keeps budgets sane. With RPC conventions stabilizing and language‑level eBPF advances rolling in, this approach isn’t a stopgap; it’s becoming the new default on‑ramp to reliable, vendor‑neutral observability. (globenewswire.com)

If you want a hand pressure‑testing an initial config (Collector YAML, buckets, sampling policies), share a sketch of your stack and traffic shape—I’m happy to suggest a starting point.