Stress-Testing Kubernetes: Proving “Consistent Reads from Cache” Really Works

If you’ve ever stress‑tested a busy Kubernetes control plane, you know LIST calls can become the equivalent of a Friday afternoon traffic jam: everything backs up, and latency spikes right when you least want it. Kubernetes 1.31 quietly delivered a big improvement here: “consistent reads from cache.” In plain English, the API server can now serve strongly consistent GET/LIST requests straight from its watch cache once it has confirmed the cache is at least as fresh as etcd, instead of always pounding etcd for every read. That change reduces etcd load and slashes tail latencies under pressure. In the project’s own 5,000‑node tests, enabling it cut kube‑apiserver CPU by ~30% and etcd CPU by ~25%, and reduced p99 pod LIST latency by up to 3×. (kubernetes.io)
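
To make the request shapes concrete: strong consistency is what you get when a client sends a LIST with no resourceVersion at all, which is the path this feature changes; resourceVersion=0 has always been served from the cache with relaxed consistency. The label selector below is made up purely for illustration:

# Strongly consistent LIST: previously a quorum read from etcd, now servable from the watch cache
kubectl get --raw '/api/v1/pods?labelSelector=app%3Dweb'

# Cache-backed LIST with relaxed consistency: already served from the watch cache before 1.31
kubectl get --raw '/api/v1/pods?labelSelector=app%3Dweb&resourceVersion=0'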

Below is a practical, reproducible way to validate that improvement in your own environment and to fold it into your stress‑testing toolkit.

What changed (and why you should care)

Under the hood, the feature reached beta and is enabled by default in Kubernetes 1.31. It’s guarded by the ConsistentListFromCache feature gate on kube‑apiserver, which lets you toggle it off for a clean before/after comparison. (github.com)
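
A quick way to confirm how the gate is set on a live API server is the feature-gate metric; assuming your version exposes kubernetes_feature_enabled (present on recent releases), this one-liner tells you:

# Prints 1 if the gate is enabled on this API server, 0 if disabled
kubectl get --raw /metrics | grep 'kubernetes_feature_enabled{name="ConsistentListFromCache"'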

What to measure during stress

When you push the control plane, track:

- p99 latency for the LIST verbs you hammer (apiserver_request_duration_seconds_bucket)
- how many LISTs are served from the watch cache versus storage (apiserver_cache_list_total vs apiserver_storage_list_total)
- kube-apiserver and etcd CPU and memory, since offloading etcd is the point of the feature

Example PromQL to track pod LIST latency:

histogram_quantile(
  0.99,
  sum by (le) (
    rate(apiserver_request_duration_seconds_bucket{verb="LIST",resource="pods"}[5m])
  )
)

And to see how many LISTs are served from cache vs storage:

sum(rate(apiserver_cache_list_total{resource_prefix="/pods"}[5m]))
sum(rate(apiserver_storage_list_total{resource="/pods"}[5m]))

These pair nicely in a dashboard during a stress run. (cloud.google.com)
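
If you want a single “what fraction of pod LISTs came from the cache” panel, a ratio along these lines works; treat the exact label names and values as assumptions and check your own /metrics output first, since they can vary across Kubernetes versions:

sum(rate(apiserver_cache_list_total{resource_prefix="/pods"}[5m]))
/
(
  sum(rate(apiserver_cache_list_total{resource_prefix="/pods"}[5m]))
  +
  sum(rate(apiserver_storage_list_total{resource="/pods"}[5m]))
)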

A minimal, reproducible “before/after” test

You can reproduce the impact locally with kind by toggling the feature gate.

1) Create a cluster with the feature OFF (the “before” baseline)

Use a kind config patch that sets the kube‑apiserver flag via kubeadm extraArgs:

# kind-config-before.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        feature-gates: "ConsistentListFromCache=false"

Create the cluster:

kind create cluster --config kind-config-before.yaml

This relies on kubeadm’s ability to pass component flags through extraArgs, which kind exposes via patches. (kubernetes.io)
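
Before you start measuring, it’s worth checking that the flag actually landed on the kube-apiserver static pod; assuming the usual kubeadm component label, something like this does it:

# Expect to see --feature-gates=ConsistentListFromCache=false in the output
kubectl -n kube-system get pod -l component=kube-apiserver \
  -o jsonpath='{.items[0].spec.containers[0].command}' | tr ',' '\n' | grep feature-gates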

2) Generate read-heavy load

A lightweight approach is to drive API traffic through kubectl proxy and a k6 script that issues filtered LISTs (which previously forced etcd work).

Start a local proxy so k6 can reach the API server without juggling credentials:

kubectl proxy --port=8001 &

Save the following as list-pods.js:

// list-pods.js
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 100,      // concurrent virtual users; adjust to taste
  duration: '2m'
};

export default function () {
  // Filtered LIST that usually returns few/zero items; great for stressing the old path,
  // where the API server had to pull every pod from etcd and filter it server-side
  const url = 'http://127.0.0.1:8001/api/v1/pods?labelSelector=madeup%3Dfalse';
  http.get(url);
  sleep(1);
}

Then run it:

k6 run list-pods.js

If you prefer running inside the cluster and orchestrating tests, Testkube has native k6 support; for large‑scale or mixed workloads, kube‑burner provides battle‑tested scenarios and metrics collection. (docs.testkube.io)

3) Capture metrics

During the run, scrape the API server’s /metrics endpoint and record:

- the p99 LIST latency quantile from the PromQL above
- the cache‑served vs storage‑served LIST counters
- kube‑apiserver and etcd CPU/memory, if you have node‑level metrics handy
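
If you don’t have Prometheus wired up for a throwaway kind cluster, raw snapshots of /metrics are enough for a before/after diff; for example (the filename here is just a convention for this exercise):

# Grab only the counters and histograms we care about for this run
kubectl get --raw /metrics \
  | grep -E 'apiserver_cache_list_total|apiserver_storage_list_total|apiserver_request_duration_seconds' \
  > before-run.prom

Repeat with a different filename after the “after” run and compare the two files.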

4) Recreate the cluster with the feature ON (the “after” run)

Delete the cluster and create a new one with the default behavior (or explicitly true):

# kind-config-after.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        feature-gates: "ConsistentListFromCache=true"

Repeat the same k6 run and compare dashboards. You should see p99 LIST latency drop and the cache‑served counter jump. (github.com)

Real‑world stress options

The kind-based recipe above is deliberately minimal. For bigger or more realistic runs, kube‑burner drives large‑scale, mixed create/list/watch churn with built‑in metrics collection, and Testkube can run the same k6 script from inside the cluster on a schedule. Whichever driver you pick, keep the filtered‑LIST pattern in the mix; that’s the request shape where the before/after difference shows up most clearly.

Gotchas and tips

- A single‑node kind cluster won’t reproduce the 5,000‑node percentages; treat the relative before/after delta and the cache‑vs‑storage counters as the signal, not the absolute numbers.
- The cache only kicks in for reads that ask for strong consistency (no resourceVersion on the request); LISTs that already pass resourceVersion=0 were served from the cache before this feature, so they won’t show a difference.
- The feature relies on etcd watch progress notifications, so check the Kubernetes 1.31 release notes for the minimum supported etcd versions before testing on an older cluster.
- Metric names and labels shift between versions; eyeball the raw /metrics output before wiring up dashboards.

Where this is headed

Kubernetes 1.32 introduced “watch list” (API streaming for list), which streams the initial list from the watch cache item‑by‑item with constant memory overhead—another win for stability during big LISTs. If you want to keep squeezing latency and memory under stress, consider enabling and testing that path as well. (kubernetes.io)
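
If you want to fold that into the same kind setup, the server-side gate is named WatchList; gate names and defaults depend on your Kubernetes version, so treat this config as a sketch and double-check the feature-gate reference for your release:

# kind-config-watchlist.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        feature-gates: "ConsistentListFromCache=true,WatchList=true"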

The takeaway

“Consistent reads from cache” isn’t just a nice‑to‑have; it’s a meaningful shift in how the API server handles read pressure. Under stress, it offloads etcd, tames tail latencies, and makes request costs more predictable. A quick before/after run with the feature gate will show you the difference in your own environment—and give you confidence that the next time traffic spikes, your control plane won’t be the bottleneck. (kubernetes.io)

Resources to keep handy:

- the Kubernetes 1.31 blog post on consistent reads from cache
- the ConsistentListFromCache feature-gate entry and its KEP in kubernetes/enhancements
- the Kubernetes 1.32 “watch list” (API streaming) blog post
- the kube-burner and Testkube documentation for scaling up the load side

Happy testing—and may your p99s trend down and to the right.