Right-size without restarts: Cutting Kubernetes costs with in‑place Pod resize

Kubernetes finally has a practical way to change CPU and memory on running Pods without deleting them. The in‑place Pod resize feature (also called in‑place vertical scaling) graduated to beta in Kubernetes v1.33 and is enabled by default. That unlocks a new, low‑friction path to reduce over‑provisioning—and your bill—while keeping services online. (kubernetes.io)

Why this matters for cost optimization

Until recently, if you wanted to right‑size a container’s resources you had to recreate the Pod. Vertical Pod Autoscaler (VPA) could do that automatically, but the evictions were disruptive enough that many teams avoided it in production, sticking with generous requests “just to be safe.” In‑place resize removes that barrier for many workloads. VPA integration is still evolving upstream, but managed platforms are starting to add options that attempt in‑place updates first (for example, GKE’s preview “InPlaceOrRecreate” mode). Expect fewer restarts and a better chance to reduce requests during quiet periods—freeing capacity for bin‑packing and node scale‑down. (kubernetes.io)

What actually changed

At a high level:

- A running Pod's container CPU and memory (requests and limits) can now be changed without recreating the Pod, using the new resize subresource on the Pod API.
- The kubelet applies the change to the running container; the Pod keeps its name, IP, and volumes, and is not rescheduled.
- Progress is reported on the Pod itself: the PodResizePending and PodResizeInProgress conditions show where a resize stands, and container status shows the resources actually in effect.

You can also specify a per‑resource resize policy. For example, apply CPU changes without a restart but require a restart for memory (handy because many runtimes/apps can’t shrink memory safely without restarting). (kubernetes.io)
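One quick way to check that your cluster and client expose the new surface (a minimal sketch; nothing here is specific to your workloads):

# Confirm client (1.32+) and server (1.33+) versions
kubectl version

# Inspect the per-resource resize policy field served by the API
kubectl explain pod.spec.containers.resizePolicy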

A safe, repeatable workflow to lower costs

Here’s a pragmatic sequence you can adopt team‑wide. It works whether you manage nodes yourself or use a provider’s autoscaler.

1) Measure real usage and current efficiency

Start from what containers actually use, not what they request. OpenCost with the kubectl‑cost plugin reports per‑namespace spend and efficiency (roughly, usage divided by requests), which tells you where the over‑provisioning lives.

Example (after installing OpenCost and kubectl‑cost):

kubectl cost namespace \
  --window 7d --show-cpu --show-memory --show-efficiency=true
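If you want a rough spot check alongside OpenCost, plain kubectl works too; this sketch assumes metrics-server is installed and uses a placeholder namespace:

# Current usage per container (needs metrics-server; "payments" is a placeholder namespace)
kubectl top pod -n payments --containers

# What those containers request, for comparison
kubectl get pod -n payments -o custom-columns=\
NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory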

2) Choose conservative targets

Derive new requests from observed usage rather than from the old numbers: a common starting point is the 7‑day P95 of actual usage plus some headroom (for example, if a container's P95 CPU is 220m, try a 250m request). Shrink in steps instead of jumping straight to the “ideal” value, and be more cautious with memory than CPU, since lowering memory safely usually means a container restart under the policy added in step 3.
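If you run Prometheus, a percentile query is a convenient way to pick the anchor number; this is a sketch with placeholder host, namespace, and container names:

# 7-day P95 of actual CPU usage for one container (in cores)
curl -sG 'http://prometheus.monitoring:9090/api/v1/query' \
  --data-urlencode 'query=quantile_over_time(0.95, sum(rate(container_cpu_usage_seconds_total{namespace="payments",container="api"}[5m]))[7d:5m])'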

3) Add explicit resize policies

Example snippet:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      containers:
      - name: api
        image: yourrepo/api:sha256
        resources:
          requests:
            cpu: "400m"
            memory: "512Mi"
          limits:
            cpu: "600m"
            memory: "512Mi"
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: RestartContainer

With this policy, the kubelet applies CPU changes to the running container but restarts the container when its memory changes, so you decide exactly where disruption is allowed during resizes. (kubernetes.io)
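After rolling this out, it is easy to confirm the policy is present on the running Pods; the label selector here is an assumption:

# Print each Pod's name and the resize policy on its first container ("app=api" is a placeholder label)
kubectl get pod -l app=api -o jsonpath=\
'{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resizePolicy}{"\n"}{end}'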

4) Roll out and resize in place

Examples:

# Increase CPU (no restart expected)
kubectl patch pod api-7c9f... --subresource=resize --patch \
  '{"spec":{"containers":[{"name":"api","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"700m"}}}]}}'

# Increase memory (will restart this container due to policy)
kubectl patch pod api-7c9f... --subresource=resize --patch \
  '{"spec":{"containers":[{"name":"api","resources":{"requests":{"memory":"768Mi"},"limits":{"memory":"768Mi"}}}]}}'

This requires a v1.33+ cluster; kubectl v1.32 or newer is needed for --subresource=resize. (kubernetes.io)

5) Watch Pod resize status and SLOs

While a resize is in flight, the Pod reports progress through conditions: PodResizePending means the kubelet can't satisfy the request yet (for example, the node is short on capacity), and PodResizeInProgress means the change is being applied. The resources actually in effect appear in the container's status. Watch your latency and error SLOs while shrinking requests, and back out quickly if they degrade.
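A minimal way to watch the resize from the command line (reusing the placeholder Pod name from above):

# Resize-related conditions reported for the Pod
kubectl get pod api-7c9f... -o jsonpath=\
'{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'

# Resources actually applied to the first container
kubectl get pod api-7c9f... -o jsonpath='{.status.containerStatuses[0].resources}'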

6) Let the autoscaler do the savings

Smaller requests only save money once nodes go away. With lower requests the scheduler can pack Pods onto fewer nodes, and the Cluster Autoscaler (or Karpenter's consolidation) can drain and remove the underutilized ones. Make sure scale‑down/consolidation is enabled and that PodDisruptionBudgets allow the required moves.
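A rough way to confirm consolidation is actually happening, using nothing beyond kubectl:

# Node count should drift down once requests shrink
kubectl get nodes --no-headers | wc -l

# Recent events usually show scale-down / consolidation activity
kubectl get events -A --sort-by=.lastTimestamp | tail -n 20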

7) Verify the outcome in dollars

Close the loop with the same tooling you started with: compare the namespace's cost and efficiency for the window after the change against the window before it. If efficiency improved but spend didn't move, the freed capacity probably hasn't been consolidated into fewer nodes yet.
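Re-running the report from step 1 gives a direct before/after comparison:

kubectl cost namespace \
  --window 7d --show-cpu --show-memory --show-efficiency=true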

Design notes and guardrails

A few behaviors worth planning around:

- A resize never changes a Pod's QoS class; that is fixed when the Pod is created.
- If the node can't accommodate the new requests, the resize is deferred or marked infeasible (surfaced via the PodResizePending condition) rather than evicting the Pod.
- Shrinking memory is the risky direction: most runtimes can't hand memory back safely, which is why the example policy restarts the container for memory changes.
- Not every Pod can be resized (Windows Pods are not supported, for example), and managed platforms may gate the feature differently, so prove it out on a low‑risk service first.

A reference playbook you can copy

Treat the seven steps above as a recurring loop rather than a one‑off project: review OpenCost efficiency on a regular cadence, pick a couple of the worst offenders, apply the resize policy and new targets, watch SLOs and resize conditions for a day or two, then let consolidation reclaim the nodes and record the before/after cost. After a few cycles this becomes routine, and the savings compound.

Putting it together: in‑place resize + consolidation + visibility

The most reliable way to turn rightsizing into real savings is to combine three capabilities: in‑place resize to lower requests and limits without restarting workloads, node consolidation (Cluster Autoscaler scale‑down or Karpenter) to turn the freed headroom into fewer nodes, and cost visibility (OpenCost) to prove the bill actually moved.

When you put these together, two good things happen: you stop paying for idle headroom, and you’re not afraid to keep iterating because the operational risk is controlled.

Quick start checklist

- Cluster on Kubernetes v1.33+ (in‑place resize is beta and on by default) and kubectl v1.32+.
- OpenCost and kubectl‑cost installed, with a baseline 7‑day efficiency report saved.
- resizePolicy added to the Deployments you plan to tune (CPU: NotRequired, memory: RestartContainer).
- One or two low‑risk services chosen as first candidates, with SLO dashboards handy.
- Cluster Autoscaler scale‑down or Karpenter consolidation enabled so freed capacity becomes fewer nodes.
- A follow‑up cost report scheduled to confirm the savings.

Bottom line: in‑place Pod resize makes continuous rightsizing realistic. Start with low‑risk services, prove the impact with OpenCost, and let your autoscaler consolidate the rest. The result is a leaner cluster and a smaller bill—without the drama of rolling restarts.