Right-size without restarts: Cutting Kubernetes costs with in‑place Pod resize
Kubernetes finally has a practical way to change CPU and memory on running Pods without deleting them. The in‑place Pod resize feature (also called in‑place vertical scaling) graduated to beta in Kubernetes v1.33 and is enabled by default. That unlocks a new, low‑friction path to reduce over‑provisioning—and your bill—while keeping services online. (kubernetes.io)
Why this matters for cost optimization
Until recently, if you wanted to right‑size a container’s resources you had to recreate the Pod. Vertical Pod Autoscaler (VPA) could do that automatically, but the evictions were disruptive enough that many teams avoided it in production, sticking with generous requests “just to be safe.” In‑place resize removes that barrier for many workloads. VPA integration is still evolving upstream, but managed platforms are starting to add options that attempt in‑place updates first (for example, GKE’s preview “InPlaceOrRecreate” mode). Expect fewer restarts and a better chance to reduce requests during quiet periods—freeing capacity for bin‑packing and node scale‑down. (kubernetes.io)
What actually changed
At a high level:
- spec.containers[*].resources now represents the desired CPU/memory and can be updated.
- status.containerStatuses[*].resources shows what the node is actually enforcing.
- You request a resize by updating the Pod via the new resize subresource.
- The feature is beta and on by default in v1.33; you’ll need kubectl v1.32+ to use --subresource=resize. (kubernetes.io)
You can also specify a per‑resource resize policy. For example, apply CPU changes without a restart but require a restart for memory (handy because many runtimes/apps can’t shrink memory safely without restarting). (kubernetes.io)
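A quick way to compare the two, with desired values from spec and enforced values from status (the pod name is illustrative, matching the examples later in this piece):

kubectl get pod api-7c9f... \
  -o jsonpath='{.spec.containers[0].resources}{"\n"}{.status.containerStatuses[0].resources}{"\n"}'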
A safe, repeatable workflow to lower costs
Here’s a pragmatic sequence you can adopt team‑wide. It works whether you manage nodes yourself or use a provider’s autoscaler.
1) Measure real usage and current efficiency
- Grab rolling CPU/memory usage from your observability stack (Prometheus/Grafana, Datadog, etc.).
- For cost‑focused visibility, deploy OpenCost (Helm install) and use the kubectl‑cost plugin to see per‑namespace and per‑workload “cost efficiency” and idle spend. This will immediately highlight the biggest waste. (opencost.io)
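If you don’t have them installed yet, a typical setup looks like this (chart repo and plugin names per the OpenCost and Krew docs; adjust the namespace and flags for your environment):

# Add the OpenCost Helm chart and install it
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost --namespace opencost --create-namespace
# Install the kubectl-cost plugin via Krew
kubectl krew install cost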
Example (after installing OpenCost and kubectl‑cost):
kubectl cost namespace \
  --window 7d --show-cpu --show-memory --show-efficiency=true
2) Choose conservative targets
- For each deployment, pick a “baseline” request that covers typical steady‑state plus a buffer (see the query sketch after this list). Keep existing limits initially.
- For memory, be extra conservative (spikes, caches, and GC behavior make downsizing riskier).
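One way to derive that baseline is to pull a high percentile of recent usage from Prometheus. This is a sketch assuming cAdvisor metrics scraped by Prometheus; the URL, namespace, and pod regex are placeholders for your environment:

# p95 of the workload's summed CPU usage over the last 7 days (5m resolution)
curl -s "http://prometheus:9090/api/v1/query" --data-urlencode \
  'query=quantile_over_time(0.95, sum(rate(container_cpu_usage_seconds_total{namespace="prod", pod=~"api-.*", container!=""}[5m]))[7d:5m])'
# Same idea for memory, using the working set
curl -s "http://prometheus:9090/api/v1/query" --data-urlencode \
  'query=quantile_over_time(0.95, sum(container_memory_working_set_bytes{namespace="prod", pod=~"api-.*", container!=""})[7d:5m])'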
3) Add explicit resize policies
- Set CPU to NotRequired and memory to RestartContainer. That lets you downsize CPU without restarts and treat memory changes more cautiously.
Example snippet:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      containers:
      - name: api
        image: yourrepo/api:sha256
        resources:
          requests:
            cpu: "400m"
            memory: "512Mi"
          limits:
            cpu: "600m"
            memory: "512Mi"
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: RestartContainer
The policy behavior is defined by Kubernetes and helps you control disruption during resizes. (kubernetes.io)
4) Roll out and resize in place
- Upgrade the deployment with the new baseline. Then use the resize subresource for live tweaks during a low‑risk window.
Examples:
# Increase CPU (no restart expected)
kubectl patch pod api-7c9f... --subresource=resize --patch \
  '{"spec":{"containers":[{"name":"api","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"700m"}}}]}}'
# Increase memory (will restart this container due to policy)
kubectl patch pod api-7c9f... --subresource=resize --patch \
  '{"spec":{"containers":[{"name":"api","resources":{"requests":{"memory":"768Mi"},"limits":{"memory":"768Mi"}}}]}}'
This requires a v1.33+ cluster and kubectl v1.32+ for --subresource=resize. (kubernetes.io)
5) Watch Pod resize status and SLOs
- Kubernetes surfaces Pod conditions such as PodResizeInProgress and PodResizePending (with reason Deferred or Infeasible) so you can see what’s happening. Keep an eye on error messages and your service SLOs while resizes roll out. (kubernetes.io)
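To inspect those conditions on a pod (name illustrative):

kubectl get pod api-7c9f... \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'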
6) Let the autoscaler do the savings
- After requests drop across enough Pods, nodes will go under‑utilized. If you’re on AWS, Karpenter’s consolidation can proactively evict and drain nodes, repacking workloads onto fewer, cheaper instances. This is where rightsizing translates into actual dollars saved. Tune the consolidation policy and consolidateAfter so this happens promptly but safely. (karpenter.sh)
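A minimal sketch of the relevant settings, assuming Karpenter’s v1 NodePool API (the EC2NodeClass reference and values are placeholders, and field names vary across Karpenter versions):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m   # wait before consolidating, to avoid churn
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]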
7) Verify the outcome in dollars
- Use OpenCost to compare “before vs. after” for namespaces and top workloads (CPU/memory efficiency and total cost) over the last 7–14 days. OpenCost provides standardized cost allocation APIs and dashboards you can script into CI or a weekly report. The project is now a CNCF Incubating project, which helps teams adopt it as a common language for Kubernetes cost. (opencost.io)
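For scripted comparisons, you can query OpenCost’s allocation API directly; this sketch assumes the default service name and API port from the Helm chart:

# Expose the OpenCost API locally, then pull a 14-day allocation by namespace
kubectl -n opencost port-forward service/opencost 9003:9003 &
curl -s "http://localhost:9003/allocation?window=14d&aggregate=namespace"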
Design notes and guardrails
- Supported resources: Only CPU and memory can be resized in place. Windows Pods aren’t supported. Init and ephemeral containers can’t be resized. Quality of Service (QoS) class won’t change; for example, Guaranteed Pods must keep requests equal to limits. (kubernetes.io)
- Memory downsizing: Kubernetes makes a best‑effort attempt when you decrease memory without restart, but if usage is already above the new request/limit it will defer the change to avoid OOM. In practice, treat memory decreases as a restart event unless you’ve validated behavior. (kubernetes.io)
- Feature availability: In‑place resize is beta and on by default starting with v1.33. Some managed services add provider‑specific options; for example, GKE offers a preview VPA mode that attempts in‑place resizes first on certain versions. Always check your provider’s release notes. (kubernetes.io)
- Interplay with HPA: Horizontal Pod Autoscaler still scales replicas based on metrics (CPU %, custom metrics). Note that CPU utilization targets are measured relative to requests, so lowering requests raises measured utilization and can trigger extra replicas; re‑check HPA targets after rightsizing. Rightsizing also improves bin‑packing, so HPA scale‑downs are more likely to free whole nodes for removal.
- Pod Disruption Budgets (PDBs): They don’t block in‑place CPU resizes, but they will impact any operation that restarts Pods (like memory changes with RestartContainer). Set budgets that balance safety with the desire to consolidate nodes; see the minimal PDB sketch after this list.
- Observe the scheduler: When resizes are pending, the scheduler considers the max of desired, allocated, and actual requests—so you won’t accidentally oversubscribe during a transition. (kubernetes.io)
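For reference, a minimal PDB; the name, selector, and threshold below are illustrative:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2        # keep at least 2 replicas up during restarts/evictions
  selector:
    matchLabels:
      app: api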
A reference playbook you can copy
- Baseline: Use 7–14 days of usage to pick CPU/memory requests with headroom.
- Policies: configure resizePolicy with CPU set to NotRequired and memory set to RestartContainer.
- Automate: Add a rightsizing job that proposes updates daily from your usage data; a human reviews and applies resizes via GitOps or a runbook.
- Batch the changes: Start with low‑risk stateless services, then apply to stateful sets with strong PDBs and rollout strategies.
- Free the nodes: Ensure your cluster autoscaler (or Karpenter) is set to consolidate quickly after utilization falls; tune disruption settings to avoid churn during traffic spikes. (karpenter.sh)
- Prove the savings: Compare OpenCost allocation reports week‑over‑week; share “CPU efficiency,” “memory efficiency,” and total cost per namespace with teams to reinforce the practice. (opencost.io)
Putting it together: in‑place resize + consolidation + visibility
The most reliable way to turn rightsizing into real savings is to combine three capabilities:
- In‑place Pod resize to reduce requests safely and frequently, with minimal disruption. (kubernetes.io)
- An autoscaler that consolidates freed capacity into fewer nodes (Karpenter’s consolidation is a strong option on EKS). (aws.amazon.com)
- A cost allocation system to quantify savings and spot regressions (OpenCost). (opencost.io)
When you put these together, two good things happen: you stop paying for idle headroom, and you’re not afraid to keep iterating because the operational risk is controlled.
Quick start checklist
- Confirm the cluster is v1.33+ and kubectl is v1.32+. Verify InPlacePodVerticalScaling is enabled (on by default in v1.33; see the check after this list). (kubernetes.io)
- Add resizePolicy to your Deployments/StatefulSets.
- Roll out conservative CPU request reductions first; validate latency and error budgets.
- Adopt memory changes with RestartContainer and solid PDBs.
- Enable and tune consolidation on your node autoscaler. (karpenter.sh)
- Install OpenCost and kubectl‑cost; baseline efficiency and track improvements. (opencost.io)
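A quick check for the version and feature‑gate items above; the metrics query assumes you can read the API server’s /metrics endpoint:

# Client and server versions
kubectl version
# Feature gates are exported as metrics by the API server
kubectl get --raw /metrics | grep 'kubernetes_feature_enabled{name="InPlacePodVerticalScaling"'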
Bottom line: in‑place Pod resize makes continuous rightsizing realistic. Start with low‑risk services, prove the impact with OpenCost, and let your autoscaler consolidate the rest. The result is a leaner cluster and a smaller bill—without the drama of rolling restarts.