For years, Kubernetes users have faced a frustrating limitation: to change a pod’s CPU or memory settings, they had to delete and recreate the pod. This disruptive process affects uptime, performance, and operational simplicity. With in-place pod resizing reaching beta in Kubernetes v1.33, that’s changing.
This feature allows you to update container resource requests and limits without restarting the pod, making it possible to adapt to real-time workload demands with minimal disruption. For platform engineering, DevOps, and SRE teams, this is a game-changer, especially when paired with automation platforms like Cast AI that can fully leverage this capability to adjust resource allocations on the fly.
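At the spec level, the feature works through a per-resource `resizePolicy` on each container, which tells the kubelet whether a change to that resource can be applied in place (`NotRequired`) or requires a container restart (`RestartContainer`). A minimal sketch, with an illustrative pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo            # illustrative name
spec:
  containers:
  - name: app
    image: nginx               # illustrative image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # CPU changes applied in place
    - resourceName: memory
      restartPolicy: RestartContainer  # memory changes restart this container
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
```

If no `resizePolicy` is set, `NotRequired` is the default, meaning resources are resized in place where possible.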
The challenge until now: resizing meant replacing
Before in-place resizing, tuning resource requests (CPU or memory) for a running pod meant tearing it down and spinning it back up, essentially restarting the workload. This introduced several challenges:
- Downtime or performance hits during pod restarts
- Manual intervention or complex restart automation logic
- Inability to adapt quickly to short-lived usage spikes or dips
- Missed optimization opportunities for autoscalers and cost-saving platforms
As a result, teams often resorted to overprovisioning to avoid risk, leading to wasted resources and inflated cloud bills. Even with vertical pod autoscaling (VPA), the lack of in-place updates meant there was constant friction between optimization and operational safety.
With in-place resizing now in beta, Kubernetes finally offers a more graceful way to manage dynamic workloads, and Cast AI is built to take full advantage of it.
How does Cast AI leverage in-place pod resizing for workload optimization?
In-place pod resizing is a major step forward for dynamic workloads, especially those that temporarily spike in usage or taper off after peak periods.
However, while Kubernetes now supports this functionality at the platform level, it doesn’t decide when to apply changes or how much to adjust resources. Teams still need to monitor metrics in real time to determine whether an adjustment is needed and then apply resource updates through API calls or manifests.
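As a sketch of that manual flow: in Kubernetes v1.33, resource changes go through the pod’s `resize` subresource (supported by kubectl v1.32+). The pod name `resize-demo` and container name `app` below are illustrative.

```shell
# Apply new CPU requests/limits to a running pod without recreating it,
# using the resize subresource (Kubernetes v1.33+, kubectl v1.32+).
kubectl patch pod resize-demo --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'

# Verify the pod was not restarted: the container's restart count is unchanged.
kubectl get pod resize-demo \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'
```

Doing this well still requires knowing what values to set and when, which is exactly the decision-making step that remains manual.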
Cast AI is purpose-built to eliminate the need for manual monitoring and decision-making by automatically detecting resource inefficiencies and applying the right adjustments at the right time. Cast AI’s workload optimization engine now fully supports in-place pod resizing and takes over the decision-making and execution process.
Here’s how this works with Cast AI workload optimization:
- Cast AI monitors resource usage continuously across all workloads in your cluster
- When it detects over-provisioned or under-provisioned pods, it calculates the optimal CPU and memory requests
- If the pod supports in-place resizing, Cast AI applies the changes automatically without restarting the pod
- If a restart is required (for example, when a container’s resize policy mandates a restart, or when it doesn’t support in-place updates), Cast AI handles it safely and intelligently to minimize impact
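When Kubernetes cannot apply a resize immediately, it surfaces the state through pod conditions rather than failing silently; in v1.33 these are PodResizePending (with reasons such as Infeasible or Deferred) and PodResizeInProgress. A quick way to inspect them, with an illustrative pod name:

```shell
# Inspect resize progress on a hypothetical pod named "resize-demo".
kubectl get pod resize-demo -o jsonpath='{.status.conditions}'
# A resize the node can never satisfy shows PodResizePending with
# reason Infeasible; one waiting on free capacity shows reason Deferred;
# PodResizeInProgress appears while the kubelet applies the change.
```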
The result:
- No manual resizing or guesswork
- No YAML edits, kubectl commands, or alerts to chase
- Continuous optimization, even as workloads evolve
This gives platform and DevOps teams the performance and efficiency benefits of right-sized workloads without the operational overhead. Kubernetes finally becomes more graceful and adaptive without requiring hands-on effort.
Conclusion
In-place pod resizing marks a major step forward for Kubernetes resource management, eliminating the need to restart pods to adjust CPU or memory. But the real value comes when this capability is automated. With Cast AI, teams don’t need to monitor workloads or decide when and how to apply resource changes; the platform does it for them in real time. The result is faster, more efficient applications with less manual effort and zero downtime.