Organizations running resource-intensive, stateful applications require high availability and can’t afford downtime. However, due to the lack of widely adopted commercial solutions for migrating these workloads – whether moving stateful apps to more cost-efficient infrastructure, transitioning legacy systems to Kubernetes, or adapting non-Spot-ready workloads for Spot Instances – they’re often forced to operate on underutilized, expensive nodes.
This is where Container Live Migration comes in. The solution ensures that stateful applications stay resilient and responsive by allowing seamless transitions from one node to another without disrupting the application’s operations.
These formerly intractable workloads can now be consolidated automatically onto fewer nodes while maintaining continuous availability, reducing resource fragmentation, and delivering additional cost savings.
Benefits of Container Live Migration
No downtime for important workloads
Instead of the workload failing or needing to restart, the solution seamlessly transfers it to the next available node.
This ensures that stateful and important workloads continue to run without interruption, lowering the risk of failure and ensuring continuous service delivery even when the underlying infrastructure changes.
Intelligent node optimization
Cast AI’s Evictor maximizes cluster utilization by moving workloads off underutilized nodes, resulting in significant cost reductions without service downtime. Stateful workloads have traditionally limited what bin-packing could achieve because they were effectively immovable; live migration removes that limitation.
Maintaining network connections
Active TCP connections and session state are preserved during migration, which minimizes disruption to client applications and in-flight transactions. Applications with strict timeout requirements may need their timeout settings adjusted to tolerate the brief migration window.
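As a rough illustration of the timeout point, the sketch below flags configured timeouts that are shorter than an expected migration pause plus a safety margin. The timeout names, the pause length, and the safety factor are all hypothetical values chosen for the example; the actual pause depends on your workload and should be measured.

```python
from typing import Dict, List


def timeouts_at_risk(
    timeouts_s: Dict[str, float],
    expected_pause_s: float,
    safety_factor: float = 2.0,
) -> List[str]:
    """Return the names of timeouts shorter than the padded migration pause.

    expected_pause_s and safety_factor are assumptions supplied by the
    caller; nothing here is read from a real cluster.
    """
    budget = expected_pause_s * safety_factor
    return sorted(name for name, value in timeouts_s.items() if value < budget)


# Hypothetical client-side settings, in seconds.
example_timeouts = {
    "http_read": 30.0,
    "db_heartbeat": 0.5,
    "watchdog": 5.0,
}

if __name__ == "__main__":
    print(timeouts_at_risk(example_timeouts, expected_pause_s=1.0))
```

Any timeout the helper reports is a candidate for a longer value or for retry logic on the client side.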
Increasing resource use for cost reductions
Container Live Migration makes it possible to move stateful workloads between nodes continuously, so the Evictor and Rebalancer can operate across a much wider range of workloads.
With Container Live Migration, Cast AI users can increase bin-packing efficiency while reducing node fragmentation and cloud infrastructure costs, because both the Evictor and the Rebalancer can now act on workloads they previously had to leave in place.
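To make the bin-packing effect concrete, here is a minimal sketch that compares how many nodes a toy cluster needs when stateful pods are pinned versus when they can be moved. It uses a plain first-fit-decreasing heuristic as a stand-in; it is not the Evictor’s actual algorithm, and the pod sizes and node capacity are made-up numbers.

```python
from typing import List, Optional


def pack(
    requests: List[int],
    capacity: int,
    seeded: Optional[List[List[int]]] = None,
) -> List[List[int]]:
    """First-fit-decreasing packing of CPU requests onto nodes.

    `seeded` holds pods that cannot move: each inner list is a node that
    must stay, already occupied by those pods.
    """
    nodes = [list(node) for node in (seeded or [])]
    for request in sorted(requests, reverse=True):
        for node in nodes:
            if sum(node) + request <= capacity:
                node.append(request)
                break
        else:
            # No existing node has room, so a new node is needed.
            nodes.append([request])
    return nodes


NODE_CPUS = 8            # hypothetical node size
STATEFUL = [3, 3, 3, 3]  # one stateful pod per node today
STATELESS = [1, 1, 1, 1]

# Without live migration: each stateful pod pins its node in place.
pinned = pack(STATELESS, NODE_CPUS, seeded=[[cpu] for cpu in STATEFUL])

# With live migration: every pod is a candidate for repacking.
movable = pack(STATEFUL + STATELESS, NODE_CPUS)

if __name__ == "__main__":
    print(len(pinned), "nodes with pinned stateful pods")
    print(len(movable), "nodes when stateful pods can move")
```

In this toy cluster the pinned layout keeps all four nodes alive, while the movable layout fits the same pods on two.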
Extra cost savings thanks to Spot Instances
Container Live Migration integrates with existing features that cover the whole Spot Instance lifecycle, from provisioning and rightsizing to decommissioning, including migrating workloads to on-demand instances when no Spot capacity is available.
This allows teams to run stateful workloads on cost-effective Spot Instances with confidence, knowing that interruptions will be handled with little service impact.
How does Container Live Migration work?

Container Live Migration uses checkpoint and restore technology to transfer running pods between nodes:
- Workload Assessment – Cast AI’s live controller analyzes your cluster automatically and detects workloads that are suitable for live migration, assigning appropriate labels based on workload characteristics.
- Evictor – When Cast AI’s Evictor detects bin-packing opportunities, it chooses whether to live-migrate or evict the workload.
- State Transfer – The system transmits memory pages and process state to the target node while the application is still executing, reducing freeze time.
- Network Preservation – A forked version of the AWS VPC CNI ensures that pods’ IP addresses and TCP connections remain intact during the transfer.
- Seamless Handover – The application is briefly paused while the final state is transmitted, after which the workload resumes on the new node with full continuity. The migration duration depends on the application’s memory use, the instance type, and network throughput.
The entire process combines CRIU (Checkpoint/Restore In Userspace) technology with Cast AI’s orchestration layer to enable dependable, quick migrations with minimal application impact.
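The State Transfer and Seamless Handover steps resemble the general pre-copy approach to live migration: memory is copied while the application keeps running, pages dirtied in the meantime are resent, and the application is paused only for the last small remainder. The toy model below illustrates why this shortens the pause. It is a sketch under assumed numbers, not Cast AI’s or CRIU’s implementation, and every parameter (memory size, dirty rate, bandwidth, threshold) is hypothetical.

```python
from typing import Tuple


def simulate_precopy(
    memory_mb: float,
    dirty_rate_mb_s: float,
    bandwidth_mb_s: float,
    freeze_threshold_mb: float,
    max_rounds: int = 10,
) -> Tuple[int, float, float]:
    """Return (live_rounds, live_copy_seconds, freeze_seconds).

    Each live round sends the outstanding data while the app runs.
    Whatever the app dirties during that round must be sent again.
    """
    to_send = memory_mb
    live_seconds = 0.0
    rounds = 0
    while to_send > freeze_threshold_mb and rounds < max_rounds:
        round_seconds = to_send / bandwidth_mb_s
        live_seconds += round_seconds
        # Pages written during this round become the next round's work,
        # capped at the total memory size.
        to_send = min(memory_mb, dirty_rate_mb_s * round_seconds)
        rounds += 1
    # The app is paused only while the final remainder is transferred.
    freeze_seconds = to_send / bandwidth_mb_s
    return rounds, live_seconds, freeze_seconds


if __name__ == "__main__":
    rounds, live_s, freeze_s = simulate_precopy(
        memory_mb=4096, dirty_rate_mb_s=50, bandwidth_mb_s=1000,
        freeze_threshold_mb=16,
    )
    stop_and_copy_s = 4096 / 1000  # pause if everything were copied while frozen
    print(f"{rounds} live rounds, {live_s:.2f}s live copy, {freeze_s:.4f}s pause")
    print(f"stop-and-copy pause would be {stop_and_copy_s:.2f}s")
```

With these assumed numbers the pause shrinks from roughly four seconds to around ten milliseconds. The model also shows the limit of the approach: if the dirty rate approaches the available bandwidth, the rounds stop converging and `max_rounds` ends the loop with a longer pause.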
Live Migration in action: Migrating a running Minecraft server between nodes during an active game without any interruption
This demo showcases the capabilities of Container Live Migration, emphasizing seamless application continuity during pod relocation across nodes.
Step 1: Activating Auto-Migration
Auto-migration is enabled via the “auto-move” setting. Once active, the system initiates pod migration every 20 seconds to a different node within the cluster.

Step 2: Observing the First Migration
The cluster dashboard displays a yellow indicator signaling the start of the migration. A new pod named clone-1 appears on node 100, and the original pod begins terminating. Note that the game remains fully operational during this transition.

Gameplay continues seamlessly, with no visible impact to the user experience.

Wrap up
Stateful workloads used to be a hurdle for Kubernetes teams. Cast AI Container Live Migration automatically packs stateful workloads that were previously immovable onto fewer nodes, ensuring continuous availability, decreasing resource fragmentation, and delivering extra cost savings.
Schedule a demo to learn how this functionality can help you handle stateful Kubernetes workloads in your cluster.



