Spot Instances can cut your cloud compute costs by up to 90%. But for most teams, the fear of interruptions keeps them confined to dev environments and batch jobs. Running production apps on Spot just feels too risky.
You don’t have to choose between cost savings and reliability, though. Instead, you can tailor your infrastructure strategy to meet the specific needs of each workload. After all, your payment API and your analytics pipeline don’t require the same guarantees, so why treat them the same way?
In this guide, we’ll walk through a practical approach to running Spot Instances in production using Cast AI’s Pod Mutations. You’ll learn how to tier your applications by criticality, automatically route workloads to the right instance types, and put guardrails in place to protect against interruptions. The result: meaningful cost savings without sacrificing the availability your users depend on.
Full Spot adoption in production with Cast AI
What you need to get started
- Kubernetes cluster connected to Cast AI
Step 1: Create a single Node Template with both On-Demand and Spot offerings
In Cast AI, create one Node Template and enable both On-Demand and Spot instance offerings. This lets the autoscaler provision either capacity type from the same template, which is what makes the per-tier scheduling in the next steps possible.
Step 2: Segregate apps by tier of criticality
To segregate apps, label each workload with a criticality tier (tier 0 through 3):

| Tier label | Instance mix | Example use cases |
| --- | --- | --- |
| `tier="0"` | 100% On-Demand | Payments, auth, checkout APIs |
| `tier="1"` | 70% On-Demand / 30% Spot | Business APIs, order processing |
| `tier="2"` | 50% On-Demand / 50% Spot | Dashboards, internal services |
| `tier="3"` | 100% Spot | Batch jobs, analytics, dev/staging |
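To make the labeling concrete, here is a minimal Deployment for a Tier 3 batch-style workload. The app name and image are placeholders; the important part is that the `tier` label sits on the pod template, so pod-level tooling can match it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-worker        # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: analytics-worker
  template:
    metadata:
      labels:
        app: analytics-worker
        tier: "3"               # 100% Spot per the table above
    spec:
      containers:
        - name: worker
          image: registry.example.com/analytics-worker:1.0  # placeholder image
```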
Step 3: Create Pod Mutations matching the set percentages
Based on criticality, create four Pod Mutations matching the percentages listed above. Each Pod Mutation uses a label selector (`matchLabels: tier: "X"`) to filter pods and enforces node affinity for Spot vs. On-Demand scheduling.
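Pod Mutations are configured in the Cast AI console, so the exact mutation definition isn't reproduced here. Conceptually, though, a Spot-targeting mutation is equivalent to injecting node affinity against Cast AI's spot node label into matching pods. A hedged sketch of what a mutated Tier 3 pod spec would contain:

```yaml
# Node affinity a Tier 3 (100% Spot) mutation would effectively inject
# into pods matching tier: "3". The scheduling.cast.ai/spot label is
# applied by Cast AI to Spot-backed nodes.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: scheduling.cast.ai/spot
              operator: In
              values:
                - "true"
```

For the mixed tiers (1 and 2), the mutation applies Spot placement to the configured percentage of a workload's replicas, with the remainder scheduling onto On-Demand nodes.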
Step 4: Set up guardrails against interruptions
Make sure to adopt best practices to avoid potential disruptions resulting from your Spot Instances getting interrupted:
- Pod Disruption Budgets – PDBs protect critical workloads during interruptions by limiting how many replicas can be evicted at once
- Multiple replicas – run a minimum of 3 replicas for Tiers 0–1 and a minimum of 2 for Tier 2
- Topology Spread Constraints – spread pods across Availability Zones to boost your app's resilience
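For a Tier 0 service, the guardrails above might look like the following (service name is a placeholder):

```yaml
# PodDisruptionBudget: keep at least 2 of 3 replicas running during
# voluntary disruptions such as node drains and rebalancing.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb        # hypothetical service
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payments-api
---
# And in the Deployment's pod template spec, spread replicas across
# Availability Zones with at most 1 replica of imbalance:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: payments-api
```

`ScheduleAnyway` treats the spread as a soft preference, which avoids leaving pods unschedulable if a zone temporarily lacks capacity; use `DoNotSchedule` if strict zone spreading matters more than scheduling speed.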
Without these guardrails in place, even Tier 0 apps may face disruptions during node maintenance or cluster operations.
Result: A single template with four scheduling behaviors
Using a single template, you can set up four different scheduling behaviors based on app criticality. While critical apps get stable resources (On-Demand), non-critical apps generate cost savings through Spot Instances.
Wrap up
Running Spot Instances in production isn’t an all-or-nothing decision. By tiering your applications based on criticality and using Pod Mutations to enforce the right scheduling behavior, you can capture significant cost savings on workloads that can tolerate interruptions while keeping your most critical services on stable, On-Demand infrastructure.
The approach outlined here provides a single, manageable node template that automatically handles four distinct scheduling behaviors. Combined with proper guardrails – Pod Disruption Budgets, multi-replica deployments, and topology spread constraints – you get the best of both worlds: lower costs and production-grade reliability.
Start by labeling your workloads and setting up your Pod Mutations – and let Cast AI handle the rest.