Spot Instances can cut your cloud compute costs by up to 90%. But for most teams, the fear of interruptions keeps them confined to dev environments and batch jobs. Running production apps on Spot just feels too risky.
You don’t have to choose between cost savings and reliability, though. Instead, you can tailor your infrastructure strategy to meet the specific needs of each workload. After all, your payment API and your analytics pipeline don’t require the same guarantees, so why treat them the same way?
In this guide, we’ll walk through a practical approach to running Spot Instances in production using Cast AI’s Pod Mutations. You’ll learn how to tier your applications by criticality, automatically route workloads to the right instance types, and put guardrails in place to protect against interruptions. The result: meaningful cost savings without sacrificing the availability your users depend on.
Full Spot adoption in production with Cast AI
What you need to get started
- Kubernetes cluster connected to Cast AI
Step 1: Create a single Node Template with both On-Demand and Spot offerings
In Cast AI, create one Node Template and enable both On-Demand and Spot instance offerings. This lets the autoscaler provision either capacity type from the same template, which is what makes the per-tier scheduling in the next steps possible.
Step 2: Segregate apps by tier of criticality
To segregate apps, label each workload with a criticality tier (tier 0 through 3):

| Tier label | Instance mix | Example use cases |
| --- | --- | --- |
| `tier="0"` | 100% On-Demand | Payments, auth, checkout APIs |
| `tier="1"` | 70% On-Demand / 30% Spot | Business APIs, order processing |
| `tier="2"` | 50% On-Demand / 50% Spot | Dashboards, internal services |
| `tier="3"` | 100% Spot | Batch jobs, analytics, dev/staging |
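To make the labeling concrete, here is a minimal Deployment for a Tier 3 batch-style workload. The app name and image are placeholders; the important part is that the `tier` label sits on the pod template, so pod-level tooling can match it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-worker        # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: analytics-worker
  template:
    metadata:
      labels:
        app: analytics-worker
        tier: "3"               # 100% Spot per the table above
    spec:
      containers:
        - name: worker
          image: registry.example.com/analytics-worker:1.0  # placeholder image
```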
Step 3: Create Pod Mutations matching the set percentages
Based on criticality, create four Pod Mutations matching the percentages listed above. Each Pod Mutation uses a label selector (`matchLabels: tier: "X"`) to filter pods and enforces node affinity for Spot vs. On-Demand scheduling.
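Pod Mutations are configured in the Cast AI console, so the exact mutation definition isn't reproduced here. Conceptually, though, a Spot-targeting mutation is equivalent to injecting node affinity against Cast AI's spot node label into matching pods. A hedged sketch of what a mutated Tier 3 pod spec would contain:

```yaml
# Node affinity a Tier 3 (100% Spot) mutation would effectively inject
# into pods matching tier: "3". The scheduling.cast.ai/spot label is
# applied by Cast AI to Spot-backed nodes.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: scheduling.cast.ai/spot
              operator: In
              values:
                - "true"
```

For the mixed tiers (1 and 2), the mutation applies Spot placement to the configured percentage of a workload's replicas, with the remainder scheduling onto On-Demand nodes.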
Step 4: Set up guardrails against interruptions
Make sure to adopt best practices to avoid potential disruptions resulting from your Spot Instances getting interrupted:
- Pod Disruption Budgets – PDBs protect critical workloads during interruptions by limiting how many replicas can be evicted at once
- Multiple replicas – run a minimum of 3 replicas for Tiers 0–1 and a minimum of 2 for Tier 2
- Topology Spread Constraints – spread pods across Availability Zones to boost your app's resilience
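For a Tier 0 service, the guardrails above might look like the following (service name is a placeholder):

```yaml
# PodDisruptionBudget: keep at least 2 of 3 replicas running during
# voluntary disruptions such as node drains and rebalancing.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb        # hypothetical service
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: payments-api
---
# And in the Deployment's pod template spec, spread replicas across
# Availability Zones with at most 1 replica of imbalance:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: payments-api
```

`ScheduleAnyway` treats the spread as a soft preference, which avoids leaving pods unschedulable if a zone temporarily lacks capacity; use `DoNotSchedule` if strict zone spreading matters more than scheduling speed.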
Without these guardrails in place, even Tier 0 apps may face disruptions during node maintenance or cluster operations.
Result: A single template with four scheduling behaviors
Using a single template, you can set up four different scheduling behaviors based on app criticality. While critical apps get stable resources (On-Demand), non-critical apps generate cost savings through Spot Instances.
Wrap up
Running Spot Instances in production isn’t an all-or-nothing decision. By tiering your applications based on criticality and using Pod Mutations to enforce the right scheduling behavior, you can capture significant cost savings on workloads that can tolerate interruptions while keeping your most critical services on stable, On-Demand infrastructure.
The approach outlined here provides a single, manageable node template that automatically handles four distinct scheduling behaviors. Combined with proper guardrails – Pod Disruption Budgets, multi-replica deployments, and topology spread constraints – you get the best of both worlds: lower costs and production-grade reliability.
Start by labeling your workloads and setting up your Pod Mutations – and let Cast AI handle the rest.