Multi-Replica on Spot vs. Single On-Demand: A Cost Comparison

In this case study, we will demonstrate how running a multi-replica application on Spot Instances can be more cost-effective than running a single-replica application on On-Demand instances, without significantly impacting availability.

Marcus Arenas
This article first appeared on Medium.

Balancing reliability and cost is crucial when deploying applications on AWS. AWS offers several pricing schemes for compute, including On-Demand and Spot Instances. Each has its benefits and trade-offs. 

The cost comparison: On-Demand vs. Spot Instances

Before we get started, let’s recap the differences between these two pricing schemes:

  • On-Demand – You pay for compute capacity by the hour or by the second, with no long-term commitment. Prices are fixed and higher than Spot prices.
  • Spot Instances – These instances offer discounts of up to 90% compared to On-Demand, with the caveat that AWS can reclaim them whenever it needs the capacity back, giving only a 2-minute warning.
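
The 2-minute warning is delivered through the EC2 instance metadata service. As a minimal sketch, assuming the code runs on the Spot Instance itself and that IMDSv2 is reachable, the function below checks whether an interruption notice is pending. The function name is hypothetical; the endpoint answers with HTTP 404 until AWS schedules an interruption.

```shell
#!/bin/bash
IMDS_URL="http://169.254.169.254/latest"

# Prints the interruption notice (JSON) and returns 0 if AWS has scheduled one;
# returns non-zero otherwise (the endpoint answers 404 when nothing is pending).
spot_interruption_notice() {
  local token
  # IMDSv2: request a short-lived session token first.
  token=$(curl -sf -X PUT "$IMDS_URL/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS_URL/meta-data/spot/instance-action"
}
```

A node-level agent could call spot_interruption_notice every few seconds and begin draining the node as soon as the call succeeds.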

Case study

For example, let’s compare the pricing of the m5.large instance type across these two schemes:

  • On-Demand (US East): $0.096 per hour
  • Spot Price (average, US East): $0.027 per hour

We will assume an application runs 24/7, and we are comparing costs over a 30-day period.

Scenario 1: single replica on On-Demand

Let’s assume we run a single-replica application on one m5.large On-Demand Instance. 

The total cost would be as follows:

  • Cost per hour: $0.096
  • Cost per day: $0.096 × 24 = $2.304
  • Cost per month (30 days): $2.304 × 30 = $69.12

So, running a single replica for 30 days on an On-Demand Instance costs $69.12.

Scenario 2: multi-replica (two replicas) on Spot Instances

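Now, let’s scale our application to two replicas and run them on Spot Instances. We assume a 50% Spot interruption rate, meaning that AWS reclaims a given Spot Instance half of the time and its pods must be rescheduled. The calculation below also assumes that every reclaimed instance is replaced by another Spot Instance at the same price, so the interruption rate affects availability rather than the bill.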

The cost breakdown:

  • Spot Instance price: $0.027 per hour (for m5.large)
  • Cost per hour for two replicas: $0.027 × 2 = $0.054
  • Cost per day for two replicas: $0.054 × 24 = $1.296
  • Cost per month for two replicas (30 days): $1.296 × 30 = $38.88

So, running two replicas for 30 days on Spot Instances costs $38.88, which is about 44% cheaper ($30.24 less) than running a single replica on an On-Demand Instance.
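
The arithmetic for both scenarios can be reproduced with a short script. This is only a sketch: the prices are the article's example figures, and the function names are made up for illustration.

```shell
#!/bin/bash
# Hourly prices in USD, taken from the m5.large example (US East).
ON_DEMAND_HOURLY=0.096
SPOT_HOURLY=0.027
HOURS_PER_DAY=24
DAYS=30

# monthly_cost <hourly_price> <replicas>
# Cost of running that many instances 24/7 for $DAYS days.
monthly_cost() {
  awk -v p="$1" -v n="$2" -v h="$HOURS_PER_DAY" -v d="$DAYS" \
    'BEGIN { printf "%.2f", p * n * h * d }'
}

# savings_percent <baseline_cost> <alternative_cost>
# How much cheaper the alternative is than the baseline, in percent.
savings_percent() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f", (b - a) / b * 100 }'
}

on_demand=$(monthly_cost "$ON_DEMAND_HOURLY" 1)
spot=$(monthly_cost "$SPOT_HOURLY" 2)

echo "Single replica on On-Demand: \$${on_demand}"
echo "Two replicas on Spot:        \$${spot}"
echo "Savings:                     $(savings_percent "$on_demand" "$spot")%"
```

With the prices above, this prints $69.12, $38.88, and 43.75%. Calling monthly_cost "$SPOT_HOURLY" 3 gives 58.32, so at these example prices even three Spot replicas would cost less than one On-Demand instance.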

Benefits of running multi-replica applications on Spot Instances

Increased availability

Running multiple replicas ensures better fault tolerance. Even with Spot Instance interruptions, having a second replica reduces downtime.
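
As a rough illustration of why a second replica helps, the sketch below assumes (hypothetically) that each replica is unavailable for some fraction of the time, independently of the others, so the application is down only when all replicas are down at once. The 1% figure is an invented input, not a measurement, and Spot interruptions within the same capacity pool can be correlated, which would make the result optimistic.

```shell
#!/bin/bash
# availability_percent <per_replica_unavailability> <replicas>
# Assumption: replicas fail independently, so all n are down with probability p^n.
availability_percent() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f", (1 - p ^ n) * 100 }'
}

# Hypothetical input: each replica is unavailable 1% of the time.
echo "1 replica:  $(availability_percent 0.01 1)%"
echo "2 replicas: $(availability_percent 0.01 2)%"
```

Under these assumptions, the script reports 99.0000% for one replica and 99.9900% for two.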

Lower costs

Even with two replicas, the total cost is lower due to the large discounts offered by Spot Instances.

Reduced risk

While Spot Instances can be interrupted, tools like Kubernetes can automatically reschedule pods onto available nodes, minimizing downtime.

Script for expediting your Spot usage

If you want to speed up your move to Spot, this script can help you temporarily scale all your single-replica Deployments and StatefulSets to multiple replicas. Because kubectl scale changes only the live objects, you will still need to update the replica count in your YAML files to make the change permanent.

Note: This approach is recommended only for lower (non-production) environments.

#!/bin/bash

# Target replica count
TARGET_REPLICAS=2

# Scale every workload of the given kind (for example, deployment)
# that currently has exactly one replica, in all namespaces.
scale_single_replica() {
  local kind="$1"
  echo "Scaling single-replica ${kind}s..."
  for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
    for name in $(kubectl get "${kind}s" -n "$ns" -o jsonpath='{.items[?(@.spec.replicas==1)].metadata.name}'); do
      echo "Scaling $kind $name in namespace $ns..."
      kubectl scale "$kind/$name" --replicas="$TARGET_REPLICAS" -n "$ns"
    done
  done
}

scale_single_replica deployment
scale_single_replica statefulset

# Scaling ReplicaSets directly is not typically recommended, as they are managed by Deployments.
# Uncomment the line below if you want to include ReplicaSets.
# scale_single_replica replicaset

echo "Scaling complete."

To run the script above, save it as script.sh, make it executable with chmod +x script.sh, and then run ./script.sh.
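
Since the scaling is meant to be temporary, it can help to record which workloads had a single replica before running the script, so they can be scaled back later. The helpers below are a sketch of one way to do that; the function names and the snapshot file name are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical file that records which workloads had a single replica.
SNAPSHOT_FILE="${SNAPSHOT_FILE:-single-replica-workloads.txt}"

# snapshot_single_replica <kind>
# Prints "<kind> <namespace> <name>" for each single-replica workload. Read-only.
snapshot_single_replica() {
  local kind="$1"
  kubectl get "$kind" --all-namespaces \
    -o jsonpath='{range .items[?(@.spec.replicas==1)]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
    sed "s/^/$kind /"
}

# revert_from_snapshot
# Scales every workload recorded in $SNAPSHOT_FILE back to one replica.
revert_from_snapshot() {
  local kind ns name
  while read -r kind ns name; do
    [ -n "$name" ] || continue
    kubectl scale "$kind/$name" --replicas=1 -n "$ns" < /dev/null
  done < "$SNAPSHOT_FILE"
}
```

Before running script.sh, you could capture the current state with { snapshot_single_replica deployment; snapshot_single_replica statefulset; } > "$SNAPSHOT_FILE". Calling revert_from_snapshot afterwards restores the recorded workloads to one replica.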

By strategically running multi-replica workloads on Spot Instances, you can achieve significant cost savings while maintaining high availability. The example above shows that even with potential Spot interruptions, a multi-replica setup on Spot Instances can be cheaper and more resilient than a single-replica setup on On-Demand instances.

Another approach to consider for ensuring high availability and minimizing disruptions is using Pod Topology Spread Constraints and Pod Disruption Budgets (PDBs). The example above refers to AWS costs, but the same applies to other cloud providers.
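
For the Pod Disruption Budget and topology spread approach mentioned above, a minimal sketch with kubectl might look like the following. The function names are hypothetical, and the choices of --min-available=1, a maxSkew of 1, and the kubernetes.io/hostname topology key are assumptions for a two-replica workload, not recommendations from the original article.

```shell
#!/bin/bash
# create_pdb <name> <namespace> <label-selector>
# Keeps at least one matching pod running during voluntary disruptions
# such as node drains.
create_pdb() {
  kubectl create poddisruptionbudget "$1" -n "$2" \
    --selector="$3" --min-available=1
}

# spread_patch <label-key> <label-value>
# Prints a patch that asks the scheduler to spread matching pods across nodes.
spread_patch() {
  cat <<EOF
{"spec":{"template":{"spec":{"topologySpreadConstraints":[
  {"maxSkew":1,
   "topologyKey":"kubernetes.io/hostname",
   "whenUnsatisfiable":"ScheduleAnyway",
   "labelSelector":{"matchLabels":{"$1":"$2"}}}
]}}}}
EOF
}

# add_spread_constraint <deployment> <namespace> <label-key> <label-value>
add_spread_constraint() {
  kubectl patch deployment "$1" -n "$2" --type=merge \
    -p "$(spread_patch "$3" "$4")"
}
```

For a hypothetical Deployment named web with the label app=web in the default namespace, this would be create_pdb web-pdb default app=web followed by add_spread_constraint web default app web. Note that patching the pod template triggers a new rollout of the Deployment.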

If you’re looking to optimize your cloud costs while maintaining reliability, consider evaluating Spot Instances with multi-replica applications. 

Take Spot Instances to the next level with automation

You can avoid downtime caused by lost instances by using an automation solution that manages your cloud infrastructure and handles autoscaling for you.

Using an automation tool allows you to use Spot Instances more reliably. For example, Cast AI lets you specify how much of your workload will run on a Spot Instance. If your Spot Instance gets interrupted and no other Spot Instances are available, we will automatically move your workloads back to On-Demand instances.
