
How to Reduce Your Amazon EKS Costs by Half in 15 Minutes

Overprovisioning is the top reason why teams see their cloud bills constantly growing. But choosing the best instances from the hundreds of options AWS offers is a tough call. Luckily, automation is here to help and slash your EKS costs in 15 minutes. Read this case study to learn more.


How are you supposed to know which ones will deliver the performance you need? Luckily, there are automation solutions that can do that for you. The mobile marketing company Branch.io saved several million dollars per year by leveraging spot instance automation and rightsizing.

If you’re curious about how they work, follow my journey and see how I reduced the costs of running Kubernetes containers on Amazon EKS by 66% using CAST AI, in 15 minutes.

TL;DR

I started by provisioning an e-commerce app (here) on an Amazon EKS cluster with six m5.large nodes (2 vCPU, 8 GiB each). I then deployed CAST AI to analyze my application and suggest optimizations. Finally, I activated automated optimization and watched the system continuously self-optimize.

The initial cluster cost was $414 per month. Within 15 minutes, in a fully automated way, the cluster cost went to $207 (a 50% reduction) by reducing six nodes to three nodes. Then, 5 minutes later, the cluster cost went down to $138 per month using spot instances (a 66% reduction).
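Those percentages follow directly from the dollar figures above; here's the arithmetic as a quick sketch (this is just illustration, not CAST AI code):

```python
# Savings math using the figures quoted in this post.
initial = 414.72        # six m5.large nodes, monthly on-demand cost
after_evictor = 207.36  # three nodes after automated bin-packing
after_spot = 138.0      # after moving onto spot instances

def savings(before: float, after: float) -> float:
    """Return the percentage saved going from `before` to `after`."""
    return round((before - after) / before * 100, 1)

print(savings(initial, after_evictor))  # 50.0
print(savings(initial, after_spot))     # 66.7
```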

EKS savings with CAST AI

Step 1: Deploying my app and finding potential savings

I deployed my app in 6 nodes on EKS. Here’s what it looked like before – all the nodes were empty:

Deployed app in 6 nodes

The cluster was created via eksctl:

eksctl create cluster --name boutique-blog-lg -N 6 --instance-types m5.large --managed --region us-east-2
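For context on the cluster cost mentioned later: six m5.large nodes at roughly $0.096/hour (an assumed us-east-2 on-demand rate based on public AWS pricing) over a 720-hour month work out as follows:

```python
# Rough monthly cost of the cluster created above (illustrative rate).
ON_DEMAND_RATE = 0.096  # USD/hour for m5.large in us-east-2 (assumed)
NODES = 6
HOURS_PER_MONTH = 720   # 30-day month

monthly_cost = NODES * ON_DEMAND_RATE * HOURS_PER_MONTH
print(f"${monthly_cost:.2f}")  # $414.72
```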

And here’s what it looked like after deployment (I’m using kube-ops-view, a useful open-source project to visualize the pods). The green rectangles are the pods:

pods after deployment

With Kubernetes, the application’s pods (containers) are spread evenly across all the nodes by default. Kubernetes is a fair orchestration engine; that’s just how it works. CPU utilization ranges between 40% and 50%.

Note: All the EKS autoscaling mechanisms have been disabled on purpose since CAST AI will substitute them.

Now it’s time to connect my EKS cluster to CAST AI. I created a free account on CAST AI and selected the Connect your cluster option.

Connect cluster to CAST AI

The CAST AI agent went over my EKS cluster (learn more about how it works) and generated a Cluster Savings Report:

I can see different levels of savings I can achieve depending on the level of spot instance usage. A spot-only cluster generally brings maximum savings, spot-friendly clusters are balanced, and no spot usage brings the least savings. 

Now, I can see here that if I switched my 6 m5.large instances to what CAST AI recommends – 3 c5a.large – I could reduce my bill by almost 60%. Sounds like a plan!

With spot instances, I could get even higher savings (66.5%).

Step 2: Activating the cost optimization

To get started with cost optimization, I need to run the onboarding script. This script onboards the cluster into managed state, so it can be optimized automagically. To do this, CAST AI needs additional credentials as outlined here.

Step 3: Enabling policies

First, I have to make a decision: allow CAST AI to manage the whole cluster or just some workloads. I go for standard Autoscaler since I don’t have any workloads that should be ignored by CAST AI.

Next, I turn on all the relevant policies.

CAST AI automation policies

I can also configure the Autoscaler settings.

Here’s a short overview of what you can find on this page:

Unscheduled pods policy

This policy automatically adjusts the size of a Kubernetes cluster, so that all the pods have a place to run. This is also where I turn the spot instance policy on and use spot fallback to make sure my workloads have a place to run when spot instances get interrupted.

Node deletion policy 

This policy automatically removes nodes from my cluster when they no longer have workloads scheduled to them. This allows for my cluster to maintain a minimal footprint and greatly reduces its cost. As you can see, I can enable Evictor, which continuously compacts pods into fewer nodes – creating a lot of cost savings!

CPU limit policy 

This policy keeps the cluster’s CPU resources within the defined limits. The cluster can’t scale beyond the minimum and maximum thresholds.

I enabled Evictor and set it to work. 
CAST AI Evictor feature

This is what Evictor in action looks like:

  1. One node (in red below) is identified as a candidate for eviction.
  2. Evictor automatically moves (“bin-packs”) the pods onto other nodes.
  3. Once the node becomes empty, it’s deleted from the cluster.
  4. Go back to step 1.
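Conceptually, the bin-packing in step 2 resembles the classic first-fit-decreasing heuristic: place the largest pods first, each onto the first node with room. This toy sketch (the pod CPU requests are invented for illustration, and this is not Evictor’s actual algorithm) shows how 12 pods that idled across six half-empty m5.large nodes fit on three:

```python
# Toy first-fit-decreasing bin-packing: pack pod CPU requests (in vCPUs)
# onto as few 2-vCPU nodes (m5.large) as possible.
NODE_CAPACITY = 2.0  # vCPUs per m5.large

def pack(requests: list[float]) -> list[list[float]]:
    nodes: list[list[float]] = []
    for req in sorted(requests, reverse=True):  # largest pods first
        for node in nodes:
            if sum(node) + req <= NODE_CAPACITY:
                node.append(req)  # fits on an existing node
                break
        else:
            nodes.append([req])  # no node has room: open a new one
    return nodes

# 12 pods that previously idled across 6 half-empty nodes
pods = [0.5, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2]
print(len(pack(pods)))  # 3 nodes suffice
```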
Nodes status

One node is deleted:

Reducing number of nodes

Here are the Evictor logs:

time="2021-06-14T16:08:27Z" level=debug msg="will try to evict node \"ip-192-168-66-41.us-east-2.compute.internal\""
time="2021-06-14T16:08:27Z" level=debug msg="annotating (marking) node \"ip-192-168-66-41.us-east-2.compute.internal\" with \"evictor.cast.ai/evicting\"" node_name=ip-192-168-66-41.us-east-2.compute.internal
time="2021-06-14T16:08:27Z" level=debug msg="tainting node \"ip-192-168-66-41.us-east-2.compute.internal\" for eviction" node_name=ip-192-168-66-41.us-east-2.compute.internal
time="2021-06-14T16:08:27Z" level=debug msg="started evicting pods from a node" node_name=ip-192-168-66-41.us-east-2.compute.internal
time="2021-06-14T16:08:27Z" level=info msg="evicting 9 pods from node \"ip-192-168-66-41.us-east-2.compute.internal\"" node_name=ip-192-168-66-41.us-east-2.compute.internal
I0614 16:08:28.831083 1 request.go:655] Throttling request took 1.120968056s, request: GET:https://10.100.0.1:443/api/v1/namespaces/default/pods/shippingservice-7cd7c964-dl54q
time="2021-06-14T16:08:44Z" level=debug msg="finished node eviction" node_name=ip-192-168-66-41.us-east-2.compute.internal

And now the second and third nodes were evicted – 3 nodes remain:

Active nodes after optimization

After about 10 minutes, Evictor deleted 3 nodes and left 3 nodes running. Note that CPU utilization is now at a much healthier 80%.

The cost of this cluster is now $207.36 per month – half of the initial cost of $414 per month.

I managed to achieve 80% of the projected savings. This is what I saw in my CAST AI dashboard: 

Step 4: Running the full rebalancing for continuous optimization

Steps 1, 2, and 3 are fully automated. CAST AI gradually shrinks the cluster by eliminating waste and overprovisioning. It does so by bin-packing pods and emptying nodes one by one. From that moment, the cluster is optimized, and Evictor will continuously look for further optimization opportunities over time.

The next step is to run the full rebalancing where CAST AI assesses all the nodes in my cluster and then replaces some (or all) with the most cost-efficient nodes available, which meet my workload requirements. 

The nodes are cordoned:

Cordoned nodes

The first two nodes are drained, and the AI engine selects the most appropriate instance types for these nodes. This is what I saw in my CAST AI dashboard:

CAST AI setup progress

As you can see, my cluster now has only two nodes and costs $138 per month. It’s hard to imagine that I started out with a monthly EKS bill of $414.72!

Summary

Moving from a non-optimized setup to a fully-optimized one was a breeze. CAST AI analyzed my setup, found opportunities for savings, and swiftly optimized my cluster in 15 minutes. I cut my EKS bill by half in 15 minutes, from $414 to $207.

Then, I activated advanced savings by asking CAST AI to replace nodes with more optimized nodes and achieved further savings, ending up with a $138 bill.

From here on, I can rebalance (fully or partially) the cluster at any time to capture further savings and keep it in the most optimized state.

EKS savings with CAST AI

Run the free CAST AI Savings Report to check how much you could potentially save. It’s the best starting point for any journey into cloud cost optimization.


FAQ

What is Amazon EKS?

Amazon EKS (Elastic Kubernetes Service) is a managed service that provides and runs a Kubernetes control plane for you. EKS creates the control plane and Kubernetes API on AWS-managed infrastructure, so you’re ready to run Kubernetes workloads right away. You can deploy workloads using native K8s tools like kubectl, Kubernetes Dashboard, Helm, and Terraform. Note that since the master nodes on EKS are under a separate AWS account, you don’t have access to them.

What are the advantages of Amazon EKS?

Amazon EKS comes with a handful of advantages:

  • You don’t need to set up, run, or manage your own Kubernetes control plane.
  • You can quickly deploy open-source Kubernetes community tools and plugins.
  • EKS automates load distribution and parallel processing better than any DevOps engineer.
  • Your Kubernetes assets work seamlessly with other AWS services.
  • EKS uses VPC networking.
  • Any EKS-based application is compatible with your existing Kubernetes environment; you don’t have to alter your code to move to EKS.
  • EKS supports EC2 spot instances via managed node groups, which can bring significant cost reductions.

Is Amazon EKS expensive?

You’ll be charged $0.10 per hour for every Kubernetes cluster by EKS. This means some $74 each month per cluster, which isn’t a lot – especially if you’re handling Kubernetes scalability well. The extra $74 on top of your bill isn’t going to make much difference compared to your overall compute costs.
 
Naturally, the cost of your EKS setup depends on what you choose to run it on. You can run EKS on AWS using EC2 or AWS Fargate. There’s also an on-premises option using AWS Outposts.
If you decide to use EC2 (with EKS managed node groups), expect to pay for all the AWS resources you create to run your Kubernetes worker nodes. Just like with other AWS services, you only pay for what you use.

How much does Kubernetes cost on AWS?

First of all, Amazon EKS comes at a fee of $0.10 per hour for each cluster that you create. This sums up to around $74 per month per cluster. A single Amazon EKS cluster can be used to run multiple applications thanks to Kubernetes namespaces and IAM security policies. 
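That monthly figure is just the hourly fee multiplied by the hours in a month:

```python
# EKS control-plane fee per cluster, at the published $0.10/hour rate.
HOURLY_FEE = 0.10
print(f"${HOURLY_FEE * 24 * 31:.2f}")  # $74.40 in a 31-day month
print(f"${HOURLY_FEE * 24 * 30:.2f}")  # $72.00 in a 30-day month
```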
 
Next, there are the compute costs. You can run EKS on AWS using EC2 or AWS Fargate. 
 
If you go for EC2, you’ll be paying for all the AWS resources created to run your worker nodes (for example, EC2 instances or EBS volumes). There are no minimum fees or upfront commitments – you only pay for what you use.
 
And if you pick AWS Fargate, the pricing will be calculated based on the vCPU and memory resources used from the moment you start downloading your container image until the Amazon EKS pod terminates (the amount is rounded up to the nearest second). The minimum charge of 1 minute applies here.
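A quick sketch of that billing rule; the per-vCPU and per-GB rates below are illustrative assumptions (actual Fargate rates vary by region):

```python
import math

# Illustrative Fargate rates (assumed; real rates vary by region).
VCPU_PER_HOUR = 0.04048
GB_PER_HOUR = 0.004445

def fargate_charge(seconds: float, vcpus: float, memory_gb: float) -> float:
    """Charge from image-pull start to pod termination:
    rounded up to the nearest second, with a one-minute minimum."""
    billable = max(math.ceil(seconds), 60)
    hours = billable / 3600
    return round((vcpus * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR) * hours, 6)

# A pod with 0.25 vCPU / 0.5 GB that runs for 45 s is billed for 60 s
print(fargate_charge(45, 0.25, 0.5))
```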


