Automating and optimizing Kubernetes for a US-based government contractor
Chesapeake Technology International (CTI) wanted to automate scaling cloud resources up and down in line with application demand, balancing cost and performance without engineers having to scale manually. The company ran the CAST AI Savings report, discovered a high level of potential savings, and implemented automated cluster optimization for a smooth transition to a fully automated, optimized state.
Empowering defense with the public cloud
CTI is a software and systems development company that delivers advanced military and security applications to major defense agencies in the United States.
To run things smoothly across multiple deployment areas, the DevOps engineer at CTI had implemented Rancher Kubernetes Engine (RKE). CTI engineers later decided to look beyond RKE and the configuration effort it required, and started to explore running Kubernetes clusters directly on the public cloud, outside of Rancher, within AWS and Azure.
Rancher is pretty well-documented, but it’s less documented than just pure Azure or AWS. If you want to set up DNS, there’s a whole way to do DNS within Rancher.
Using stuff like EC2 instead of EKS requires somebody to constantly look at the cluster and oversee how it’s used. And if we’re underusing it and if it’s expensive, we’d like to change that.
Eric Venturino, Software Engineer at CTI
Running Kubernetes clusters without automation turned out to be a challenge
The first experiences with EC2 and EKS showed the team how important cluster monitoring was for avoiding problems like overprovisioning or underutilization. Scaling resources up and down manually wasn’t efficient, so CTI started searching for an automation solution that would adjust resources to the workload demands instantly.
CTI runs a relatively lean DevSecOps team, with DevOps and optimization tasks falling upon engineering teams that deploy applications. Hiring more people to keep on optimizing our setup manually didn’t make sense. Even when just experimenting with Kubernetes, we couldn’t keep on scaling resources manually. We needed a solution that would automate as many of these tasks as possible.
Eric Venturino, Software Engineer at CTI
This is when the company turned to CAST AI
CTI started looking into potential solutions that would fully automate scaling cloud resources and optimizing their cost.
For example, I currently have a sandbox cluster that’s just for developers for many projects to experiment on. Sometimes it’s very active and being used a lot, and sometimes it’s not being used at all.
We have like 9 nodes up on it, so it would be on somebody to manually scale those down. If nobody is using it, it’s just wasted. So that’s at least part of the reason why we started to look at stuff like EKS, AKS and automated solutions like CAST AI.
Eric Venturino, Software Engineer at CTI
After assessing various platforms, the company decided to test CAST AI. Running the CAST AI Savings report revealed substantial potential savings from cluster optimization.
I ran the report for our sandbox cluster, which is pretty large because a lot of work gets done on it. We were struggling with scaling resources down after the work was done.
The Savings report showed that by optimizing the cluster with CAST AI, we could save ~$900 per month or ~$11k per year. I was like, wow, that’s actually a lot. And once we turn the automated optimization on, it’s just running efficiently on autopilot. The ROI we got from implementing CAST AI made it all well worth the effort.
Eric Venturino, Software Engineer at CTI
After seeing these results, CTI plans to roll out CAST AI across its engineering teams to help them manage their clusters better: automatically scaling resources down or removing them when there’s no more work to be done, and provisioning new ones when applications need them.
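To make the idea of demand-driven scaling concrete, here is a minimal sketch of how a desired node count could be derived from workload CPU requests. It is an illustration only, not CAST AI's algorithm: the function name, the 4000-millicore node size, and the 80% target utilization are all assumptions chosen for the example. Only the 9-node sandbox figure comes from the story above.

```python
import math


def desired_node_count(requested_millicores, node_millicores,
                       target_utilization=0.8, min_nodes=0):
    """Estimate how many identical nodes are needed for the given CPU requests.

    Hypothetical helper for illustration. A real autoscaler also weighs
    memory, pod scheduling constraints, instance types, and pricing.
    """
    if node_millicores <= 0:
        raise ValueError("node_millicores must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")

    total_requested = sum(requested_millicores)
    # Capacity we are willing to fill on each node before adding another.
    usable_per_node = node_millicores * target_utilization
    needed = math.ceil(total_requested / usable_per_node)
    return max(min_nodes, needed)


if __name__ == "__main__":
    current_nodes = 9        # sandbox cluster size mentioned in the story
    node_size = 4000         # assumed: 4 vCPU nodes, in millicores

    # Idle sandbox: no workloads are requesting CPU.
    idle = desired_node_count([], node_size)
    print("idle: change node count by", idle - current_nodes)

    # Busy sandbox: twenty workloads requesting one vCPU each (assumed).
    busy = desired_node_count([1000] * 20, node_size)
    print("busy: change node count by", busy - current_nodes)
```

In the idle case the sketch asks for zero nodes, which is the manual scale-down nobody had time to do. Setting `min_nodes` keeps a floor in place for clusters that should never shrink to nothing.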
CAST AI features used
- Spot instance automation
- Real-time autoscaling
- Instant Rebalancing
- Full cost visibility