According to the 2025 Kubernetes Cost Benchmark Report, applications use only 10% of CPU and 23% of memory resources allocated to them. Just think of the massive financial impact teams could achieve by reducing this cloud waste.
When it comes to improving resource utilization and reducing your costs, you can start tweaking things manually or delegate the task to an automation solution like Cast AI.
Cast was built to help teams spend less time on manual tasks like:
- Scaling cloud resources in line with real-time demand,
- Setting workload requests and limits,
- Dealing with out-of-capacity cloud events,
- Estimating which instance types are most suitable given availability and price across various cloud regions,
- And many more.
What specific areas does Cast AI automate, and in what ways? Continue reading to learn about the key areas of Kubernetes management that require automation.
6 ways to cut Kubernetes cloud costs automatically with Cast
1. Compute instance selection and rightsizing
Choosing the most cost-effective instance for a workload is difficult for several reasons. When selecting virtual machines, teams should provision only what they need across several compute dimensions: CPU (x86 or ARM), GPU, memory, SSD storage, and network connectivity.
Taking the time to review the available options is worthwhile because compute is often one of the more expensive components of a cloud bill.
With hundreds of machine types and sizes available from the major cloud providers, making the right choice manually is a challenge. Combing through a list of 700+ AWS instance types every time you want to provision a node is simply unrealistic.
How Cast automates instance selection and rightsizing
Cast automatically chooses, provisions, and decommissions compute instances in response to dynamic workload demands.
Here’s how it works. Cast:
- Analyzes the workload to understand the CPU and RAM needs,
- Examines the hundreds of alternatives from AWS, Google Cloud, and Microsoft Azure,
- Identifies the best performance match at the lowest possible cost,
- Provisions the resource automatically, giving the workload a place to run.
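The selection steps above can be illustrated with a minimal sketch. It is not Cast's actual algorithm: the `InstanceType` class, the `cheapest_fit` function, and the catalog entries (names and prices) are hypothetical and exist only for illustration. The sketch filters a catalog down to the instance types that satisfy the workload's CPU and memory needs, then picks the cheapest one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


def cheapest_fit(
    catalog: List[InstanceType], cpu_needed: float, mem_needed_gib: float
) -> Optional[InstanceType]:
    """Return the cheapest instance type that covers the CPU and memory needs."""
    candidates = [
        i for i in catalog
        if i.vcpus >= cpu_needed and i.memory_gib >= mem_needed_gib
    ]
    # default=None covers the case where nothing in the catalog is big enough.
    return min(candidates, key=lambda i: i.hourly_usd, default=None)


# Hypothetical catalog: names and prices are made up for this example.
CATALOG = [
    InstanceType("small-a", vcpus=2, memory_gib=8, hourly_usd=0.10),
    InstanceType("medium-a", vcpus=4, memory_gib=16, hourly_usd=0.19),
    InstanceType("medium-b", vcpus=4, memory_gib=32, hourly_usd=0.26),
    InstanceType("large-a", vcpus=8, memory_gib=32, hourly_usd=0.38),
]

choice = cheapest_fit(CATALOG, cpu_needed=3, mem_needed_gib=12)
print(choice.name if choice else "no instance type fits")
```

A real selector would also weigh availability, region, processor architecture, and Spot versus On-Demand pricing, which this sketch leaves out.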
Real-life example: Phlexglobal (60% cost savings)
Phlexglobal, a provider of automation and AI solutions for clinical and regulatory issues, used this feature to save 60% on cloud costs.
“We may have a memory-intensive or CPU-intensive workload that requires the appropriate cloud resources to operate it. It is far more difficult to do this natively with Kubernetes. Create distinct node pools, configure the tolerations, and monitor the operation.
With Cast AI, it’s essentially simply an annotation, and the solution will act autonomously, purchasing more resources in accordance with our specifications. This simplifies the lives of our engineers.”
Alex Potter-Dixon, VP Cloud Engineering and Operations, Phlexglobal
Read the entire case study here.
2. Cluster autoscaling
Configuring and managing native Kubernetes autoscaling mechanisms takes time and effort, especially if teams use many autoscalers. The same goes for open-source solutions, which require configuration and monitoring to ensure smooth operation.
How Cast autoscales K8s clusters
The Cast Autoscaler dynamically adjusts cloud capacity to accommodate real-time demand changes while minimizing downtime. It’s simple to set up and will run automatically based on the policies you provide.
For example, Node Templates can be used to define virtual buckets with restrictions such as:
- instance types to be used,
- lifecycle of the nodes to add,
- node provisioning configurations,
- and other properties.
Real-life example: Foretellix (30% cost savings)
Foretellix, the leading provider of data automation for AI-powered autonomy, uses the Cast autoscaler to efficiently manage spiky usage patterns of short-lived jobs by provisioning the compute capacity in less than two minutes.
The image below shows how the cluster rapidly scaled up from zero to 2000 CPUs and then down from 2000 CPUs to zero.

Here is a similar rapid node scaling pattern for a 30-day period that improved resource efficiency and optimized compute costs.

3. Workload autoscaling
Setting workload requests and limits in Kubernetes is challenging because it requires a deep understanding of both the application’s resource consumption patterns and the cluster’s capacity.
Misconfigured values can lead to inefficient resource utilization: too low, and workloads may be throttled or evicted; too high, and they may starve other pods or inflate infrastructure costs.
The dynamic nature of workloads, coupled with unpredictable traffic spikes and the lack of precise profiling tools, makes it difficult to strike the right balance.
How Cast autoscales workload requests and limits
Cast’s Workload Autoscaler continuously monitors actual CPU and memory usage patterns and dynamically adjusts resource allocations without requiring manual intervention or causing downtime. This includes in-place pod resizing, which lets Cast AI modify resource settings on running pods instantly.
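To make the idea of usage-based rightsizing concrete, here is a minimal sketch of one common approach: take a high percentile of observed usage and add some headroom. This is an assumption for illustration, not a description of how Workload Autoscaler computes its recommendations. The function names, the 95th percentile, the 10% headroom, and the sample data are all hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_request(
    samples: Sequence[float], pct: float = 95, headroom: float = 0.10
) -> float:
    """Recommend a request: the chosen usage percentile plus a safety margin."""
    return round(percentile(samples, pct) * (1 + headroom), 3)


# Hypothetical CPU usage samples, in cores, for one container.
cpu_usage_cores = [0.12, 0.15, 0.11, 0.20, 0.18, 0.14, 0.16, 0.13, 0.45, 0.17]
current_request = 2.0  # cores currently requested (made-up value)

recommended = recommend_request(cpu_usage_cores)
print(f"current request: {current_request} cores, recommended: {recommended} cores")
```

The percentile and headroom trade cost against risk: a lower percentile saves more but makes throttling or eviction during spikes more likely.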
Real-life example: Bud (47% cost savings)
Bud, an innovative company that provides actionable insights for LLMs in the financial services sector, uses Workload Autoscaler to rightsize workloads, reduce CPU requests, and unlock new cost savings.
The image below shows the impact of Workload Autoscaler on the compute cluster cost.

This graph illustrates the drop in requested CPU per hour after integrating Cast.

4. Bin-packing pods to nodes
The Kubernetes scheduler spreads pods across nodes fairly rather than optimizing for node utilization or cost efficiency. To lower your cloud costs, however, you need to bin-pack pods onto nodes and delete empty nodes right away so that you don't pay for idle resources. Because things move so quickly in Kubernetes, this is difficult to do manually.
How Cast bin-packs pods to boost resource utilization
Cast comes with an automated mechanism, Evictor, that continuously compacts pods onto fewer nodes. The resulting empty nodes can then be removed by the Node deletion policy (if enabled). To prevent downtime, Evictor only considers applications with multiple replicas. Empty nodes that haven’t been used for a specified amount of time are removed as well, so you stop paying for idle resources.
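Bin-packing itself is a classic problem, and a small sketch shows why it frees up nodes. The example below uses the textbook first-fit-decreasing heuristic on CPU requests only. It is not Evictor's implementation, and the pod sizes and node capacity are made-up values.

```python
from typing import List


def first_fit_decreasing(
    pod_cpu_requests: List[float], node_capacity: float
) -> List[List[float]]:
    """Pack pods onto as few equally sized nodes as the heuristic can manage."""
    nodes: List[List[float]] = []  # each inner list holds the pods on one node
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"pod requesting {request} cores fits on no node")
        for node in nodes:
            if sum(node) + request <= node_capacity:
                node.append(request)  # first node with enough free room
                break
        else:
            nodes.append([request])  # no existing node had room: open a new one
    return nodes


# Hypothetical CPU requests (in cores) for eight pods, 12 cores in total.
pods = [2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 0.5, 1.5]
packed = first_fit_decreasing(pods, node_capacity=4.0)

for index, node in enumerate(packed, start=1):
    print(f"node {index}: {node} -> {sum(node)} of 4.0 cores used")
```

A production bin-packer would also have to respect memory, affinity rules, and disruption budgets, which is part of why doing this by hand is impractical.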
Real-life example: Flowcore (50% cost savings)
Flowcore, the startup behind a developer-first platform that makes it easy to collect, process, and act on data in real time, uses Cast’s bin-packing to reduce cloud waste at the node level, resulting in optimal resource utilization and compute cost savings.
This image shows dramatic resource utilization improvements across multiple clusters:

5. Instantly shifting workloads to optimal computing instances (rebalancing)
While cloud cost optimization takes time, some teams may choose to transfer their workloads to new machines as soon as they discover potential savings through recommendations. Spinning up new instances and relocating pods to them manually is time-consuming, and all that manual effort often reduces the potential ROI from optimization initiatives.
How does Cast’s automated rebalancing work?
Cast not only identifies which compute instances provide the best price-performance ratio for a specific Kubernetes setup, but also lets users replace some (or all) of the non-optimized nodes with the most cost-effective ones right away.
Real-life example: Wio Bank (up to 70% cost savings)
Wio Bank, an innovative financial services company, uses Cast’s rebalancing to automatically replace suboptimal nodes with new ones that are more cost-efficient and run the most up-to-date Node Configuration settings.
Here’s an example of a rebalancing job that generated 84.5% cost savings:

6. Spot Instance automation
Spot Instances have enormous cost-saving potential, but the cloud provider can reclaim them with as little as 30 seconds' notice. Spinning up a new instance takes longer, so your workload may be left without a place to run, possibly causing your application to fail.
This is only the tip of the iceberg when it comes to Spot Instances:
- Spot prices fluctuate with supply and demand. Where the provider lets you set a maximum price, the Spot Instance only serves you while the current price stays at or below it.
- If the Spot price rises above your maximum price, the instance is shut down.
- The provider may run out of Spot Instances to give, which frequently occurs during peak seasons such as the Christmas holidays.
Cast automates the entire Spot Instance lifecycle
Cast selects the best Spot Instance for a particular workload, provisions it, and, if an interruption occurs, moves the workload to another Spot Instance or, when Spot capacity is short, to an On-Demand Instance.
- Decommissioning – When no more jobs need to be completed, Cast automatically shuts down instances. You don’t want to waste money on resources that don’t add value to your organization, even if they are as inexpensive as Spot Instances.
- Spot fallback mechanism – When instances are in low supply, Cast shifts workloads to On-Demand Instances to reduce the risk of interruption and ensure that all workloads have a place to run.
“We now know that whenever there are no Spot Instances, it will automatically switch to Reserved Instances without any effort on our part. It’s a no-brainer for us.”
Ron Grosberg, VP, Research & Development at Foretellix
- Partial use of Spot Instances – Cast allows users to run a subset of workloads on Spot Instances without having to edit manifest files. If running 100% of your workloads on Spot Instances feels too risky, you can run, for example, 60% on On-Demand Instances and 40% on Spot Instances.
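The fallback and partial-use ideas in the list above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Cast's implementation: the function names, the pool names, and the capacity numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def place_workload(
    spot_capacity: Dict[str, int], preferred_pools: List[str]
) -> Tuple[str, Optional[str]]:
    """Use the first preferred Spot pool with capacity, else fall back to On-Demand."""
    for pool in preferred_pools:
        if spot_capacity.get(pool, 0) > 0:
            return ("spot", pool)
    return ("on-demand", None)  # Spot fallback: no pool has spare capacity


def split_replicas(replicas: int, spot_percent: int) -> Tuple[int, int]:
    """Split replicas into (spot, on_demand) counts, rounding Spot down."""
    spot = replicas * spot_percent // 100
    return spot, replicas - spot


# Hypothetical capacity snapshot: pool-a is exhausted, pool-b has room.
capacity = {"pool-a": 0, "pool-b": 3}
print(place_workload(capacity, ["pool-a", "pool-b"]))

# With every pool exhausted, the workload lands on On-Demand capacity.
print(place_workload({"pool-a": 0, "pool-b": 0}, ["pool-a", "pool-b"]))

# The 40% Spot / 60% On-Demand split from the example, for 10 replicas.
print(split_replicas(10, spot_percent=40))
```

Rounding the Spot share down keeps the split conservative: any leftover replica goes to On-Demand capacity.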
Real-life example: Heureka (30% cost savings)
Heureka Group, a leading price comparison platform and online shopping advisor, unlocked even more cost savings thanks to the Cast autoscaler, which moves workloads already deployed on Spot VMs to cheaper families while maintaining service uptime and application performance.
The image below illustrates the massive drop in compute costs.

Automation makes Cast the best Kubernetes cost optimization tool
Cast offers a portfolio of automation solutions in one place, all of which work together to bring benefits. Implementing it is quick and has a short ROI timeline, unlike open-source solutions for some of the automation domains mentioned above. Cast also requires less monitoring when in use, allowing engineers to focus on solving business-critical challenges rather than infrastructure.
Schedule a demo to receive a personalized walkthrough of our platform and learn more about its automation capabilities. Or click below to try Cast free.



