Transcript: Kubernetes Cost Optimization News – Recorded Panel
Joe Roualdes: All right, let’s go ahead and kick things off. Thank you, everybody, for joining today’s webinar, where we’re going to be sharing some new research from both Forrester and Cast AI regarding CPU and memory utilization, and GPU availability and pricing. Can we go to the next slide, please?
Awesome. We’ve got some great speakers today. We’ve got Tracy Woo, who’s a principal analyst at Forrester, and we’ve got Laurent Gil, who is our co-founder and president here at Cast AI. Thanks. Today’s agenda: as I mentioned, we’re going to be going through the results of the 2025 Kubernetes Cost Benchmark Report, which is our third annual report. As I mentioned, we’ll cover CPU and memory utilization, as well as GPU availability and pricing.
We’re also going to be going through Tracy’s research with Forrester and some of the highlights that she’s seeing in utilization as well. And then we’ve got time left at the end for Q&A. As we’re going throughout the presentation today, please feel free to ask questions in real time. We’ll be taking a look at those, and if it makes sense, we’ll ask those questions live to Tracy and Laurent.
If it doesn’t, we’ll hold them until the Q&A section at the end. Next slide, please.
Awesome. So we’re gonna kick things off. First, we’re gonna start with a poll. Do you use automation to manage Kubernetes cloud resources? Please go ahead and click on which of these three options fits you and your company best.
We’ll give everybody about 30 seconds to respond, and then we’re going to take a look at the results and then dive in.
All right. Thank you for taking the quick poll. So it looks like we’ve got 48% of folks saying, “I never use automation for this aspect of K8s,” 36% saying that they’re experienced automation users, and 17% saying, “I’m experienced with automation in dev only.” So it’s great to see that we have people with experience in automation.
It’s also great to see folks who don’t have experience with automation. We’re gonna be talking a lot about automation today. I think regardless of whether you haven’t used automation before or you are an avid user and think you’ve got it all covered, what you’re gonna find is we have a very specific way of defining automation at Cast AI.
And you may learn a thing or two about ways you could be using automation to help drive more value for your team and your business. So with that, I’m gonna hand off to Laurent, who’s gonna dive into the Kubernetes Cost Benchmark Report results.
Laurent Gil: Thank you. Thank you, Joe. So I’ll walk you through how the report was made, and then show you some of the findings.
I want to thank you, the audience, for being here. It’s a very special day because it’s the first time I have the pleasure and the honor to welcome Tracy in this talk. Tracy is one of the lead researchers at Forrester, and it’s a pleasure to have you here. So thank you, Tracy, for joining us.
Tracy Woo: Thank you for having me.
Laurent Gil: Yep, I really appreciate it. This is gonna be super cool. In the next few minutes, I’m gonna show you some of the results of the benchmark report. We selected some that were expected, but we also found a lot of unexpected results, and I will go through some of those. And for the audience, feel free to ask questions as we go through this.
Because I’m gonna show you some that are very surprising, and Tracy also has some very interesting statistics on her side that will help foster that discussion. So please, by all means, ask questions as we go. The report was made using 2024 data, up until December. On the GPU side, we actually also included January and February of this year.
It’s a study based on more than 2,000 organizations. That means this is not a subjective study — we haven’t asked people to give us their answers to these questions.
We are extracting this data directly from our customers’ clusters and analyzing it without human bias. So the results are pretty surprising. We’re also looking at clusters from different cloud providers — AWS, Azure, and GCP — cluster size, region, and whether they are production or non-production.
All of this is part of the report, and we’ll share a link at the end so you can take a closer look. With that, let’s start by looking at CPU utilization because, I think, this year’s results are quite surprising.
The first thing we look at is CPU requests versus actual usage. For the third year in a row, the numbers are roughly the same. Customers request a certain amount of CPU for their workloads, but on average, roughly 70% of what has been requested is not used — meaning it’s wasted.
And this is across 2,000 organizations. It’s quite a lot — 70% of CPU that could be saved but is actually wasted. And as you know, when you request CPU or memory through Kubernetes, this translates into the number of nodes you need in your cluster.
And if you request too much, the cluster will allocate more nodes than you actually need. This translates into cost. So, 70% waste is a big deal.
And it’s not just CPU. We see the same trend for memory — a very similar situation. So overall, the wasted resources amount to billions of dollars.
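[Editor’s note: a minimal sketch of the waste calculation Laurent describes, computing the fraction of requested CPU that goes unused across workloads. The workload numbers below are invented for illustration; they are not Cast AI data.]

```python
def waste_fraction(requested_millicores, used_millicores):
    """Fraction of total requested CPU that goes unused across workloads."""
    total_requested = sum(requested_millicores)
    total_used = sum(used_millicores)
    return (total_requested - total_used) / total_requested

# Three hypothetical workloads, each requesting far more than it uses:
requested = [1000, 2000, 500]   # millicores requested
used = [250, 600, 200]          # millicores actually consumed

print(f"waste: {waste_fraction(requested, used):.0%}")  # prints "waste: 70%"
```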
Now the next thing we looked at is real-time clusters. This is not static data — every five minutes we capture the cluster metrics. We look at actual usage and unused capacity — what is allocated but not used — continuously throughout the day.
And again, year over year, clusters are consistently wasting resources in a very similar way. What is very interesting is that when you look at production and non-production clusters, the numbers are almost identical.
So even if you think your production clusters are well tuned and heavily used, the data shows something different — they waste almost as much as dev or staging clusters.
The next thing we looked at is cluster size. We thought maybe small clusters behave differently than large clusters. But again, the results are consistently the same. Whether it’s a small cluster with fewer than 20 nodes, or a cluster with 100 nodes or more — the waste is the same.
This was one of the most surprising findings in this year’s report — cluster size does not change utilization. Both small and large clusters waste roughly 70% of their allocated CPU and memory.
Then we looked at cloud providers. And here, we did find some differences. For example, clusters running on AWS tend to waste slightly less — around 66%, while clusters running on Azure waste around 72%. GCP sits roughly in the middle. But even the “best” number — 66% waste — is still very high.
The last part of the report focuses on GPUs. And this year, GPU availability and pricing have become critical topics. We looked at GPU instance availability by region and cloud provider, and the results were very interesting.
In some cases, availability is extremely low, especially for high-demand GPUs like the H100. Even if you have budget, you may not be able to get the instances you need because they’re not available in your region.
We also looked at GPU pricing trends. And here again, something fascinating: in many regions, GPU pricing has increased significantly year over year. So it’s extremely important to choose the right region and the right instance families.
This is where automation becomes critical, and we’ll discuss that later in the webinar. But now, I’d like to hand it over to Tracy, who will walk us through the research Forrester conducted and how it compares.
Tracy Woo: Thank you, Laurent. Yeah, it’s really interesting hearing your stats. They actually line up quite closely with the research that we see here at Forrester.
So, about a year and a half ago, we kicked off a research project where we wanted to look at Kubernetes utilization. We ran a survey with a few hundred respondents. These were all individuals who were specifically managing Kubernetes clusters — typically in DevOps or platform engineering roles.
What we found was interesting. We asked: “What percentage of your Kubernetes resources do you think are wasted on average?” And the majority said around 60%.
And that’s a big number. But hearing that your real cluster data shows around 70% waste — that’s not far off, which is quite validating.
Then we took it a step further and asked, “What percentage of your overall cloud bill is related to Kubernetes?” And it’s no surprise — these days, almost everything is moving to Kubernetes. The majority of respondents said that K8s represents 50% or more of their cloud bill.
We also looked into the reasons why Kubernetes waste happens. The top reasons were not surprising: overprovisioning, lack of visibility, limited automation, and teams not having enough time to fine-tune resources.
One interesting thing we saw is that most organizations believe they are “doing okay” when it comes to managing their Kubernetes costs — until they actually start measuring it.
Once they begin monitoring real usage, that’s when they realize how much waste there actually is. I think that’s very much in line with what your report uncovered.
We also found that many organizations rely heavily on manual processes — manually optimizing requests and limits, manually selecting instance types, manually adjusting autoscaling. And with the increasing complexity of cloud environments, this is simply becoming unsustainable.
Another interesting point: when we asked about planning and forecasting, most organizations said it’s extremely difficult to accurately forecast Kubernetes and cloud spending.
And lastly, we looked at cloud provider differences. Similar to your findings, AWS tends to have slightly better utilization, Azure tends to be lower, and GCP is somewhere in between.
But what’s important is that regardless of cloud provider, the waste numbers are still huge — it’s a systemic problem.
So overall, your benchmark aligns very closely with the survey research that we conducted. And I think this consistency shows that Kubernetes waste is a major industry-wide challenge — not just something that affects a few outliers.
Laurent Gil: Thank you, Tracy. This is super interesting. And I can’t wait to discuss that further with Joe during the Q&A because this topic of manual optimization versus automation is a very challenging one.
But let’s continue with the report. Next slide, please.
Here, we are looking specifically at clusters in production. What you’re seeing on the screen is a distribution of workload utilization. So on the x-axis, you have the percentage of CPU requested that is actually being used, and on the y-axis, the number of workloads.
As you can see, the vast majority of workloads stay in the 0–20% utilization range. That means they are using only a small fraction of the resources that were requested.
And in fact, only a tiny portion of workloads ever reach 80% or higher utilization — which is where you want your clusters to be if you want them to be efficient.
So from this distribution, you can clearly see that for most organizations, workloads are heavily overprovisioned.
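[Editor’s note: a sketch reproducing the kind of distribution on the slide, bucketing workloads by what fraction of their requested CPU they actually use. The workload figures are hypothetical, chosen only to mirror the shape Laurent describes: most workloads in the 0–20% bucket.]

```python
def utilization_histogram(workloads, bucket_width=20):
    """Count workloads per utilization bucket (0-20%, 20-40%, ...).

    Each workload is a (requested, used) pair in millicores.
    """
    buckets = {}
    for requested, used in workloads:
        pct = 100 * used / requested
        # Clamp 100% into the top bucket instead of creating a 100-120% one.
        lo = min(int(pct // bucket_width) * bucket_width, 100 - bucket_width)
        key = (lo, lo + bucket_width)
        buckets[key] = buckets.get(key, 0) + 1
    return buckets

# Five invented workloads; four use under 20% of what they requested.
workloads = [(1000, 120), (2000, 300), (500, 90), (800, 700), (1000, 150)]
for (lo, hi), count in sorted(utilization_histogram(workloads).items()):
    print(f"{lo:3d}-{hi}%: {count} workload(s)")
```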
And as we mentioned earlier, requests translate directly into cost. If you’re requesting more than you need, your cluster will be larger. And if your cluster is larger, your cloud bill is higher — it’s as simple as that.
Now, here’s something even more interesting. When we compare this distribution year over year, it has barely changed.
Despite all the improvements in tooling, despite all the best practices that teams are trying to implement, despite more awareness — utilization remains almost the same. Workloads are still significantly overprovisioned.
So this tells us something important: manual tuning isn’t solving the problem at scale.
Next slide, please.
Here, we looked at utilization by organization size — small, medium, and large companies. Again, almost no difference. Regardless of whether you’re a startup or a large enterprise, the overprovisioning trend is the same.
It’s a systemic issue. It’s not tied to maturity, budget size, or engineering headcount — everybody struggles with the same inefficiencies.
Another thing we looked at is node type — general-purpose nodes, compute-optimized nodes, memory-optimized nodes. And again, the results were similar.
General-purpose nodes are slightly less wasteful, memory-optimized nodes slightly more wasteful, but overall the difference is small.
And you would expect memory-optimized nodes to have higher utilization because they’re typically used for workloads like databases or analytics — but the data shows the opposite. They’re actually the most overprovisioned node type.
So even when teams deliberately choose instance types for specific workloads, overprovisioning persists.
Now let’s talk about Spot instances. Many organizations use Spot to reduce cloud costs. And of course, Spot can be very powerful — but it’s also risky because Spot instances can be reclaimed at any time.
What we found is that Spot instance availability varies significantly across regions and instance families. In some regions, availability is reasonably high. In other regions, it’s extremely low, meaning your clusters may be constantly interrupted.
So relying heavily on Spot is not always a reliable long-term cost strategy unless you use some level of automation or fallback logic.
And this ties into something Tracy mentioned — the difficulty of forecasting. Spot pricing and availability are extremely hard to predict.
Next slide, please.
Now, let’s look at GPUs. This is one of the most important findings in this year’s report.
GPU availability is extremely challenging right now. If you’re looking for H100s, in many regions they are simply not available. We saw availability as low as 2% in some zones.
And even when availability exists, pricing is all over the place. Some regions are 2–3× more expensive than others for the exact same GPU.
So choosing the right region and the right SKU can dramatically impact your GPU budget.
And if you’re manually managing your GPU clusters, this becomes incredibly complex. Automating the selection of GPU types and regions becomes essential.
Next slide, please.
Now let’s talk about bin packing — one of the most misunderstood concepts when it comes to Kubernetes cost optimization.
A lot of people think that bin packing is simply “putting more pods on the same node.” But in reality, bin packing is about intelligently placing workloads across nodes based on real usage.
And doing this manually is extremely difficult. You’d have to constantly monitor real-time metrics, adjust requests and limits, move workloads, resize nodes — it’s simply not feasible at scale.
This is where automation makes a huge difference.
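[Editor’s note: a toy illustration of the bin-packing idea Laurent describes, placing workloads onto the fewest nodes based on what they actually use rather than what they request. First-fit-decreasing is one simple heuristic, shown here only to make the concept concrete; real schedulers and autoscalers are far more sophisticated.]

```python
def pack(workload_cpus, node_capacity):
    """Assign workloads (CPU values) to nodes with first-fit-decreasing.

    Returns a list of nodes, each a list of the workloads placed on it.
    """
    nodes = []
    for cpu in sorted(workload_cpus, reverse=True):
        for node in nodes:
            if sum(node) + cpu <= node_capacity:
                node.append(cpu)
                break
        else:
            nodes.append([cpu])  # no existing node fits, so add a new one
    return nodes

# Sizing by requests vs. sizing by real usage, on 2-CPU nodes:
requests = [1.0, 1.0, 1.0, 1.0]   # each workload requests a full CPU
usage = [0.3, 0.3, 0.3, 0.3]      # but actually uses 0.3
print(len(pack(requests, 2.0)))   # 2 nodes when sized by requests
print(len(pack(usage, 2.0)))      # 1 node when sized by real usage
```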
Next slide, please.
Here, we’re looking at bin packing efficiency across different cluster sizes. And again, the numbers are extremely consistent.
Small clusters, medium clusters, large clusters — all of them waste roughly the same amount of resources. The inefficiency is not coming from cluster size; it’s coming from the way workloads are requested and allocated.
So regardless of how big your cluster is, manual bin packing is not going to solve the problem.
Another interesting aspect of bin packing is the impact of mixed workloads. When you mix workloads with different utilization profiles — for example, spiky workloads and stable workloads — you would think that the cluster should become more efficient.
But our data shows the opposite. Mixed workloads actually make bin packing harder because their usage patterns are unpredictable.
And without automation, Kubernetes schedulers are simply not equipped to handle that level of variability efficiently.
Next slide, please.
Now let’s talk about overprovisioning in more detail.
As you see here, roughly 70% of CPU and memory is consistently unused across clusters. But what’s even more interesting is the distribution of overprovisioning.
A large portion of workloads request 5–10× more resources than they actually need. This isn’t just a small miscalculation — this is massive overestimation.
And the root cause is not incompetence — it’s uncertainty. Engineers are afraid of underprovisioning and causing performance issues, so they overprovision “just in case.”
But when everyone overprovisions, the cluster becomes massively oversized.
Another important point: overprovisioning happens not only at the workload level but also at the node level.
Node types are often chosen without considering actual workload profiles. For example, choosing a memory-optimized node because “it feels safer,” even if the workload doesn’t need that much memory.
And again, this translates into cost.
Next slide, please.
Now let’s turn to pricing. This chart shows the hourly cost of nodes across regions, and as you can see, pricing varies dramatically.
Some regions can be 2× more expensive than others for the exact same node type. And when you’re running hundreds of nodes, this price difference becomes enormous.
And the problem is that prices change frequently. What was the cheapest region last year may no longer be the cheapest today.
So without automation, it’s extremely difficult to keep up with regional pricing changes.
And finally, let’s talk about the impact of automation.
One of the biggest takeaways from this year’s report is that the organizations with the highest savings are the ones that use automation to manage their clusters — not manual tuning, not dashboards, not rightsizing scripts, but full automation.
Automation allows organizations to make real-time decisions based on actual usage, pricing, and availability. And this is simply not possible manually.
So with that, I’m going to hand it back to Joe.
Joe Roualdes: Thank you, Laurent. Super insightful, as always. I think this year’s report is one of the most interesting we’ve done so far, especially with the new GPU availability and pricing data.
So now let’s transition into a discussion with Tracy and Laurent. I’ve got a few questions lined up based on what we saw in the report and what Tracy covered from the Forrester perspective.
And as a reminder, feel free to keep dropping questions into the chat. We’ll try to answer as many as we can live.
First question — and this one is for both of you. The report shows that Kubernetes waste has remained roughly the same for three years in a row. Tracy, you mentioned your survey respondents estimated around 60% waste, which aligns closely with Laurent’s real cluster data.
Why do you think this waste problem is so persistent? Why hasn’t it improved even though more companies are using Kubernetes, more tools have come out, and teams have more experience?
Tracy Woo: Yeah, that’s a great question. I think it comes down to a couple of things.
First, Kubernetes itself is complex. It’s incredibly powerful, but it’s also incredibly flexible. And with that flexibility comes a lot of opportunities to overprovision — especially when engineers aren’t sure what the “right” amount of resources should be.
Second, teams are busy. They don’t have the time to constantly tune requests and limits. And even if they do it once, usage patterns change — new releases, traffic spikes, seasonality, new services — so the tuning becomes outdated quickly.
And third, many companies just don’t have the right visibility. They don’t have granular historical data or tools that highlight where the waste is. Or if they do have dashboards, they don’t have the time to act on them.
So I think it’s a mix of complexity, lack of time, and lack of visibility.
Laurent Gil: Yeah, I agree with all of that. I would add one more factor: fear.
Engineers are afraid of underprovisioning. They are afraid of introducing reliability issues. If you choose too few resources and something breaks in production, you’ll hear about it — immediately.
But if you overprovision, nobody gets paged. Nobody gets in trouble. So the incentives are misaligned.
Overprovisioning feels “safe,” even though it costs a lot of money. And unless you change the system — with automation or guardrails — people will always choose the safer option.
Joe Roualdes: Yeah, that makes total sense.
In the report, one of the really striking findings is that even large organizations with mature platform engineering teams — teams that have been running Kubernetes for years — still see the same waste levels as smaller companies.
Tracy, is that consistent with what you’ve seen?
Tracy Woo: Yes, completely. In fact, sometimes the most mature organizations are the ones with the most waste.
Because mature organizations have more workloads, more clusters, more teams, more complexity. The scale amplifies the inefficiency.
And larger companies also have more organizational silos. So even if one team is doing a great job with optimization, another team might be overprovisioning massively.
So the waste averages out — and it’s still high.
Laurent Gil: Yeah, exactly. We see this all the time. When an organization has 200 microservices, and each service overprovisions by just a little bit, the total cluster is massively oversized.
And you can’t fix that manually — it’s impossible. You need automation to handle this at scale.
Joe Roualdes: Awesome. Let’s move on to the next question.
One of the new sections in this year’s report is on GPU availability and pricing. GPUs have obviously become a huge topic, especially with the rise of AI workloads.
Laurent, you mentioned earlier that availability in some regions for H100s is as low as 2%. That’s crazy. So the question is: what should companies do when they need GPUs but can’t get access to them?
Laurent Gil: Yeah, this is becoming a massive issue. We’re seeing demand for GPUs skyrocketing, and supply just isn’t keeping up.
So, number one: companies need to be extremely flexible with regions. If you only look at one or two regions, you may find that GPUs simply aren’t available.
But if you expand your search to more regions and zones, your chances increase dramatically.
And number two: companies have to be flexible with GPU types. A lot of people want H100s — for good reasons — but in many cases, an A100 or L40S might work just fine.
If you’re too rigid, you may find yourself waiting months for hardware. If you’re flexible, you might be able to run your workloads today.
And number three: automation is critical. You can’t manually check availability across multiple regions and zones. Prices change daily. Availability changes daily.
Automation that scans regions, finds available GPUs, and automatically provisions the best option is becoming essential.
Tracy Woo: Yeah, I completely agree. And I also think that teams need to rethink how they plan for GPU usage.
For example, some organizations are trying to get GPUs for long-term, steady workloads — but maybe those workloads don’t actually need the most expensive hardware.
Or maybe they can use smaller GPUs, or burst into the cloud only when they need extra capacity.
The other thing is prioritization. A lot of companies want GPUs for everything right now — model training, inference, analytics, even some workloads that don’t necessarily require accelerators.
Teams need to prioritize which workloads actually benefit from GPUs and which ones don’t.
Otherwise, they end up fighting for very scarce resources.
Joe Roualdes: Yeah, that makes a ton of sense.
Let’s go to the next question. This one is for Tracy.
There’s been a lot of discussion lately about platform engineering. Some people think it’s the solution to cloud complexity and to Kubernetes waste.
Do you think platform engineering is enough? Can a strong platform engineering team solve overprovisioning and cost inefficiencies on its own?
Tracy Woo: I think platform engineering is incredibly important — and it’s becoming more important every year.
Platform teams are essential for establishing best practices, building internal tooling, improving developer experience, and ensuring consistency across the organization.
But I don’t think platform engineering alone is the answer to the waste problem.
The reason is that platform teams are still dealing with the same challenges as everyone else — complexity, limited time, limited visibility, and constantly changing usage patterns.
Platform engineering can provide frameworks and guardrails, but it can’t manually fine-tune thousands of workloads.
That’s where automation comes in. Automation can operate at a scale that humans simply cannot.
Laurent Gil: Yeah, I agree. Platform engineering is critical — but it’s not enough.
Some organizations think that if they hire really smart DevOps engineers and build a strong platform team, they’ll be able to optimize everything manually.
But the reality is that no human team can monitor CPU and memory usage in real time across thousands of containers and make adjustments every minute.
It’s just not possible manually. You need automation to do that.
Joe Roualdes: Awesome. Let’s keep going.
The next question is about forecasting. Tracy, you mentioned earlier that organizations struggle to forecast their Kubernetes and cloud spend.
Why is forecasting so hard in Kubernetes environments? And what can teams do to improve it?
Tracy Woo: Forecasting is hard because usage patterns are unpredictable. Services scale up and down. Teams launch new features. Traffic changes. Seasonality impacts workloads.
And Kubernetes adds another layer of complexity because it’s dynamic by design. Workloads move. Autoscalers adjust. Pods get rescheduled.
So your cost model is constantly changing.
And even if you have great historical data, it won’t necessarily predict future usage. For example, if a team ships a new AI feature or doubles their traffic overnight, your forecast becomes irrelevant.
So forecasting isn’t just a math problem — it’s an organizational challenge as well.
Laurent Gil: Yeah, totally agree.
I would add that forecasting is also hard because most companies don’t even know how much they are wasting today.
If 70% of your resources are unused, it’s impossible to predict your future costs because you’re forecasting based on inflated numbers.
If you fix the waste problem first — by rightsizing or using automation — then forecasting becomes much easier because you’re forecasting based on reality, not inflated requests.
Joe Roualdes: That’s a great point.
Let’s go to the next question, and this one comes from the audience.
“How do I convince my leadership to invest in Kubernetes cost optimization?”
Laurent, maybe you can take this one first.
Laurent Gil: Yeah, this is a really good question — and one we hear a lot.
What we see is that leadership responds to numbers, not dashboards.
If you go to your CFO and say, “We think we have waste,” that’s not compelling. But if you go to them with a detailed analysis that shows, for example, “We are wasting $2 million a year, and here’s the data,” that’s a very different conversation.
So the first step is to measure your waste. Once you quantify it, the business case becomes obvious.
And the second step is to demonstrate that savings can be realized quickly. Leadership doesn’t want to hear about 18-month projects. They want impact in weeks or months.
Automation makes that possible. You can activate automation and see results almost immediately.
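[Editor’s note: a back-of-the-envelope sketch of turning a measured waste fraction into the annual dollar figure Laurent says leadership responds to. The spend and waste numbers are illustrative, not from the report.]

```python
def annual_waste(monthly_k8s_spend, waste_fraction):
    """Annualized dollars wasted, given monthly Kubernetes spend and the
    measured fraction of requested resources that go unused."""
    return monthly_k8s_spend * waste_fraction * 12

# e.g. $250k/month on Kubernetes combined with the report's ~70% waste figure:
print(f"${annual_waste(250_000, 0.70):,.0f} wasted per year")
```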
Tracy Woo: Yeah, completely agree. I also think it’s important to frame cost optimization as a way to support innovation — not just a way to cut spending.
If you can save money in one part of your cloud infrastructure, that frees up budget for innovation — for example, new AI initiatives.
And leadership loves hearing that optimization helps fund growth and innovation.
Joe Roualdes: I love that perspective.
All right, next question — and this is a technical one that came up multiple times in the chat.
“What’s the difference between rightsizing and automation? Aren’t they the same thing?”
Laurent Gil: Great question. Rightsizing is part of automation — but it’s not the whole thing.
Rightsizing means adjusting requests and limits to better match actual usage. And that’s important.
But automation goes much further. Automation chooses the best node types, the best regions, the best pricing options. It does real-time bin packing. It handles Spot interruptions. It provisions and deprovisions nodes automatically.
So rightsizing is one piece of the puzzle, but it doesn’t solve all the inefficiencies by itself.
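[Editor’s note: a minimal sketch of the rightsizing piece Laurent distinguishes from full automation, deriving a new CPU request from observed usage. Using a high percentile of usage plus some headroom is one common approach; the 95th percentile and 15% headroom here are illustrative assumptions, not a Cast AI formula.]

```python
def rightsize(usage_samples, percentile=0.95, headroom=1.15):
    """Recommend a CPU request: a high percentile of observed usage,
    plus a safety margin so normal spikes still fit."""
    ordered = sorted(usage_samples)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx] * headroom

# A workload that requested 1000 millicores, but whose usage peaks near 240:
samples = [180, 200, 150, 240, 210, 190, 175, 220, 160, 205]
print(round(rightsize(samples)))  # a far smaller request than the 1000m asked for
```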
Tracy Woo: Exactly. Rightsizing alone still leaves a lot of waste because workloads change all the time.
Automation adjusts continuously. Humans can’t do that. Scripts can’t do that. Custom tools usually can’t do that at scale either.
So automation is the evolution of rightsizing — it’s rightsizing plus everything else needed to keep clusters efficient 24/7.
Joe Roualdes: Awesome. Let’s take another audience question.
This one says: “We’ve tried tools like dashboards, rightsizing recommendations, and custom scripts, but the waste still doesn’t go down. Why?”
Laurent, do you want to start?
Laurent Gil: Sure. The short answer is: dashboards and scripts don’t take action — they just show you information.
You can have all the charts in the world, but unless someone is actually doing something about it — updating requests, changing node types, moving workloads — nothing will change.
And the challenge is that no engineer has the time to manually fix everything. Even if they fix it once, it becomes outdated quickly because workloads change.
So dashboards are great for visibility, but they don’t solve waste by themselves.
Tracy Woo: Yeah, completely agree. Dashboards can highlight problems, but they don’t solve root causes.
And manual rightsizing doesn’t scale. It might work for a few services, but as soon as you have dozens or hundreds of microservices, you’re back to square one.
Another issue is that recommendations often sit in a backlog. Engineers want to fix things, but feature work always takes priority. So the recommendations stay untouched.
Automation actually takes the action on your behalf — that’s the key difference.
Joe Roualdes: Fantastic insights from both of you.
All right, next audience question — and this one is about Spot instances.
“Is it safe to run production workloads on Spot instances?”
Laurent Gil: The answer is yes — but only if you have automation and fallback strategies in place.
Spot instances are great for saving money, but they are unpredictable. They can be interrupted at any time, sometimes with just a few minutes’ notice.
If you run production workloads on Spot without any automation, you are taking a big risk.
But if you have automation that handles interruptions — by immediately replacing reclaimed nodes with on-demand nodes or Spot instances from different families — then it becomes safe.
Tracy Woo: Yeah, and to add to that — we’re seeing more and more companies successfully running production on Spot.
The key is reliability. You need systems that detect interruptions instantly and react instantly.
Humans can’t do that manually. Automation can.
Joe Roualdes: Perfect. All right, moving on.
Another question we got is: “If overprovisioning has stayed the same for years, does that mean Kubernetes is fundamentally inefficient?”
Laurent?
Laurent Gil: I wouldn’t say Kubernetes is inefficient. Kubernetes is extremely powerful — but it’s also extremely flexible.
The inefficiency comes from the way people use it. Kubernetes gives engineers a lot of freedom, but that freedom makes it easy to overprovision.
And Kubernetes was never designed to optimize cost automatically. It was designed for scalability, reliability, and flexibility.
So the inefficiency is not a flaw — it’s a result of how teams request resources and how clusters are configured.
Tracy Woo: Exactly. Kubernetes solves many problems — orchestration, resiliency, scalability — but cost optimization is not one of them.
That’s why tooling and automation around Kubernetes have become so important.
Joe Roualdes: Amazing. Let’s go to the next one.
This one says: “Will AI make Kubernetes optimization easier or harder?”
Tracy, want to take this one?
Tracy Woo: I think AI will make some parts easier and some parts harder.
AI can definitely help analyze patterns, detect anomalies, and predict usage. Those things will get better with AI.
But AI workloads themselves are highly resource intensive. They require GPUs, they create spiky usage, and they introduce new scheduling challenges.
So AI adds both opportunities and complexity. It will help automate optimization, but it will also increase pressure on infrastructure.
Laurent Gil: Yeah, exactly. AI will help — especially with prediction.
But AI won’t magically solve the underlying problems of waste unless it’s paired with automation that takes action.
You still need something that can make decisions and optimize clusters in real time.
Joe Roualdes: Great answer from both of you.
All right, next audience question:
“Is Kubernetes cost optimization only for large companies? Or do small companies benefit too?”
Laurent Gil: Small companies benefit a lot.
In fact, sometimes smaller companies see the fastest ROI because they can move quickly and adopt automation without internal bureaucracy.
And Kubernetes waste affects everyone — small, medium, and large. The data shows that the waste percentage is almost identical regardless of organization size.
Small companies may spend less in absolute terms, but proportionally, they waste just as much.
Tracy Woo: Yeah, completely agree. Also, smaller companies usually have limited engineering resources.
So automation helps them focus on building features rather than tuning infrastructure.
And the savings they get can be significant relative to their budgets.
Joe Roualdes: Perfect. Let’s do one more audience question before we wrap up.
“What are the first steps a company should take if they want to reduce Kubernetes waste?”
Laurent Gil: Step one: measure your waste. You can’t fix what you can’t see.
Step two: start small. Pick one or two clusters, apply automation or rightsizing, and see the results.
Step three: expand gradually. Once you see savings and stability, you can scale automation across your organization.
Tracy Woo: And I would add: get buy-in early.
Cost optimization works best when platform teams, engineering teams, and finance teams are aligned.
If everyone understands why optimization matters — and how it helps — it becomes much easier to implement changes.
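Laurent's step one, measuring waste, is simple arithmetic once you can see requests and real usage side by side. A minimal sketch, with illustrative workload numbers that are not from the report:

```python
def waste_percent(requested: float, used: float) -> float:
    """Share of a requested resource that goes unused, as a percentage."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return max(0.0, (requested - used) / requested * 100.0)


# Illustrative workloads: (name, CPU cores requested, CPU cores actually used)
workloads = [
    ("api", 4.0, 1.2),
    ("worker", 8.0, 2.0),
    ("cron", 2.0, 0.3),
]

total_requested = sum(req for _, req, _ in workloads)
total_used = sum(used for _, _, used in workloads)
print(f"cluster CPU waste: {waste_percent(total_requested, total_used):.0f}%")
```

The same calculation applied per workload tells you where to start rightsizing first.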
Joe Roualdes: Awesome. Thank you both. This has been incredibly insightful.
All right, before we wrap up, we’re going to launch one more poll.
“What is your biggest challenge with Kubernetes optimization today?”
The options are:
- Lack of tooling
- Lack of time
- Lack of visibility
- Complexity
- Other
Go ahead and make your selection. We’ll give you about 30 seconds.
All right, let’s take a look.
It looks like the top answer is “Lack of time,” followed closely by “Complexity.”
Not surprising at all — Kubernetes optimization is time-consuming and complex, and that’s exactly why automation has such a big impact.
Joe Roualdes: All right, so we’re almost at the end. Before we wrap up, I want to thank Tracy and Laurent for joining today and for sharing their insights.
Tracy, thank you so much for contributing your research and perspective — it’s incredibly valuable to see how industry-wide survey data aligns with real cluster data.
Tracy Woo: Thank you so much for having me. It was great to be here.
Laurent Gil: Yes, thank you. This was super fun, and I hope it was helpful for everyone who joined.
Joe Roualdes: Absolutely. And one last reminder — you can download the full Kubernetes Cost Benchmark Report using the link in the chat.
The report includes everything we discussed today — and a lot more.
And if you’re interested in trying Cast AI’s cost optimization platform, you can visit cast.ai to get started. You can sign up for free, connect a cluster, and get a full breakdown of your waste and potential savings.
All right, thank you again to everyone for joining. Have a wonderful rest of your day!
Webinar ends.
Key Takeaways
- Kubernetes waste is systemic and persistent. For the third consecutive year, cluster data from over 2,000 organizations shows that roughly 70% of requested CPU and memory goes unused, across AWS, Azure, and GCP, in both production and non-production environments, and at every cluster size.
- Perception matches reality, but action lags. Forrester’s survey respondents estimated about 60% Kubernetes waste and reported that Kubernetes already accounts for 50% or more of many cloud bills. Most teams only realize the true scale of waste once they start measuring real utilization.
- Overprovisioning is driven by fear, not incompetence. Engineers routinely request 5-10 times more resources than workloads actually need because the risk of underprovisioning and breaking production feels worse than silently overspending. Incentives push teams to “play it safe,” so waste accumulates.
- Automation is the main differentiator in cost outcomes. The organizations achieving the largest and fastest savings are those that use full automation, not just static rightsizing. Automation can:
- Continuously adjust requests and limits based on real usage
- Choose optimal node types, regions, and pricing options
- Handle Spot interruptions automatically
- Improve bin-packing and scale clusters up and down in real time
- GPU availability and pricing are now a constraint, not a detail. High-demand GPUs such as H100s show single-digit availability in many regions, and prices for the same SKU can vary two to three times between regions. Flexible region and GPU selection, plus automation that reacts to changing availability, is becoming essential for AI workloads.
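As an illustration of the first automation bullet above, one common rightsizing heuristic, shown here as a simplified sketch and not Cast AI's actual algorithm, derives a new request from a high percentile of observed usage plus a safety buffer:

```python
import math


def rightsized_request(usage_samples: list[float],
                       percentile: float = 95.0,
                       headroom: float = 1.15) -> float:
    """Recommend a resource request: a usage percentile times a safety buffer."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: smallest sample covering `percentile` of data.
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return ordered[rank - 1] * headroom

# Illustrative: a workload requesting 4 CPU cores but mostly using ~1 core
# would be rightsized to a little over its observed peak.
samples = [0.8, 0.9, 1.0, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 1.0]
print(f"recommended request: {rightsized_request(samples):.2f} cores")
```

A real autoscaler recomputes this continuously as usage shifts, which is what distinguishes automation from one-off static rightsizing.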
Related Q&As
Why is Kubernetes waste so high?
Most workloads are provisioned with far more CPU and memory than they actually use. Engineers size conservatively to avoid production issues, and Kubernetes’s flexibility makes it easy for requests to drift upward as traffic and deployments change. Manual tuning can’t keep up, so clusters accumulate unused capacity.
Real data shows that roughly 70% of requested resources go unused across organizations of all sizes, which drives cloud spend higher than expected. To address this, many teams turn to Cast AI for continuous, automated optimization that keeps clusters right-sized without requiring additional work from engineers.
Why do engineers overprovision in the first place?
Mainly fear. Underprovisioning leads to outages and alerts. Overprovisioning keeps systems quiet, even if it wastes money. With no guardrails or real-time automation, teams default to “just in case” sizing. To address this, companies can use Cast AI’s automated rightsizing and scheduling tools, which continuously adjust CPU, memory, and node choices based on real utilization – eliminating guesswork and preventing overprovisioning at scale.
Can platform engineering alone solve Kubernetes waste?
Platform engineering helps, but it’s not enough on its own. Even mature teams with strong internal tooling see similar waste levels. No human team can continuously tune hundreds of workloads or adapt to shifting usage patterns in real time.
Is it risky to run production workloads on Spot instances?
Only if you rely on manual processes. With automated fallback and rapid interruption handling, Spot can be safe and cost-efficient. Without automation, the risk of interruption is significant.
Why are GPUs so hard to get right now?
The demand for GPUs like the H100 far exceeds the supply. Some regions show single-digit availability. Prices also vary up to 2 or 3 times between regions. Flexibility in regions and GPU types, combined with automation, is becoming essential.
Is Kubernetes cost optimization only for large companies?
Not at all. Waste percentages are nearly identical across small, medium, and large organizations. Smaller teams often benefit fastest because they can adopt automation quickly and free up engineers to focus on product development.