GKE Cost Optimization: 10 Steps For A Lower Cloud Bill


Laurent Gil

If you’ve been running your workloads on Google Cloud’s managed Kubernetes service (GKE), you probably know how hard it is to forecast, monitor, and manage costs. GKE cost optimization initiatives only work if you combine Kubernetes know-how with a solid understanding of the cloud provider.

If you’re looking for a complete guide to GKE cost optimization, keep reading and get the scoop on the latest best practices from the Kubernetes ecosystem.

GKE pricing: a quick guide

Pay-as-you-go

How it works: In this model, you’re only charged for the resources that you use. For example, Google Cloud will add every hour of compute capacity used by your team to the final monthly bill. 

There are no long-term binding contracts or upfront payments, so you’re not overcommitting. Plus, you have the freedom to increase or reduce your usage just in time. 

Limitations:

  • It’s the most expensive pricing model, so you risk overrunning your budget if you set no control over the scale of resources your team can burn each month.
  • Flexible pay-as-you-go VMs work well for unpredictable workloads that experience fluctuating traffic spikes. Otherwise, it’s best to look into alternatives.
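To make the budget risk concrete, here is a minimal sketch of a monthly cost projection for pay-as-you-go nodes. The hourly rate, node count, and budget are hypothetical placeholders, not Google Cloud prices, and the 730-hour month is an averaging assumption.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month (8760 / 12)


def monthly_on_demand_cost(hourly_rate: float, node_count: int,
                           hours: float = HOURS_PER_MONTH) -> float:
    """Project the monthly bill for nodes that run the whole month."""
    return hourly_rate * node_count * hours


def over_budget(hourly_rate: float, node_count: int, budget: float) -> bool:
    """Return True when the projected bill exceeds the monthly budget."""
    return monthly_on_demand_cost(hourly_rate, node_count) > budget


if __name__ == "__main__":
    # Hypothetical numbers: 5 nodes at $0.10/hour against a $300 budget.
    print(monthly_on_demand_cost(0.10, 5))
    print(over_budget(0.10, 5, budget=300))
```

A check like this could run whenever a team asks for more nodes, so the overrun shows up before the invoice does.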

Committed Use Discounts

How it works: Committed Use Discounts (CUDs) compete with AWS Reserved Instances but don’t require advance payments. You can choose from two types of Committed Use Discounts: resource-based and spend-based.

Resource-based CUDs offer a discount if you commit to using a minimum level of Compute Engine resources in a specific region, targeting predictable and steady-state workloads. Moreover, CUD sharing lets you share the discount across all projects tied to your billing account. 

Spend-based CUDs, on the other hand, deliver a discount to those who commit to spending a minimum amount ($/hour) for a Google Cloud product or service. This offering was designed to help companies generate predictable spend measured in $/hr of equivalent on-demand spend. This works similarly to AWS Savings Plans.

Limitations:

  • In the resource-based scenario, the CUD asks you to commit to a specific machine series in a specific region.
  • In the spend-based CUD, you risk committing to a level of spend for resources that your company might not need six months from now.  

In both examples, you run the risk of locking yourself in with the cloud vendor and committing to pay for resources that might make little sense for your company in 1 or 3 years. 

When your compute requirements change, you’ll have to either commit even more capacity or you’re stuck with unused capacity. Committed use discounts remove the flexibility and scalability that made you turn to the cloud in the first place.
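The lock-in risk can be put into numbers. The sketch below compares a commitment with pay-as-you-go when only part of the committed capacity is used. The 37% discount and the $1.00 hourly rate are illustrative assumptions, not quoted Google Cloud rates.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def committed_vs_on_demand(on_demand_hourly: float, discount: float,
                           utilization: float, hours: float = HOURS_PER_MONTH):
    """Return (committed_cost, on_demand_cost) for one month.

    utilization is the fraction of the committed capacity you really use.
    The commitment is billed in full whether you use it or not.
    """
    committed_cost = on_demand_hourly * (1 - discount) * hours
    on_demand_cost = on_demand_hourly * utilization * hours
    return committed_cost, on_demand_cost


def break_even_utilization(discount: float) -> float:
    """Below this utilization, the commitment costs more than on-demand."""
    return 1 - discount


if __name__ == "__main__":
    # Hypothetical 37% discount with only half of the commitment in use.
    print(committed_vs_on_demand(1.00, 0.37, utilization=0.5))
    print(break_even_utilization(0.37))
```

Under these assumptions, a commitment only pays off while you use more of the committed capacity than the break-even fraction.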

Take a look here to learn more: GCP CUD: Are There Better Ways to Save Up on the Cloud?

Sustained Use Discounts

How it works: Sustained Use Discounts are automated discounts users get on incremental usage after running Compute Engine resources for a big part of a billing month. The longer you run these resources continuously, the bigger your potential discount on incremental usage.  
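A rough sketch of how an incremental discount like this adds up over a month. The tier table below (each quarter of the month billed at 100%, 80%, 60%, and 40% of the base rate) is an illustrative assumption; actual tiers depend on the machine series, so check the current pricing pages.

```python
# Assumption: (share of the month, price multiplier) per tier, for illustration.
TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def effective_price_multiplier(usage_fraction: float) -> float:
    """Average price multiplier for a resource that ran for
    usage_fraction of the billing month (0 < usage_fraction <= 1)."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    remaining = usage_fraction
    billed = 0.0
    for width, multiplier in TIERS:
        used = min(remaining, width)  # usage that falls into this tier
        billed += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return billed / usage_fraction


if __name__ == "__main__":
    print(effective_price_multiplier(0.5))  # half a month
    print(effective_price_multiplier(1.0))  # the full month
```

With these example tiers, a resource that runs all month pays about 70% of the base rate, while one that runs half the month pays about 90%.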

Spot virtual machines

How it works: In this cost-effective pricing model, you run workloads on spare capacity Google Cloud isn’t using and can save 60-91% compared with pay-as-you-go pricing. There’s no bidding involved, but the provider can pull the plug with a 30-second notice, so you need a strategy and tooling for dealing with such interruptions. 

Limitations:

  • Make sure that you pick spot VMs for workloads that can handle interruptions – and for which you have a solution in place. 

10 steps for GKE cost optimization

Pick the right VM type and size

1. Define your workload’s requirements

Your first step is to understand how much capacity your application needs across the following compute dimensions: 

  • CPU count and architecture, 
  • memory, 
  • storage, 
  • network. 

You need to make sure that the VM’s size can support your needs. See an affordable VM? Consider what will happen if you start running a memory-intensive workload on it and end up with performance issues affecting your brand and customers. 
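One way to apply this is to filter a machine catalog by the workload’s requirements first and only then look at price. The sketch below assumes a hand-written catalog; the machine names, sizes, and prices are hypothetical and only cover CPU and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MachineType:
    name: str
    vcpus: int
    memory_gb: float
    hourly_price: float


# Hypothetical catalog for illustration; not real machine types or prices.
EXAMPLE_CATALOG = [
    MachineType("example-2cpu-8gb", 2, 8, 0.10),
    MachineType("example-4cpu-16gb", 4, 16, 0.20),
    MachineType("example-4cpu-32gb", 4, 32, 0.27),
    MachineType("example-8cpu-32gb", 8, 32, 0.40),
]


def cheapest_fit(catalog: List[MachineType], vcpus: int,
                 memory_gb: float) -> Optional[MachineType]:
    """Return the cheapest machine that meets both requirements, or None."""
    candidates = [m for m in catalog
                  if m.vcpus >= vcpus and m.memory_gb >= memory_gb]
    return min(candidates, key=lambda m: m.hourly_price, default=None)


if __name__ == "__main__":
    print(cheapest_fit(EXAMPLE_CATALOG, vcpus=3, memory_gb=12))
```

Filtering before sorting by price keeps the affordable but undersized machine out of the running entirely.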

Consider your use case as well. For example, if you’re looking to train a machine learning model, it’s smarter to choose a GPU-based virtual machine because training models on it is much faster. 

2. Choose the best VM type for the job

Google Cloud offers various VM types matching a wide range of use cases, with entirely different combinations of CPU, memory, storage, and networking capacity. And each type comes in one or more sizes, so you can scale your resources easily.

But providers roll out different computers for their VMs. The chips in those machines may have different performance characteristics. So, you may easily end up picking a type with a strong performance that you don’t actually need (and you won’t even know it!).

Understanding and calculating all of this is hard. Google Cloud has four machine families, each with multiple machine series and types. Choosing the right one is like combing through a haystack to find that one needle you need.

The best way to verify a machine’s capabilities is benchmarking: run the same workload on various machine types and compare their performance. We did exactly that; take a look here.
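Benchmark results become comparable once you divide price by measured throughput. The sketch below assumes you have already measured requests per second for each machine type; the type names and all numbers are hypothetical.

```python
from typing import Dict, Tuple


def cost_per_million_requests(hourly_price: float,
                              requests_per_second: float) -> float:
    """Cost of serving one million requests at the measured throughput."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour * 1_000_000


def best_value(results: Dict[str, Tuple[float, float]]) -> str:
    """results maps a machine type name to (hourly_price, measured_rps)."""
    return min(results,
               key=lambda name: cost_per_million_requests(*results[name]))


if __name__ == "__main__":
    # Hypothetical benchmark results for two machine types.
    measured = {"type-a": (0.20, 400.0), "type-b": (0.30, 900.0)}
    print(best_value(measured))
```

In this made-up example the more expensive machine wins, because it handles more than twice the traffic for 1.5 times the price.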

3. Check your storage transfer limitations

Data storage is a key GKE cost optimization aspect since each application comes with its unique storage needs. Verify that the machine you choose has the storage throughput your workloads need. 

Also, avoid expensive drive options such as premium SSD unless you’re planning to use them to the fullest.

Use spot virtual machines

4. Check if your workload is spot-ready

Spot VMs offer a great opportunity to save on your GKE bill: up to 91% off pay-as-you-go pricing! 

But before you move all your workloads to spot, you need to develop a spot strategy and check if your workload can run on spot.

Here are a few questions you need to ask when analyzing your workload:

  • How much time does it need to finish the job? 
  • Is it mission- and/or time-critical?
  • Can it handle interruptions gracefully? 
  • Is it tightly coupled between nodes? 
  • What solution are you going to use to move your workload when Google pulls the plug? 
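Handling interruptions gracefully usually means reacting to the termination signal and saving progress. Below is a minimal sketch, assuming the workload receives SIGTERM when its pod is being shut down (as Kubernetes typically does) and that work can be resumed from a simple index checkpoint. The `GracefulWorker` class and its methods are hypothetical names for illustration.

```python
import signal
from typing import Callable, Sequence


class GracefulWorker:
    """Processes items one by one and remembers how far it got."""

    def __init__(self) -> None:
        self.stop_requested = False
        self.checkpoint = 0  # index of the next item to process

    def request_stop(self, signum=None, frame=None) -> None:
        """Signal-handler compatible: ask the loop to stop after this item."""
        self.stop_requested = True

    def run(self, items: Sequence, handle: Callable, start: int = 0) -> int:
        """Process items from start; return the index to resume from."""
        self.checkpoint = start
        for index in range(start, len(items)):
            if self.stop_requested:
                break
            handle(items[index])
            self.checkpoint = index + 1  # persist this in real code
        return self.checkpoint


def install_sigterm_handler(worker: GracefulWorker) -> None:
    """Stop the worker cleanly when the process receives SIGTERM."""
    signal.signal(signal.SIGTERM, worker.request_stop)


if __name__ == "__main__":
    worker = GracefulWorker()
    install_sigterm_handler(worker)
    print(worker.run(["a", "b", "c"], handle=print))
```

In a real job the checkpoint would be written to durable storage, so a replacement pod on another node can pick up where the interrupted one stopped.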

5. Choose your spot VMs

When picking a spot VM, go for the slightly less popular ones. It’s simple – they’re less likely to get interrupted. 

Once you pick one, check its frequency of interruption. This is the rate at which the provider reclaimed capacity from this instance type during the trailing month. 

6. Use groups

To boost your chances of getting the spot machines you need, set up groups of spot instances (managed instance groups) so that you can request multiple machine types at the same time. Managed instance groups create or add new spot VMs when additional resources are available. 

Sounds great? But prepare for a massive configuration, setup, and maintenance effort if you choose to manage spot VMs manually. 

Take advantage of autoscaling

Tracking which projects or teams generate GKE costs is hard. So is knowing whether you’re getting any savings from your cluster. But there’s one tactic that helps: autoscaling. 

The tighter your Kubernetes scaling mechanisms are configured, the lower the waste and costs of running your application. Use Kubernetes autoscaling mechanisms to drive your GKE costs down.

7. Use Kubernetes autoscaling mechanisms (HPA, VPA, Cluster Autoscaler)

Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler (HPA) adds or removes pod replicas automatically when the demand on your workload changes. It’s a great solution for scaling both stateless and stateful applications. Use HPA with cluster autoscaling to reduce the number of active nodes when the number of pods shrinks.
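The scaling rule documented for HPA is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change while the ratio stays within a tolerance. Here is a minimal sketch of that rule. The replica bounds and the 0.1 tolerance are example parameters, and the real controller also handles details such as unready pods and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale replicas by the metric ratio."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas at 90% average CPU with a 60% target
    print(desired_replicas(4, current_metric=90, target_metric=60))
    # 4 replicas at 30% average CPU with a 60% target
    print(desired_replicas(4, current_metric=30, target_metric=60))
```

The second call is where the savings come from: when load halves, so does the replica count, and the cluster autoscaler can then remove the nodes that are no longer needed.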

Vertical Pod Autoscaler (VPA)

Vertical Pod Autoscaler (VPA) increases and decreases the CPU and memory resource requests of pod containers to better align allocated resources with actual usage. It’s a good practice to use both VPA and HPA if your HPA configuration doesn’t use CPU or memory to identify scaling targets. VPA works well for workloads that experience temporary high utilization.
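To illustrate the idea of aligning requests with usage, the sketch below derives a CPU request from observed usage samples by taking a high percentile and adding a safety margin. This is a simplification for illustration, not VPA’s actual recommender, and the percentile and margin values are assumptions.

```python
import math
from typing import Sequence


def recommend_request(samples_millicores: Sequence[int], percentile: int = 90,
                      safety_margin_percent: int = 15) -> float:
    """Suggest a CPU request (millicores) from observed usage samples.

    Uses the nearest-rank percentile, then adds a safety margin on top.
    """
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_millicores)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * (100 + safety_margin_percent) / 100


if __name__ == "__main__":
    # Hypothetical usage samples for a container that requests 2000m today.
    usage = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    print(recommend_request(usage))
```

If the container in this example currently requests 2000m, the gap between that and the suggested value is capacity you pay for but never use.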

Cluster Autoscaler

Cluster Autoscaler changes the number of nodes in a cluster. It’s an essential tool for reducing GKE costs by dynamically scaling the number of nodes to fit the current cluster utilization. This is especially valid for workloads designed to scale and meet such changing demand. 
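A rough way to reason about how many nodes a set of pods needs is a bin-packing estimate. The sketch below uses first-fit decreasing on CPU requests only. It is a simplification: the real Cluster Autoscaler simulates the scheduler and also considers memory, affinity rules, and system reservations. All numbers are hypothetical.

```python
from typing import List, Sequence


def nodes_needed(pod_cpu_millicores: Sequence[int],
                 node_cpu_millicores: int) -> int:
    """Estimate the node count with first-fit decreasing bin packing."""
    free: List[int] = []  # remaining CPU on each node opened so far
    for request in sorted(pod_cpu_millicores, reverse=True):
        if request > node_cpu_millicores:
            raise ValueError("pod does not fit on an empty node")
        for i, capacity in enumerate(free):
            if request <= capacity:
                free[i] -= request  # place the pod on an existing node
                break
        else:
            free.append(node_cpu_millicores - request)  # open a new node
    return len(free)


if __name__ == "__main__":
    pods = [2000, 1500, 1000, 500, 500, 500]
    print(nodes_needed(pods, node_cpu_millicores=4000))
```

Comparing this estimate with the number of nodes actually running gives a quick signal of how much room there is to scale down.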

Read this for a more detailed guide to these three autoscaling mechanisms: Guide to Kubernetes autoscaling for cloud cost optimization

8. Make sure that HPA and VPA policies don’t clash

The Vertical Pod Autoscaler automatically adjusts pod resource requests and limits based on observed usage, reducing overhead and achieving cost savings. The Horizontal Pod Autoscaler changes the number of pods based on a target such as average CPU utilization, so it aims to scale out more than up.

So, double-check that the VPA and HPA policies aren’t interfering with each other, for example by acting on the same CPU or memory metric. Review your bin-packing density settings when designing clusters for a business- or purpose-class tier of service.
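A clash check can be as simple as comparing which resources each autoscaler acts on for the same workload. The sketch below assumes you have already extracted the HPA metric names and the VPA-controlled resources from your manifests; the function name and input shape are hypothetical.

```python
from typing import Iterable, List


def conflicting_resources(hpa_metrics: Iterable[str],
                          vpa_controlled: Iterable[str]) -> List[str]:
    """Return resources that both HPA and VPA would act on."""
    return sorted(set(hpa_metrics) & set(vpa_controlled))


if __name__ == "__main__":
    # HPA scales on CPU while VPA also resizes CPU: a likely clash.
    print(conflicting_resources(["cpu"], ["cpu", "memory"]))
    # HPA scales on a custom metric, so VPA is free to manage CPU and memory.
    print(conflicting_resources(["requests_per_second"], ["cpu", "memory"]))
```

An empty result suggests the two autoscalers are working on different signals and are less likely to fight each other.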

9. Consider instance weighted scores

When autoscaling, use instance weighting to determine how much of your chosen resource pool you want to dedicate to a particular workload. This is how you ensure that the machines you create are best suited for the work at hand.
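One common reading of instance weighting is that each machine type counts for a number of capacity units, and the autoscaler fills a target expressed in those units. The sketch below assumes the weight equals the machine’s vCPU count. This interpretation, the function names, and the numbers are illustrative assumptions rather than a specific GKE feature.

```python
import math


def instances_for_target(target_units: int, weight_per_instance: int) -> int:
    """How many instances of one type are needed to reach the target."""
    if weight_per_instance <= 0:
        raise ValueError("weight_per_instance must be positive")
    return math.ceil(target_units / weight_per_instance)


def overshoot(target_units: int, weight_per_instance: int) -> int:
    """Capacity units provisioned beyond the target (a source of waste)."""
    count = instances_for_target(target_units, weight_per_instance)
    return count * weight_per_instance - target_units


if __name__ == "__main__":
    # Target of 16 vCPUs, filled with 4-vCPU or 6-vCPU machines.
    print(instances_for_target(16, 4), overshoot(16, 4))
    print(instances_for_target(16, 6), overshoot(16, 6))
```

The overshoot figure shows why weights matter: a type that doesn’t divide the target evenly leaves paid-for capacity idle.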

10. Reduce costs further with a mixed-instance strategy

A mixed-instance strategy can help you achieve great availability and performance at a reasonable cost. You choose from various instance types, some of which may be cheaper and just good enough for lower-throughput or low-latency workloads.

Mixing instances in this way can bring cost savings: each node runs Kubernetes system components, which add a little overhead, so using fewer, larger instances where they fit reduces the share of capacity spent on that overhead. 

But how do you scale mixed instances? In a mixed-instance situation, every instance uses a different type of resource. So, when you scale instances in autoscaling groups and use metrics like CPU and network utilization, you might get inconsistent metrics from different nodes. 

To avoid these inconsistencies, use the Cluster Autoscaler to create an autoscaler configuration based on custom metrics. Also, make sure all your nodes share the same capacity for CPU cores and memory.
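The inconsistency is easy to see with two nodes of different sizes. The sketch below contrasts a naive average of per-node utilization with a capacity-weighted figure; node sizes and usage values are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[float, float]  # (used_cores, capacity_cores)


def naive_average_utilization(nodes: List[Node]) -> float:
    """Average of per-node percentages; misleading for mixed node sizes."""
    return sum(used / capacity for used, capacity in nodes) / len(nodes)


def weighted_utilization(nodes: List[Node]) -> float:
    """Total used cores divided by total capacity across the cluster."""
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    return total_used / total_capacity


if __name__ == "__main__":
    # A busy 2-core node next to a mostly idle 16-core node.
    cluster = [(1.8, 2.0), (4.0, 16.0)]
    print(naive_average_utilization(cluster))
    print(weighted_utilization(cluster))
```

The naive average makes this cluster look more than half full, while the weighted figure shows that roughly two thirds of the capacity is idle, which is the number a scaling decision should rely on.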

Reduce your GKE costs using automation

Using these best practices is bound to make an impact on your next GKE bill. But manual cost management will only get you so far. It requires many hours of work, and a single miscalculation can compromise your availability or performance. 

That’s why many teams turn to automated GKE cost optimization solutions. They help save time and money by dramatically reducing the amount of work required to manage cloud costs, giving you an opportunity to enrich your Kubernetes applications with new features instead of micromanaging the cloud infrastructure.

Connect your cluster to the CAST AI platform and run a free cost analysis to see a detailed cost breakdown and recommendations – or use automation to do the job for you.

CAST AI clients save an average of 63% on their Kubernetes bills

Connect your cluster and see your costs in 5 min, no credit card required.
