
The 2022 Playbook of
Cloud Pricing and Cost Management

Discover 4 foolproof tactics to slash your SaaS operational costs

Optimizing Cloud Costs Makes the Biggest Difference

In the SaaS world, gross margins are the Holy Grail. Your ability to reach that dream 70-80% margin is what brings your valuation up to billions of dollars. 

In the past, your CFO would have to agree to every CAPEX item – for example, colocation space in data centers. Now, engineers can set up infrastructure on demand without checking with anyone.

The developer might add a little more than needed, just enough to sleep well at night. Then the SRE does the same, and lo and behold, you’re dealing with 40% cloud waste.

Many SaaS companies reserve capacity in advance to benefit from discounts cloud providers offer for this commitment. 

But what happens to a company like Snapchat, which committed to spend $1.1 billion on AWS services only to see its stock nosedive by 43% due to underwhelming usage stats (and revenue)?

Companies are no longer reserving capacity upfront

More and more companies are starting to realize that reserving capacity isn’t a feasible solution to mounting cloud bills. In last year’s Flexera State of the Cloud report, 52% of AWS users chose reserved instances and 44% savings plans. This year, these numbers are down to 36% and 31%, respectively.

Snapchat’s executive team probably expected the company to grow steadily, but the pandemic and the other events behind the current economic downturn proved otherwise.

There are so many other things companies can do to handle their cloud expenses – automated cost optimization being a chief alternative. 

This playbook gives you all the info you need to make the best decision for your business and optimize your operational costs to win the market.

Cloud Pricing: Everything You Need to Know

If you’re considering moving to the public cloud or optimizing the choice for your next project, picking between AWS, Azure, and Google Cloud Platform can be a daunting task. They all offer flexible compute, storage, and networking combined with everything engineers love about the cloud: self-service, instant provisioning, and autoscaling. 

But each provider differs in key areas that may have a massive impact on your cloud bill. Selecting one vendor over another comes down to knowing what your teams, applications, and workloads need. You need to fully understand your requirements before exploring the cloud landscape.

Cloud Landscape Today: What Are The Unique Strengths of AWS, Azure, and Google Cloud Platform?

AWS

Companies choose to build their applications on AWS because of its breadth and depth of services. The rich array of tools, including databases, analytics, management, IoT, security, and enterprise applications, makes AWS the right solution for many teams. No wonder AWS has the most significant slice of the cloud market.

Azure

According to the Flexera 2022 State of the Cloud Report, Azure has slightly surpassed AWS in the percentage of enterprises using it (80% Azure vs. 77% AWS). Azure also offers various services for enterprises, and Microsoft’s long-standing relationship with this segment makes it an easy choice for some customers. Azure, Office 365, and Microsoft Teams enable organizations to provide employees with enterprise software while also leveraging cloud computing resources. 

Google Cloud Platform

Azure and AWS have strong machine learning capabilities. But Google Cloud Platform stands out thanks to its almost limitless internal research and expertise – the magic that has been powering the search engine giant throughout the years. 

What makes GCP different is its role in developing various open source technologies. This is especially true for containers: Google played a central role in building Kubernetes for orchestration and the Istio service mesh, both practically industry-standard technologies today. 

Google’s culture of innovation lends itself really well to startups and companies that prioritize such approaches and technologies.

Billing in AWS vs. Azure vs. Google Cloud Platform

In addition to per-minute billing, AWS, Azure, and Google Cloud support per-second billing for various services. AWS first introduced per-second billing in 2017 for EC2 Linux-based instances and EBS volumes – but today, it applies to many other services. 

In AWS, per-second billing comes with a minimum charge of 60 seconds. Azure also supports per-second billing, but only for some instance types – mostly container-based ones. 

Google Cloud followed AWS in introducing per-second billing and now applies it to all VM-based instances, not just those running Linux.
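As a sketch of how the 60-second minimum plays out, the billed duration of a short-lived instance is the greater of its actual runtime and one minute. The hourly rate below is an illustrative placeholder, not a quoted price:

```python
# Sketch of per-second billing with a 60-second minimum charge,
# as AWS applies to eligible instances. The hourly rate is illustrative.
def billed_cost(runtime_seconds: float, hourly_rate: float) -> float:
    """Return the cost of a run under per-second billing with a 60s minimum."""
    billed_seconds = max(runtime_seconds, 60)  # minimum charge of one minute
    return billed_seconds * (hourly_rate / 3600)

# A 10-second job is billed as 60 seconds; a 90-second job as 90 seconds.
print(round(billed_cost(10, 0.36), 4))  # 0.006
print(round(billed_cost(90, 0.36), 4))  # 0.009
```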

Note that every service in your monthly bill uses its own billing metric. For example, Amazon Simple Storage Service charges for some operations by the number of requests and for storage by the GB. That’s why making sense of the cloud bill is such an overwhelming task, no matter which provider you use.
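To see why mixed billing metrics make bills hard to parse, here is a toy estimate in the style of object storage pricing, combining a per-GB storage charge with per-request charges. All rates are made-up placeholders, not actual AWS prices:

```python
# Toy monthly bill mixing billing metrics, as object storage services do:
# storage billed per GB-month, requests billed per 1,000 calls.
# All rates below are illustrative placeholders, not real provider prices.
def storage_bill(gb_stored, put_requests, get_requests,
                 per_gb=0.023, per_1k_put=0.005, per_1k_get=0.0004):
    storage = gb_stored * per_gb
    requests = (put_requests / 1000) * per_1k_put + (get_requests / 1000) * per_1k_get
    return round(storage + requests, 2)

# 500 GB stored, 200k writes, 5M reads in a month:
print(storage_bill(500, 200_000, 5_000_000))  # 14.5
```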

Buying Compute Resources: About Chips and Processors 

Providers roll out virtual machines with different hardware and performance characteristics. As a result, you might end up with an instance type that provides strong (and expensive!) performance your teams don’t actually need. 

Benchmarking is one way to see what you’re really paying for: you can run the same workload on each machine and check its performance characteristics. 

This approach might help you discover something interesting, just like we did. The chart below shows CPU operation in AWS (t2.2xlarge with eight virtual cores) at varying times after several idle periods. Would you expect such unpredictable CPU behavior within a single cloud provider?

Source: CAST AI

The 2022 Cloud Report from Cockroach Labs used this method to evaluate AWS, Azure, and Google Cloud machines. One of their conclusions was that Google performs better than AWS and Azure: GCP instances occupied 6 of the top 10 spots for price-for-performance.

Guide to Cloud Pricing Models

All three major cloud providers – AWS, Google Cloud Platform, and Microsoft Azure – offer the following pricing models that have their pros and cons.

  • On-demand instances – no need to worry about long-term binding contracts or upfront payments. But this flexibility comes with a high price tag. 
  • Reserved instances – buy capacity upfront in a specific availability zone with a large discount, committing to a specific instance or family that can’t be changed later.
  • Savings plans – commit to using a given amount of compute power per hour (not specific instance types and configurations). Anything extra is billed on the on-demand rate.
  • Spot instances/Preemptible instances – bidding on spare compute is a smart move, but the provider can reclaim your instance at any time (AWS gives you just a 2-minute warning), so you need a strategy in place for this. 
  • Dedicated host – a physical server that is fully dedicated to you, a good match for applications that have to achieve compliance by not sharing hardware with other tenants. 
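A rough way to compare these models is the effective monthly cost at a given utilization level. The discount levels and hourly rate below are illustrative assumptions, but they show why a reservation can lose to on-demand when utilization is low:

```python
# Sketch: effective monthly cost of the main pricing models at a given
# utilization. Discount levels and the hourly rate are illustrative only.
HOURS_PER_MONTH = 730

def monthly_cost(on_demand_rate, utilization, model):
    """utilization: fraction of the month the capacity is actually used."""
    used_hours = HOURS_PER_MONTH * utilization
    if model == "on_demand":
        return on_demand_rate * used_hours            # pay only for what runs
    if model == "reserved":                           # ~40% off, use it or lose it
        return on_demand_rate * 0.6 * HOURS_PER_MONTH
    if model == "spot":                               # up to ~90% off, interruptible
        return on_demand_rate * 0.1 * used_hours
    raise ValueError(model)

rate = 0.20  # illustrative on-demand $/hour
for util in (0.3, 0.9):
    print(util, {m: round(monthly_cost(rate, util, m), 2)
                 for m in ("on_demand", "reserved", "spot")})
```

At 30% utilization the reservation costs more than plain on-demand, which is exactly the “use it or lose it” trap discussed later in this playbook.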

Cloud Storage Pricing Comparison: AWS vs. Azure vs. Google Cloud Platform

How do these major cloud providers differ in terms of storage pricing? 

Here’s a comparison of prices in similar regions: AWS US East (Northern Virginia), Azure East US, and Northern Virginia (us-east4) in Google Cloud Platform.

It’s clear that these three cloud giants compete closely with one another and have set similar price ranges for storage services, with Azure standing out as the most cost-effective alternative. However, be sure to check out other cost dimensions such as data transfer or operations charges before picking the storage service.
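To make the point about other cost dimensions concrete, here is a toy comparison with placeholder rates for two hypothetical providers: the one with the cheapest per-GB storage is not the cheapest overall once operations and egress are included:

```python
# Illustrative only: total monthly storage cost once operations and egress
# are included. The rates below are placeholders, not quoted provider prices.
providers = {
    "A": {"per_gb": 0.021, "per_10k_ops": 0.05, "egress_per_gb": 0.09},
    "B": {"per_gb": 0.018, "per_10k_ops": 0.10, "egress_per_gb": 0.12},
}

def total_cost(p, gb=1000, ops=2_000_000, egress_gb=400):
    return round(gb * p["per_gb"]
                 + (ops / 10_000) * p["per_10k_ops"]
                 + egress_gb * p["egress_per_gb"], 2)

costs = {name: total_cost(p) for name, p in providers.items()}
print(costs)  # B is cheaper per GB stored, but A wins overall here
```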

Also, pay attention to the provider’s approach to pricing changes.

Google Cloud Platform recently introduced significant price increases across various core storage services. Given current challenges like high inflation around the world and supply chain issues, price hikes may well spread to other services and cloud providers. 

Compute Pricing Comparison: AWS vs. Azure vs. Google Cloud Platform

Compute is often the biggest line item on a cloud bill, but it also presents the greatest opportunity for cost optimization. We prepared this case study to show the incredible impact optimizing compute costs can have on your bottom line.

Comparing cloud pricing – our example setup

To understand the pricing differences better, we’re going to compare virtual machines within similar regions and with the same operating system.

The services analyzed are:

  • AWS – Amazon EC2.
  • Azure – Virtual Machines.
  • Google Cloud Platform – Compute Engine. 

Our example setup:

  • Region: AWS US East (Northern Virginia), Azure East US, and Northern Virginia (us-east4) in Google Cloud Platform.
  • Operating System: Linux. 
  • vCPUs: 4.

Types of instances/VMs we will analyze:

  • General purpose.
  • Compute optimized.

We picked instances with four vCPUs and similar RAM (the only exception is the compute optimized machine from Google Cloud Platform):

Cloud provider         Category            Instance type   vCPU   RAM (GB)
AWS                    General purpose     t4g.xlarge      4      16
AWS                    Compute optimized   c6a.xlarge      4      8
Azure                  General purpose     B4ms            4      16
Azure                  Compute optimized   F4s v2          4      8
Google Cloud Platform  General purpose     e2-standard-4   4      16
Google Cloud Platform  Compute optimized   c2-standard-4   4      16

AWS vs. Azure vs. Google Cloud Platform: Comparing On-demand Pricing 

Here’s the hourly On-Demand pricing of each of these virtual machines across AWS, Azure, and Google Cloud Platform.

Takeaways:

  • While Azure is the most expensive choice for general purpose instances, it’s the most cost-effective option for compute optimized instances.
  • Google Cloud Platform offers the highest price for compute optimized instances, but this machine has double the RAM of alternatives from AWS and Azure. 

Now that you have a firm grasp on cloud pricing, let’s take a look at battle-tested tactics to reduce cloud bills.

4 Cloud Cost Management and Optimization Tactics

Avoid reserving cloud resources upfront 

Reserving capacity for one or three years in advance at a much cheaper rate seems like a compelling option. Why not buy capacity in advance when you know that you’ll be using the service anyway?

Like anything else in the world of the cloud, forecasting your capacity demand is tricky. Back in 2017, Snapchat committed to spending $1.1 billion over 5 years with AWS and $2 billion with Google Cloud. Recently, the company admitted that committing to that much was a mistake…

Here are a few things you should know about reserving capacity:

  • A reserved instance works on a “use it or lose it” basis – every hour that it sits idle is an hour lost to your team (along with any financial benefits you might have secured).
  • When you commit to specific resources or levels of consumption, you assume that your needs won’t change throughout the contract’s duration. But even one year of commitment is an eternity in the cloud. And when your requirements go beyond what you reserved, you’ll have to pay the price – just like Pinterest did. 
  • When confronted with a new issue, your team may be forced to commit to even more resources. Or you’ll find yourself with underutilized capacity that you’ve already paid for. In both scenarios, you’re on the losing end of the game.
  • By entering into this type of contract with a cloud service provider, you risk vendor lock-in – i.e. becoming dependent on that provider (and whatever changes they introduce) for the next year or three. 
  • Selecting optimal resources for reservation is complex (just look back at the compute pricing comparison above).

Pay attention to egress costs

Moving data between cloud vendors or even regions/availability zones can be very costly, so it requires some planning. Because of high egress costs, many companies would rather risk going down in a single availability zone than use several availability zones for resilience.

You don’t want to end up like NASA, which uploaded 247 petabytes of data into AWS but forgot about the mounting egress costs. Vendors typically charge around $0.09 per GB to extract data, while importing data is free.
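Plugging the numbers from the NASA example into that typical rate shows how quickly egress adds up. This is a back-of-the-envelope estimate using decimal petabytes:

```python
# Back-of-the-envelope egress estimate at the typical ~$0.09/GB rate
# mentioned above, applied to NASA's 247 PB figure from the text.
GB_PER_PB = 1_000_000  # decimal petabytes for a rough estimate

def egress_cost(petabytes: float, per_gb: float = 0.09) -> float:
    return petabytes * GB_PER_PB * per_gb

# Retrieving all 247 PB once would cost on the order of tens of millions:
print(f"${egress_cost(247):,.0f}")
```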

Keep an eye on egress costs and use a solution that allows you to take data out of the cloud easily and cost-efficiently.

Take advantage of spot instances

It’s smart to buy idle capacity from AWS and other large cloud providers because spot instances are up to 90% cheaper than on-demand ones. However, there is a catch: the vendor reserves the right to reclaim these resources at any moment. You need to make sure that your application is prepared for that before jumping on the spot bandwagon.

Here’s how to use spot instances:

1. Examine your workload to see if it’s ready for a spot instance

Can it withstand interruptions? How long will it take to complete the job? Is this a mission-critical workload? These and other questions help qualify a workload for spot instances.

2. Examine the services of your cloud provider

It’s a good idea to look at less popular instances because they’re less likely to be interrupted and can operate for longer periods of time. Check the frequency of interruption of an instance before settling on it.

3. Now it’s time to bid

Set the highest amount you’re prepared to pay for your chosen spot instance. Note that it will only run as long as the market price meets your offer (or is lower). Setting the maximum price at the level of on-demand pricing is the rule of thumb here.

4. Manage spot instances in groups

That way, you’ll be able to request numerous instance types at once, increasing your chances of landing a spot instance.
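Steps 3 and 4 can be sketched as a simple selection over a group of candidate instance types: bid the on-demand price for each, then take the cheapest spot market that clears. The instance prices below are placeholders, not live quotes:

```python
# Sketch of steps 3-4: request a group of instance types with the max price
# set at the on-demand rate, and pick whichever spot market currently
# satisfies the bid. All prices are illustrative placeholders.
candidates = [
    {"type": "m5.xlarge",  "on_demand": 0.192, "spot_now": 0.058},
    {"type": "m5a.xlarge", "on_demand": 0.172, "spot_now": 0.210},  # price spiked
    {"type": "m4.xlarge",  "on_demand": 0.200, "spot_now": 0.062},
]

def pick_spot(group):
    """Rule of thumb: bid the on-demand price; take the cheapest market that clears."""
    runnable = [c for c in group if c["spot_now"] <= c["on_demand"]]
    return min(runnable, key=lambda c: c["spot_now"]) if runnable else None

print(pick_spot(candidates)["type"])  # m5.xlarge
```

Requesting several types at once means a spike in one market (like m5a.xlarge above) doesn’t leave you without capacity.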

To make all of the above work well, you can automate it with a platform like CAST AI or spend a lot of time on manual configuration, setup, and maintenance tasks.

Automate as much as you can 

To gain control over their cloud expenses, companies apply various cost management and optimization solutions in tandem:

  1. Cost visibility and allocation – Using a variety of cost allocation, monitoring, and reporting tools, you can figure out where the expenses are coming from. Real-time cost monitoring is especially useful here since it instantly alerts you when you’re going over a set threshold. A computing operation left running on Azure resulted in an unanticipated cloud charge of over $500k for one of Adobe’s teams. One alert could have prevented this.
  2. Cost budgeting and forecasting – You can estimate how many resources your teams will need and plan your budget if you’ve crunched enough historical data and have a fair idea of your future requirements. Sounds simple? It’s anything but – Snapchat’s story shows that really well.
  3. Legacy cost optimization solutions – This is where you combine all of the information from the first two points to create a complete picture of your cloud spend and discover potential candidates for improvement. Many solutions on the market can assist with that, like Cloudability or VMware’s CloudHealth. But most of the time, all they give you are static recommendations for engineers to implement manually.
  4. Automated, cloud native cost optimization – This is the most powerful solution for reducing cloud costs you can use. This type of optimization doesn’t require any extra work from teams and results in round-the-clock savings of 50% and more, even if you’ve been doing a great job optimizing manually. A fully autonomous and automated solution that can react quickly to changes in resource demand or pricing is the best approach here. 

Should we continue to rely on software engineers to do all the management and optimization tasks manually? Not with so many automation options at hand!

Cloud automation opens the doors to the greatest savings. As the points above show, manual cost optimization is a complex and time-consuming process, and many of its tasks are simply a waste of skilled and creative engineers’ time. 

Try allocating, analyzing, and forecasting cloud expenses and you’ll see how hard it is. Then you still need to make infrastructure adjustments, investigate pricing plans, spin up more instances, and handle a variety of other tasks to create a cost-effective infrastructure.

Automation takes many of these tasks off your plate.

Apart from getting rid of all the tasks above, an automated solution adds more value because it:

  • Selects the most cost-effective instance types and sizes to meet your application’s needs.
  • Automatically scales your cloud resources up and down to cope with demand spikes and drops.
  • Removes resources that aren’t in use to eliminate waste.
  • Makes use of spot instances and gracefully manages disruptions.
  • Automates storage and backups, security and compliance management, and changes to configurations and settings to help you save money in other areas.

Most importantly, an automated platform implements all of these modifications in real time, mastering the point-in-time nature of cloud cost optimization.

How a K8s Automation Platform Like CAST AI can Help

1. Auto-provisioning cloud resources

Choosing the right VM for the job is tough because you’re facing many different choices with unique parameters. How are you supposed to know which ones have the optimal cost vs. performance ratio?

You can delegate the tasks of rightsizing and autoscaling to an instance selection algorithm. It’s capable of choosing the best instance types that meet the requirements of your application whenever your cluster needs extra nodes. Your workloads will run at maximum performance and minimum cost.
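A minimal sketch of such an instance selection step, with a hypothetical catalog and made-up prices (this is not the actual CAST AI algorithm): among the types that satisfy the workload’s CPU and RAM requirements, pick the cheapest.

```python
# Minimal sketch of an instance selection step: among types that satisfy the
# pods' CPU/RAM requirements, pick the cheapest. Catalog and prices are made up.
catalog = [
    {"type": "small",  "vcpu": 2,  "ram_gb": 8,  "hourly": 0.08},
    {"type": "medium", "vcpu": 8,  "ram_gb": 32, "hourly": 0.30},
    {"type": "large",  "vcpu": 16, "ram_gb": 64, "hourly": 0.68},
]

def select_node(need_vcpu, need_ram_gb):
    fits = [c for c in catalog
            if c["vcpu"] >= need_vcpu and c["ram_gb"] >= need_ram_gb]
    return min(fits, key=lambda c: c["hourly"]) if fits else None

print(select_node(6, 24)["type"])  # "medium" is the cheapest fit
```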

Here’s how it works if you run your app in a managed Kubernetes solution that takes care of automated provisioning:

  1. At 15:41, the application starts experiencing a surge of traffic. The tool creates new pods to handle it, but they have no place to run – the cluster needs more CPU cores.
  2. Within 2 minutes, the solution adds a new 16-core node automatically.
  3. At 15:45, some more traffic appears in the application. The tool adds an extra 8-core node within one minute so that the application can handle the traffic.
  4. Once the traffic is gone, the solution instantly retires 2 nodes to avoid resource waste.
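The timeline above boils down to matching pending CPU demand with appropriately sized nodes. Here is a toy model of that logic (not the actual CAST AI algorithm):

```python
# Toy model of the timeline above: add nodes sized to pending CPU demand,
# retire them when demand falls. Not the actual CAST AI algorithm.
NODE_SIZES = [16, 8, 4]  # available core counts

def scale(pending_cores, nodes):
    """Add nodes until pending demand is covered."""
    nodes = list(nodes)
    while pending_cores > 0:
        # pick the smallest node that covers the remainder, else the largest
        fit = min((s for s in NODE_SIZES if s >= pending_cores),
                  default=max(NODE_SIZES))
        nodes.append(fit)
        pending_cores -= fit
    return nodes

cluster = scale(14, [])        # 15:41 surge -> one 16-core node added
cluster = scale(7, cluster)    # 15:45 extra traffic -> one 8-core node added
print(cluster)                 # [16, 8]
```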

2. Mitigating cloud sprawl and avoiding waste

Cloud automation solutions can shut down unused instances and processes to reduce your cloud costs. And losing track of cloud-deployed instances is more common than you think.

Many teams face orphaned instances that have no ownership, or battle shadow IT projects with poorly accounted-for cloud resources. Such zombie infrastructure stays active and generates a monthly bill, but you get zero value from the assets.

Cloud automation can streamline the process of identifying zombie IT infrastructure and addressing the problem before it snowballs into a massive cloud bill at the end of the month.
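A sketch of how such identification might work: flag instances that have no owner or have sat idle past a threshold. The inventory records, dates, and threshold below are all hypothetical:

```python
# Sketch: flag likely zombie instances - running, but untagged with an owner
# or idle for weeks. The inventory records and threshold are hypothetical.
from datetime import date

inventory = [
    {"id": "i-01", "owner": "payments", "last_used": date(2022, 6, 1)},
    {"id": "i-02", "owner": None,       "last_used": date(2022, 3, 12)},
    {"id": "i-03", "owner": "ml-team",  "last_used": date(2022, 1, 9)},
]

def find_zombies(instances, today=date(2022, 6, 10), idle_days=30):
    return [i["id"] for i in instances
            if i["owner"] is None or (today - i["last_used"]).days > idle_days]

print(find_zombies(inventory))  # ['i-02', 'i-03']
```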

3. Enabling Infrastructure as Code (IaC)

Establishing Infrastructure as Code (IaC) is another use case for cloud automation. In IaC, IT infrastructure is defined in configuration files and launched automatically in line with that configuration. These configuration files undergo the same processes as source code – they need to be kept in version control, managed, tested, and developed. 

With cloud automation, you can extend the control over nearly all infrastructure aspects to the cloud platform, allowing orchestration of more complex systems:

  • Cloud automation processes can draw from resource pools and define common configuration items (like VMs, containers, or virtual private networks). 
  • Next, they can load these application components and services into the configuration items.
  • Finally, these components can be assembled to create an improved operational environment.
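The same idea in miniature: infrastructure declared as data, then diffed against the current state the way an IaC tool’s plan step does. The resource schema here is hypothetical:

```python
# Sketch of the IaC idea: infrastructure declared as data, validated, then
# "applied" by automation. The resource names and schema are hypothetical.
desired = {
    "vm/web-1": {"kind": "vm", "cores": 4, "ram_gb": 16},
    "vpc/main": {"kind": "vpc", "cidr": "10.0.0.0/16"},
}

def plan(desired_state, current_state):
    """Diff desired vs. current state, like an IaC tool's plan step."""
    to_create = sorted(set(desired_state) - set(current_state))
    to_delete = sorted(set(current_state) - set(desired_state))
    return {"create": to_create, "delete": to_delete}

# The plan shows what automation would change to reach the declared state:
print(plan(desired, {"vpc/main": {}, "vm/old-1": {}}))
```

Because the desired state lives in configuration files, it can be version-controlled, reviewed, and tested like any other source code, which is the point of the section above.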

These Companies are Using Automation to Slash Their Cloud Expenses by 50%+

How Branch Saved Millions of Dollars on its Cloud Bill While Maintaining Reliability

Partnering with CAST AI has been a big success for Branch, saving us several millions of dollars per year in AWS Cloud compute costs for our Kubernetes clusters while maintaining our reliability SLAs.

Mark Weiler
Senior VP of Engineering at Branch

Branch partnered with CAST AI to unlock the ability to safely utilize Spot instances within their Kubernetes compute clusters and transition away from upfront reservations via Savings Plans.

As a result, Branch eliminated the upfront spend of several million dollars per year on Savings Plans for its stateless EC2 compute workloads, while saving millions of dollars in cloud OpEx (over 25% of EC2 compute costs) by leveraging Spot instances safely via the CAST AI solution.

Cost efficiency, speed, and engineer wellbeing at the growing fintech Delio 

The results we got from implementing CAST AI were brilliant right from the start. It’s all about the fact that we don’t need to worry about scaling our clusters anymore. Peace of mind is definitely one of the biggest benefits of the platform. Especially if you go multi-cloud, as the differences between managing EKS and AKS clusters are significant. CAST AI takes that complexity away. 

Alex Le Peltier
Head of Technology Operations at Delio

The management overhead from running Kubernetes clusters across AWS and Azure motivated Delio to look for a solution that would automate tasks around resource scaling to ensure high performance and a great user experience. Reclaiming their engineers’ time was a top priority for Delio.

By implementing CAST AI, Delio enjoys the benefits of a fully automated process that balances cost and performance. The team of engineers overseeing the scalability process reclaimed plenty of time to invest in more impactful tasks than micromanaging Kubernetes clusters.

Get free cloud cost monitoring & more

You will get unlimited access to cloud cost monitoring, reports, and cost reduction insights for Kubernetes.

No CC required. For AWS, GCP, and Azure.
