Did you know that 90% of applications have 5x more resources than they actually need?* Overprovisioning or cloud cost savings may not be a top-of-mind issue for your deployments yet. But it’s likely to become a growing concern as your cloud bill grows.
And when the CFO and their financial team start asking you about how you allocate cloud expenses, better give them a good answer. You need to have a solid response to infrastructure utilization and cloud cost efficiency.
Compute resources are the biggest line items on your cloud bill. Selecting the right VM instance types is a key aspect of cloud resource optimization and in some cases can save you 50% of your overall compute bill.
If you’ve optimized your VM choices already, check our blog for more cost optimization tips. Perhaps your cloud bill needs some optimizing?
Table of contents:
- Define your minimum requirements
- Choose an instance type with cloud cost savings in mind
- Consider the pros and cons of different pricing models
- Take advantage of CPU bursting
- Check your storage transfer limitations
- Count in the network bandwidth
* Christina Delimitrou and Christos Kozyrakis, “Quasar: resource-efficient and QoS-aware cluster management”, https://doi.org/10.1145/2644865.2541941.
It’s easy to overprovision cloud resources
Let’s say that you need a machine with 4 cores for your cluster. You can choose from some 40 different options, even in a single cloud scenario where you work with only one CSP.
How can you compare all these instances? All these machine shapes are just too hard for human minds to understand and analyze.
For example, take a look at the following pricing table from AWS:
… and this is just a portion of the table, for one region, and one OS type.
And we’re talking only about one cloud at this point. Things get exponentially more complicated when you go multi-cloud.
So, let’s say that you pick an instance that costs $0.19 per hour. It seems reasonable on paper, so you modify your Infrastructure as Code (AoC) scripts and deploy.
But when you deploy a compute-heavy application, you soon see that it ends up underutilizing other resources you’re paying for. You’re using only half of the memory allocated per node, underutilizing the network, and not even utilizing the SSDs that may be attached to the instance. This doesn’t help to maximize cloud cost savings at all.
If you took a deeper look into the CSP offering, you might have found another VM shape that could support your application for a much lower cost. Sure, it gives you less memory and network, but that’s absolutely fine by your application. And it costs much less – only $0.10 an hour.
This scenario isn’t made up – these are the results of a cost / performance test that we carried out on a demo e-commerce application. You’ll find the case study further down this article, so bear with us.
How to choose the right VM for the job in 6 steps
1. Define your minimum requirements
Your workload matters a lot when you’re choosing a VM instance type with cloud cost savings. You should be making a deliberate effort to order only what you need across all compute dimensions including CPU (and type x86 vs ARM), Memory, SSD, and network connectivity.
On the flip side, while an affordable instance might look tempting, you might suffer performance issues when you start running memory-intensive applications.
Identify the minimum size requirements of your application and make sure that the instance type you select can meet them across all dimensions:
- CPU Count
- CPU Architecture
- SSD Storage
Once you have a targeted set of instance types that fit your application needs, buying a “good enough” instance type might not be the best possible choice. For example, different VMs across different clouds have varying price vs. performance ratios. This means that it’s possible to get better performance for the cloud compute dollar, and save more on cloud costs overall.
Next, you select between CPU and GPU dense instances. Consider this scenario:
If you’re building a machine learning application, you’re probably looking for GPU dense instance types. They train models much faster than CPUs. Interestingly, the GPU wasn’t initially designed for machine learning – it was designed to display graphics.
In 2007, Nvidia came up with CUDA to help developers with machine learning training for deep learning models. Today, CUDA is widely adopted across most popular machine learning frameworks. So, training your ML models will be much faster with GPUs. To find out exactly why that is the case, check out this interesting post that dives into the details.
What about running predictions through your trained models? Can we achieve better price/performance with specialized instance types? CSPs are now introducing new instance types designed for inference – for example, AWS EC2 Inf. According to AWS, EC2 Inf1 instances deliver up to 30% higher throughput and up to 45% lower cost per inference than Amazon EC2 G4 instances.
Side note: We haven’t tested this claim at CAST AI yet, as this instance type is fairly new.
Unfortunately, the selection matrix gets more complicated.
2. Choose an instance type with cloud cost savings in mind
CSPs provide a wide range of instance types optimized to match different use cases. They offer various combinations of CPU, memory, storage, and networking capacity. Each type includes one or more instance sizes, so you can scale your resources to fit your workload’s requirements.
CSPs roll out different computers and the chips in those computers come with different performance characteristics.
You might be getting an older-generation processor that’s slightly slower – or a new-generation processor that’s slightly faster. You might choose an instance type that has strong performance characteristics that you don’t actually need, without even knowing it.
Reasoning about this on your own is hard. The only way to verify this is through benchmarking – dropping the same workload on each machine type and checking its performance characteristics.
This is one of the first things that we did at CAST AI when we started over a year ago. Here are two examples.
Example 1: Unpredictable CPUs within one provider
This chart shows the CPU operations in AWS (Amazon t2-2x large: 8 virtual cores) at different times after several idle CPU periods.
Example 2: Cloud endurance
To understand VM performance better, we created a metric “Endurance Coefficient” and here’s how we calculate it:
- We measure how much work VM type can do in 12 hours and how variable CPU performance is.
- For a sustained base load, you’d want as much stability. For a bursty workload (website that experiences traffic once in a while or an occasional batch job), lower stability is fine.
- You should make an informed decision since sometimes it’s not clear how much stability you’re getting for your money with shared-core, hyperthreaded, overcommitted, between generations or burstable credit-based VM types.
- In our calculation, instances with stable performance edge close to 100, and ones with random performance are closer to 0 value.
In this example, the DigitalOcean s1_1 machine achieved the endurance coefficient of 0.97107 (97%), while AWS t3_medium_st got only a weirdly shaped 0.43152 (43%) – not that it’s a burstable instance.
Let’s get back to choosing the instance type
AWS, Azure, and Google all have these four instance types:
- General purpose – This instance type has a balanced ratio of CPU to memory, a good fit for general-purpose applications that use these resources in equal proportions such as web servers with low to medium traffic and small to medium databases.
- Compute optimized – Optimized for CPU-intensive workloads, this VM type comes with a high ratio of CPU to memory. It’s a good fit for web servers experiencing medium traffic, batch preprocessing, network appliances, and application servers.
- Memory optimized – This type has a high memory-to-CPU ratio. That’s why it’s a good fit for production workloads like database servers, relational database services, analytics, and larger in-memory caches.
- Storage optimized – Workloads that require heavy read/write operations and low latency are a good fit for this instance type. Thanks to their high disk throughput and IO, such instances work well with Big Data, SQL and NoSQL databases, data warehousing, and large transactional databases.
Each provider also offers instance types for GPU workloads under different names:
- Accelerated computing – These instances use hardware accelerators (co-processors) to carry out functions like data pattern matching, graphics processing, and floating point number calculations more efficiently than with CPUs. Great for machine learning (ML) and high-performance computing (HPC).
- Inference type – for example, AWS EC2 Inf that promises up to 30% higher throughput and 45% lower cost per inference than AWS EC2 G4 instances.
- GPU – Machines recommended for most deep learning purposes.
- GPU – Powered by Nvidia GPUs, these instances were designed to handle heavy graphic rendering, video editing, compute-intensive workloads, and model training and inferencing (ND) in deep learning.
- High-performance compute – This type offers the fastest and most powerful CPU VMs with underlying hardware optimized for compute- and network-intensive workloads (which includes high-performance computing cluster applications). A great combination with high-throughput network interfaces (RDMA).
- Accelerator-optimized – Based on the NVIDIA Ampere A100 Tensor Core GPU, these VMs get you up to 16 GPUs in a single VM. Great for demanding workloads like HPC and CUDA-enabled machine learning (ML) training and inference.
Check these links for more info:
A note about ARM-powered VMs
Recently, there has been a lot of attention around Apple’s M1 series, a new ARM-based System-on-Chip (SOC) the company announced in November 2020. According to Apple, the M1 will increase CPU performance by 3.5x and GPU performance by 6x. It’s incredibly fast as it’s based on ARM architecture. Cloud providers already offer ARM-powered VMs like the AWS EC2 A1 family powered by their Graviton2 ARM processor.
Here’s what you need to know about ARM: it’s less power-hungry, so cheaper to run and cool – CSPs charge less for this type of processor. But if you’re toying with the idea of using it, you might need to re-architect your delivery pipeline to compile your application for ARM. If you’re running an interpreted stack like Python, Ruby or NodeJS, your apps will likely just run.
3. Consider the pros and cons of different pricing models
Your next step is picking the right pricing model for your needs. Cloud providers offer the following models:
- On-demand — In this model, you pay for the resources that you use (for example, AWS charges compute capacity by the hour). There are no long-term binding contracts or upfront payments. You can increase or decrease your usage just-in-time. This flexibility makes on-demand instances great for workloads with fluctuating traffic spikes.
- Reserved Instances (Committed use discounts in GCP) — Reserved Instances allow purchasing capacity upfront in a given availability zone for a much lower price than on-demand instances. In most cases, you commit to a specific instance or family, and can’t change it if your requirements change later. Generally, the larger the upfront payment, the larger the discount. Commitments range from 1 to 3 years.
- Savings Plans (AWS) — In this model, the same discounts as Reserved instances for committing to use a specific amount of compute power (measured in dollars per hour) over the period of one or three years. After committing to a specific amount of usage per hour, all usage up to it is covered by the Savings Plan, and anything extra is billed at the On-demand rate. Unlike Reserved Instances, you don’t commit to specific instance types and configurations (like OS or tenancy), but to consistent usage.
But didn’t you go to the public cloud to avoid CAPEX in the first place? By choosing reserved instances or the AWS savings plan, you’re running the risk of locking yourself in with the cloud vendor and securing resources that might not make sense for you in a year or two. In cloud terms, 3 years is an eternity.
This is an evil practice on the part of CSPs. They lock customers in for a discount and cut them off from alternatives for years to come.
- Spot instances (Preemptible in GCP) — In this pricing model, you bid on spare computing resources. This can bring cloud cost savings up to 90% off the on-demand pricing. The problem with spot instances is that they don’t guarantee availability, which varies depending on current market demand in a given region. Spot instances might make sense if your workload can handle interruptions (for example, if you’ve got stateless application components such as microservices).
- Dedicated host (sole tenant nodes in GCP) — A Dedicated host is a physical server that has an instance capacity fully dedicated to you. It allows using your own licenses to slash costs but still benefit from the resiliency and flexibility of a cloud vendor. It’s also a good fit for companies that need to meet compliance requirements and not to share the same hardware with other tenants. Regulated markets that deal with lots of sensitive data often choose dedicated hosts for the perceived security benefits.
Note: Don’t forget about the extra charges. AWS, Azure, and GCP all charge for things like egress traffic, load balancing, block storage, IP addresses, and premium support among other line items. Take them into account when comparing instance pricing and building your cloud budget.
And each item deserves your attention.
Take egress traffic as an example. In a mono-cloud scenario, you’ll have to pay egress costs between different availability zones, which is in most cases $0.01/GB. In a multi-cloud setup, you pay a slightly higher rate, like $0.02 when using direct fiber (for the US/EU).
4. Take advantage of CPU bursting
Take a closer look at each CSP, and you’re bound to see “burstable performance instances.”
These are instances designed to offer teams a baseline level of CPU performance, with the option to burst to a higher level when your workload needs it. They’re a good match for low-latency interactive applications, microservices, small and medium databases, and product prototypes, among others.
Note: The amount of accumulated CPU credits depends on the instance type – larger instances collect more credits per hour. However, there’s also a cutoff to the number of credits you can collect, and larger instance sizes come with a higher cutoff.
Where can you get burstable performance instances?
- Instance families: T2, T3, T3a, and T4g.
- Restarting an instance in T2 family = losing all the accrued credits.
- Restarting an instance in T3 and T4 = credits persevere for seven days and then are lost.
- Learn more here.
- B series VMs of CPU bursting.
- When you redeploy a VM and it moves to another node, you lose the credits.
- If you stop/start a VM while keeping it on the same node, it retains the accumulated credits.
- Learn more here.
- Shared-core VMs offer bursting capabilities: e2-micro, e2-small, e2-medium.
- CPU bursts are charged by the on-demand price for f1-micro, g1-small, and e2 shared-core machine types.
- Learn more here.
Our research into AWS showed that if you load your instance for 4 hours or more per day, on average, you’re better off with a non-burstable one. But if you run an online store that gets a stream of visitors once in a while, it’s a good fit.
Note: CPU capacity has its limits
We discovered that compute capacity tends to increase linearly during the first four hours. But after that, it becomes much more limited. The amount of available compute reduces almost by 90% until the end of the day.
5. Check your storage transfer limitations
Here’s another thing to consider when maximizing your cloud cost savings: data storage.
- AWS EC2 instances use Elastic Block Store (EBS) to store disk volumes,
- Azure VMs use data disks,
- And GCP has the Google Persistent Disk for block storage.
- You get local ephemeral storage in AWS, Azure, and GCP too.
Every application has unique storage needs. When choosing a VM, make sure that it comes with a storage throughput your application requires.
Also, don’t opt for pricy drive options like premium SSD – unless you expect to employ them to the fullest.
6. Count in the network bandwidth
If you have a massive migration of data or a high volume of traffic, pay attention to the size of the network connection between your instance and the consumers assigned to it. You can find some instances that can bolster 10 or 20 Gbps of transfer speed.
But here’s the caveat: only those instances will support this level of network bandwidth.
Saving on cloud costs – case study
We recently tested the CAST AI approach to cloud cost savings on an open-source e-commerce demo app adapted from Google.
Here’s what we did:
- To get some meaningful costs, we load-tested our application with a high number of concurrent users (~1k).
- We then scaled the pods for each microservice accordingly. This was done on AWS EKS with a statically-scaled deployment.
- We ran the test for a 30-60 minute period to capture the metrics.
- Next, we extrapolated costs over a 30-day period, taking into account assumed traffic seasonality. We generated a likely 30-day usage pattern as part of our load testing scripts.
- The resulting usage experienced spikes every day at around the same time and during several days of the week traffic was somewhat heavier.
This is how we calculated the monthly costs of running our app on the AWS test cluster and compared our solution to see how it helps to increase cloud cost savings.
Note that our demo scenario assumes a fixed number of servers in the original EKS cluster and an instance type of m5.xlarge. The original cluster doesn’t use auto scaling or spot instances.
Originally, we selected the m5.xlarge VM that comes with 4 CPUs, 16 GB of memory and a high throughput Network Interface (~10 GiB). The cost of this instance type on-demand is $0.192/hour.
Then we launched CAST AI and let the magic happen.
Through automated analysis, our optimizer selected an alternative shape: a1.xlarge. This instance type has 4 CPUs and only 8 GB of RAM. The pods deployed per VM could easily fit into 8 GB. The additional 8 GB on our former instance (m5.xlarge) was a pure waste of RAM. We were able to select the a1 (ARM) processor by re-compiling our apps and building new container images.
Our new instance had a reduced network throughput. This wasn’t an issue for our app because we were not network bound. That is, the traffic generated with the available compute resources didn’t max out the bandwidth capacity of the selected instance.
And now comes the best part:
The a1.xlarge instance costs only $0.102/hour. This means that we instantly achieved cloud cost savings of 46% per compute-hour.
Compute is what you go to cloud providers for. It’s the biggest part of your cloud bill. So if you manage to optimize costs there, you’re on the right track to dramatically reducing your cloud bill.
To help you make the right choice every time, we prepared a checklist with all the steps for choosing the right VM and checking whether you could save up on your setup. You can download it here: