An automated solution:
- Selects the most cost-efficient instance types and sizes to match the requirements of your applications,
- Autoscales your cloud resources to handle spikes in demand,
- Removes resources that aren’t being used,
- Takes advantage of spot instances and handles potential interruptions gracefully,
- Does much more to help you avoid expenses in other areas – it automates storage and backups, security and compliance management, and changes to configurations and settings,
- And most importantly – it applies all of these changes in real time, keeping up with the point-in-time nature of cloud optimization.
Not only does optimization help you achieve all of these things – automation makes the process hands-off, without adding repetitive tasks for engineers. Some things just aren’t supposed to be managed manually.
Let’s look at some of these cost optimization areas to see why automation brings so much value to each.
1. It makes cloud billing 10x easier
Start with the cloud bill and you’re bound to get lost
Take a look at a bill from a cloud vendor and we guarantee that it will be long, complicated, and hard to understand. Each service has its own billing metric, so understanding your usage well enough to make confident predictions about it quickly becomes overwhelming.
Now try billing for multiple teams or clouds
If several teams or departments contribute to one bill, you need to know who is using which resources in order to hold them accountable for those costs. And cost allocation is no small feat, especially for dynamic Kubernetes infrastructures. Now imagine doing it all manually across more than one cloud!
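To make the idea concrete, here is a minimal sketch of tag-based cost allocation. The line items, prices, and the `team` tag are all invented for illustration – real billing exports carry far more columns and far messier data:

```python
from collections import defaultdict

# Hypothetical billing line items: (service, cost in USD, resource tags).
line_items = [
    ("ec2", 120.50, {"team": "checkout"}),
    ("s3",   14.20, {"team": "checkout"}),
    ("ec2", 310.00, {"team": "ml-platform"}),
    ("eks",  95.75, {}),  # untagged spend ends up unallocated
]

def allocate_costs(items):
    """Sum costs per 'team' tag; untagged resources fall into 'unallocated'."""
    totals = defaultdict(float)
    for _service, cost, tags in items:
        totals[tags.get("team", "unallocated")] += cost
    return dict(totals)

print(allocate_costs(line_items))
```

In practice you would feed this from your provider’s billing export, and a growing “unallocated” bucket is itself a signal that tagging discipline needs work.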
2. Forecasting is no longer based on guesswork
To estimate your future resource demands, you need to do a few things:
- Start by gaining visibility – analyze your usage reports to learn about patterns in spend,
- Identify peak resource usage scenarios – you can do that by running periodic analytics and reports over your usage data,
- Consider other sources of data like seasonal customer demand patterns. Do they correlate with your peak resource usage? If so, you might have a chance of identifying them in advance,
- Monitor resource usage reports regularly and set up alerts,
- Measure application- or workload-specific costs to develop an application-level cost plan,
- Calculate the total cost of ownership of your cloud infrastructure,
- Analyze the pricing models of cloud providers and accurately plan capacity requirements over time,
- Aggregate all of this data in one location to understand your costs better.
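Several of the steps above can be automated with very little code. Here is a minimal forecasting sketch – the monthly spend figures are invented, and a real forecaster would account for seasonality and demand signals rather than a simple trailing average plus trend:

```python
# Invented monthly spend history in USD; a real pipeline would pull this
# from the provider's billing export.
monthly_spend = [10_200, 10_800, 11_500, 12_100, 12_800, 13_400]

def forecast_next(history, window=3):
    """Forecast next month as the trailing-window average plus the
    average month-over-month change (a crude linear trend)."""
    base = sum(history[-window:]) / window
    deltas = [b - a for a, b in zip(history, history[1:])]
    trend = sum(deltas) / len(deltas)
    return base + trend

print(f"Next month's forecast: ${forecast_next(monthly_spend):,.0f}")
```

Even this toy version only pays off if it runs automatically every month – which is exactly the point: these aren’t one-off calculations.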
Many of the tasks we listed above aren’t one-off jobs, but activities you need to engage in on a regular basis. Imagine how much time they take when carried out manually.
3. You avoid falling into the reservation trap
Reserving capacity one or three years in advance at a much cheaper rate seems like an attractive option. Why not buy capacity ahead of time when you know that you’ll be using the service anyway?
But like anything else in the world of the cloud, this only seems easy.
You already know that forecasting cloud costs is hard. Even companies that have entire teams dedicated to cloud cost optimization miss the mark here.
How are you supposed to plan ahead for capacity when you have no clue how much your teams will require in one or three years? This is the main issue with products like reserved instances and savings plans.
Here are a few things you should know about reserving capacity:
- A reserved instance works on a “use it or lose it” basis – every hour that it sits idle is an hour lost to your team (along with any financial benefit you might have secured).
- When you commit to specific resources or levels of consumption, you assume that your needs won’t change throughout the contract’s duration. But even one year of commitment is an eternity in the cloud. And when your requirements go beyond what you reserved, you’ll have to pay the price – just like Pinterest did.
- When confronted with a new issue, your team may be forced to commit to even more resources. Or you’ll find yourself with underutilized capacity that you’ve already paid for. In both scenarios, you’re on the losing end of the game.
- By entering into this type of contract with a cloud service provider, you risk vendor lock-in – i.e. becoming dependent on that provider (and whatever changes they introduce) for the next year or three.
- Selecting optimal resources for reservation is complex (just look at the forecasting challenges in point 2 above).
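The “use it or lose it” economics are easy to quantify. A sketch with illustrative prices (not real rate-card numbers) of the utilization level above which a reservation beats on-demand:

```python
def breakeven_utilization(on_demand_hourly, reserved_hourly):
    """Fraction of hours you must actually use the instance for the
    reservation (billed every hour, used or not) to cost less than
    on-demand (billed only for hours used)."""
    return reserved_hourly / on_demand_hourly

# Illustrative prices, not real rate-card numbers.
on_demand = 0.40  # $/hour, pay only when running
reserved = 0.25   # $/hour effective, paid whether used or not

u = breakeven_utilization(on_demand, reserved)
print(f"Reservation pays off only above {u:.1%} utilization")
```

In this example the break-even point is 62.5% utilization: below that, the idle reserved hours cost you more than paying on-demand rates would have.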
4. You can rightsize virtual machines in real time
Selecting the right virtual machine size can drive your bill down by a lot if compute is your biggest expense.
But how can you expect a human engineer to do that when AWS alone offers some 400 different EC2 instance types, each in multiple sizes?
Similar instance types deliver different performance levels depending on which provider you pick. Even in the same cloud, a more expensive instance doesn’t always come with higher performance.
Here’s what you usually need to do when picking an instance manually:
- Establish your minimal requirements
Make sure you do it for all compute dimensions, including CPU (architecture, count, and processor choice), memory, SSD, and network connection.
- Choose the right instance type
You may select from a variety of CPU, memory, storage, and networking configurations that are bundled in instance types that are optimized for a certain capability.
- Define your instance’s size
Remember that the instance should have adequate capacity to handle your workload’s requirements and, if necessary, incorporate features such as bursting.
- Examine various pricing models
On-demand (pay-as-you-go), reserved capacity, spot instances, and dedicated hosts are all available from the three major cloud providers. Each of these alternatives has its own set of pros and cons.
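The selection steps above boil down to a filter-and-sort over an instance catalog. A sketch – every name and price below is an invented placeholder, not a real offer:

```python
# Hypothetical instance catalog: (name, vCPUs, memory GiB, hourly USD).
catalog = [
    ("gp.large",         2,  8, 0.096),
    ("gp.xlarge",        4, 16, 0.192),
    ("gp.2xlarge",       8, 32, 0.384),
    ("compute.2xlarge",  8, 16, 0.340),
    ("ml.inf",           8, 16, 0.297),  # specialized type a human might skip
]

def cheapest_fit(catalog, min_vcpus, min_mem_gib):
    """Keep every type that meets the minimum requirements, then take the cheapest."""
    fits = [i for i in catalog if i[1] >= min_vcpus and i[2] >= min_mem_gib]
    return min(fits, key=lambda i: i[3]) if fits else None

print(cheapest_fit(catalog, min_vcpus=8, min_mem_gib=16))
```

Notice that the cheapest fit here is the specialized type – exactly the kind of option an engineer skimming a preset list would skip.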
Considering that you need to do that on a regular basis, that’s a lot of work!
When you automate the above, the way it works might surprise you in a good way.
We were running our application on a mix of AWS on-demand and spot instances. We used CAST AI to analyze our setup and look for the most cost-effective spot instance alternatives. We needed a machine with 8 CPUs and 16 GB of memory.
The platform ran our workload on an Inf1 instance, which carries a powerful ML-specialized accelerator chip. It’s supercomputer-grade hardware that is usually quite expensive.
That must have driven the cost up! But it all became clear after we checked the current pricing. It turned out that at that time, INF1 just happened to be cheaper than the usual general-purpose compute we used.
9 days out of 10, it’s not an instance family worth checking if you’re looking for cost savings. An engineer wouldn’t think to select Inf1 when working from a manual tool’s preset list of instance types.
But when it’s the AI working, it makes sense to check for these types every single time – that’s how you can get your hands on the best price.
5. Automation scales resources instantly
If you’re running an e-commerce application, you need to prepare for sudden traffic spikes (think getting mentioned by a Kardashian on Instagram) yet scale things down when the need is gone.
Manually scaling your cloud capacity is difficult and time-consuming. You must keep track of everything that happens in the system, which may leave you with little time to explore cloud cost reductions.
When demand is low, you run the risk of overpaying. And when demand is high, you’ll offer poor service to your customers.
Here’s what you need to take care of when scaling resources manually:
- Gracefully handle traffic increases and keep costs at bay when the need for resources drops,
- Ensure that changes applied to one workload don’t cause any problems in other workloads or teams,
- Configure and manage resource groups on your own, making sure that they all contain resources suitable for your workloads.
When scaling manually, you’d have to scale resources up or down for each and every virtual machine across every cloud service you use. This is next to impossible. And you have better things to do anyway.
That’s where autoscaling comes into play.
Autoscaling does all the tasks listed above automatically. All you need to do is define your policies related to horizontal and vertical autoscaling, and the autonomous optimization tool will do the job for you.
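As an illustration of what such a policy amounts to, here is the proportional rule that Kubernetes’ Horizontal Pod Autoscaler uses for horizontal scaling, sketched in a few lines (the target utilization and replica bounds are made-up policy values):

```python
import math

def desired_replicas(current, cpu_utilization, target=0.6, min_r=2, max_r=20):
    """Scale the replica count proportionally to how far average CPU
    utilization is from the target, clamped to policy bounds."""
    desired = math.ceil(current * cpu_utilization / target)
    return max(min_r, min(max_r, desired))

# 4 replicas at 90% average CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, 0.9))  # → 6
# Low demand scales back in, but never below the floor of 2.
print(desired_replicas(4, 0.2))  # → 2
```

The rule itself is trivial; the value of automation is in evaluating it continuously against live metrics instead of waiting for a human to notice the spike.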
6. It handles spot instances for greater cost savings
Spot instances are up to 90% cheaper than on-demand instances, so buying idle capacity from cloud providers makes sense.
There’s a catch, though: the provider may reclaim these resources at any time. If you’re an AI-driven SaaS doing background data crunching that can be delayed, this is fine.
But what if your workload can’t tolerate an interruption? You need to make sure your application is ready for one and have a plan in place for when your spot instance is reclaimed.
Here’s how you can take advantage of spot instances:
- Check to see if your workload is ready for a spot instance
Will you be able to tolerate interruptions? How long will the job take to finish? Is this a life-or-death workload? These and other questions help determine whether a workload is suitable for spot instances.
- Examine your cloud provider offer
Examining less popular instances is a good idea because they’re less likely to be interrupted and can run for longer periods of time. Before deciding on an instance, look at how often it is interrupted.
- Make a bid
Set the maximum price you’re willing to spend for your preferred spot instance. The rule of thumb is to set the maximum price at the level of on-demand pricing.
- Manage spot instances in groups
You’ll be able to request a variety of instance types at the same time, boosting your chances of securing a spot instance.
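Workload-side readiness is mostly about checkpointing. Here is a sketch of an interruption-tolerant processing loop – `interruption_imminent` is a stub; on AWS, for example, a real check would poll the instance metadata path `/latest/meta-data/spot/instance-action`, which becomes available roughly two minutes before reclamation:

```python
def interruption_imminent():
    """Stub for illustration. A real implementation would poll the
    provider's interruption signal (e.g. AWS's spot/instance-action
    metadata, which appears ~2 minutes before reclamation)."""
    return False

def run_batches(batches, checkpoint):
    """Process work in small units, checkpointing after each one so an
    interruption loses at most the unit in flight."""
    done = []
    for batch in batches:
        if interruption_imminent():
            checkpoint(done)    # persist progress, then drain
            return done         # the orchestrator reschedules the rest
        done.append(batch * 2)  # stand-in for real work
        checkpoint(done)
    return done

print(run_batches([1, 2, 3], checkpoint=lambda state: None))  # → [2, 4, 6]
```

Structuring work this way is what makes the 90% discount safe to chase: an interruption becomes a reschedule, not a failure.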
To make all of the above work manually, you would have to dedicate a lot of time and effort to ongoing configuration, setup, and maintenance tasks. Automation takes all the repetitive tasks off your plate.