How Yotpo Fully Automated Spot Instances And Scored Up To 40% Savings   

→ 30-40% cloud cost savings
→ Fully automated Spot instance lifecycle
→ Major savings in engineering time spent on infrastructure management during Black Friday

Company size

800+ employees

Industry

Technology

Headquarters

Tel Aviv, Israel

Cloud services used

Elastic Kubernetes Service (EKS)

Company

Yotpo is a cloud-based SaaS provider of cutting-edge marketing solutions that help organizations accelerate growth by enabling advocacy and increasing customer lifetime value. The company is valued at $1.4 billion and has raised $436 million in funding.

Challenge

Black Friday is Yotpo’s most critical business period. The company wanted to use Spot instances to reduce costs without impacting performance. However, due to high demand, Spot instances are often unavailable during Black Friday. The scale of manual work involved in moving workloads between Spot and on-demand instances prompted Yotpo to search for a solution that would automate the entire process.

Solution

Yotpo integrated CAST AI to manage the entire Spot instance lifecycle, from provisioning the most cost-effective instance type and size to moving workloads back and forth between Spots and on-demand instances when Spot availability changed, along with autoscaling in line with changing demand.

Results

Yotpo realized significant cost savings immediately after implementing CAST AI and automatically migrating workloads to Spot instances. During Black Friday, engineers no longer had to spend time manually adjusting and moving workloads between instances – CAST AI did that automatically, without any impact on availability. Combined with CAST AI’s autoscaling capabilities, this gave the company a one-two punch for optimizing its cloud costs.

30-40% cost savings on Kubernetes workloads

The graph below compares Yotpo’s cloud costs within two time periods. Period A is a 3-day range during which the team onboarded CAST AI. Period B shows the level of savings Yotpo achieved after completing the implementation of CAST AI.

Highly efficient autoscaling

The graph below shows the efficiency of the production workload when running on CAST AI over 30 days. CAST AI’s autoscaler closely follows the changing demands of the workload, increasing and decreasing provisioned CPUs. 

Spotlight on Yotpo’s FinOps best practices 

How do you manage Yotpo’s infrastructure from a cost perspective? 

I think that almost every DevOps leader or VP of R&D faces the challenge of managing hosting costs because they can get out of hand really quickly. Some teams take action without understanding the implications of what they’re doing or what a given job costs.

The cost of revenue is something that is very important for us, especially at this time. It’s key to stay on track and make sure that we’re handling our budget correctly. And it was a challenge.

When we started focusing on that, we initially reviewed the costs monthly. The problem with the monthly reviews is that when you discover something, it’s already too late. You might see a cost increase, and that increase may have happened at the start of the month, but you see it only at the end of the month. 

So, the first thing we did was move to daily cost allocation, managing our budgets on a daily basis. Second, we segregated budgets by group and team. Our next step was to go on the offensive and set budget goals per team to drive optimization.
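The daily, per-team review described above can be illustrated with a minimal sketch. The record format, team names, and budget figures are hypothetical and chosen only for illustration; they are not Yotpo's actual data or tooling.

```python
from collections import defaultdict


def teams_over_daily_budget(cost_records, daily_budgets):
    """Return {(day, team): overspend} for every team-day above its budget.

    cost_records: iterable of (day, team, cost) tuples (hypothetical export format).
    daily_budgets: {team: allowed spend per day}.
    """
    # Sum all cost line items per (day, team) pair.
    totals = defaultdict(float)
    for day, team, cost in cost_records:
        totals[(day, team)] += cost

    # Flag only the team-days that exceed the configured daily budget.
    overspend = {}
    for (day, team), total in totals.items():
        budget = daily_budgets.get(team)
        if budget is not None and total > budget:
            overspend[(day, team)] = round(total - budget, 2)
    return overspend


# Illustrative numbers only.
records = [
    ("day-1", "team-a", 120.0),
    ("day-1", "team-b", 80.0),
    ("day-2", "team-a", 190.0),
    ("day-2", "team-b", 75.0),
]
budgets = {"team-a": 150.0, "team-b": 100.0}

flagged = teams_over_daily_budget(records, budgets)
print(flagged)
```

With a check like this running every day, an increase that starts on day 2 is visible on day 2 rather than at the end of the month.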

We started to dig deeply into our usage and saw that there were opportunities to save costs. 

Could you give an example of such an opportunity?

Usually, a major win is moving from on-demand to Spot instances and instantly saving money. But when I joined the company, Yotpo was already running 80% of workloads on Spot instances. 

So, we had to find more creative ways to reduce costs even more. 

And then we saw that if we moved the same EC2 instance to another Availability Zone (AZ) within the same region, the cost would drop by 20%. And that’s quite an easy fix.

So we started moving workloads to cheaper instances. Then we also looked at the instance type itself. Is it the most cost-effective instance for our workload? 
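The manual comparison described here, checking the same instance across Availability Zones and then asking whether another instance type fits better, might look roughly like the sketch below. The `Offer` type, instance names, zones, and prices are all hypothetical placeholders, not real AWS offerings or prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    instance_type: str   # placeholder name, not a real EC2 type
    zone: str            # placeholder Availability Zone label
    vcpus: int
    memory_gib: int
    hourly_price: float  # illustrative number, not a real price


def cheapest_offer(offers, min_vcpus, min_memory_gib):
    """Return the lowest-priced offer that satisfies the workload's needs, or None."""
    candidates = [
        o for o in offers
        if o.vcpus >= min_vcpus and o.memory_gib >= min_memory_gib
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o.hourly_price)


offers = [
    Offer("type-a.large", "zone-1", 2, 8, 0.050),
    Offer("type-a.large", "zone-2", 2, 8, 0.040),  # same type, 20% cheaper zone
    Offer("type-b.xlarge", "zone-1", 4, 16, 0.090),
]

best = cheapest_offer(offers, min_vcpus=2, min_memory_gib=8)
print(best.instance_type, best.zone, best.hourly_price)
```

Doing this by hand for every workload, and repeating it as prices change, is the kind of effort that does not scale.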

We’ve been doing all that work manually and at some point, we realized that we couldn’t keep doing it this way. It just didn’t make sense to spend so much time on this. That’s when we started looking for a tool that would do this job for us automatically.

Achi Solomon
Director of DevOps at Yotpo

Testing open-source vs. third-party automation solutions

What was the first tool you decided to try?

The first choice was Karpenter. We love open source, so we listed all our requirements and looked into what Karpenter could give us. Karpenter was good, but the basic functionalities we needed didn’t exist at that time and were very far away on the roadmap.

We could either join and try to influence the roadmap or just find another solution that helps us do more with less. 

Where did the review of third-party solutions take you? 

Once we realized that Karpenter didn’t match our requirements, we started a POC with Spot by NetApp. But then we learned that there are better alternatives that could help us overcome our challenges. So, we took a step back and examined all these solutions with a deep analysis of our requirements. This is when we decided to try CAST AI.

CAST AI addressed our requirements best compared to all the other solutions on the market.

Achi Solomon
Director of DevOps at Yotpo

Did you have any concerns about managing your infrastructure using automation?

Of course. We’re basically handing our cluster autoscaling over to an external company.

That’s why we decided to test it thoroughly to make sure that we wouldn’t hurt our availability and would stay robust. We started a slow testing process to see what happens when workloads are moved to CAST AI.

When moving from the basic cluster autoscaler to CAST AI, we asked a lot of questions: What happens if there’s no network connection to CAST AI? Will we be able to scale? Will we fall back to the regular cluster? The CAST AI team was able to answer them all and assure us that we would maintain robustness without any problems along the way.

We also encountered some issues, but the CAST AI team fixed them promptly. I think their responsiveness was very good.

As we slowly migrated our workloads to CAST AI, I could see that the results were quite amazing right from the start.

What was the process of integrating CAST AI like?

The entire process was very slow because, from our perspective, the system’s robustness and high availability are very important. Moving slowly was a good call, and if you took me back in time and asked me if I’d do it faster, I’d say no.

The ROI was great right from the start. Cutting our compute costs by 40% just by migrating our workloads to CAST AI is huge. And we haven’t even used all the CAST AI features so far, just the automated instance selection, bin packing, and Spot instance automation.

Achi Solomon
Director of DevOps at Yotpo

The process took time, between four and five months in total. Moving from the AWS cluster autoscaler to the CAST AI autoscaler, for example, was very gradual. Naturally, some issues cropped up throughout the process, but the CAST AI team was very responsive and solved them quickly. 

Results: 30-40% cost savings and full Spot instance automation

Instant cost savings

What results did CAST AI bring in terms of your cloud spend?

Just by moving our workloads to CAST AI, we were able to reduce our costs by 30-40%, depending on the workload size and optimization stage. These were significant savings given that we had already optimized these workloads.

Achi Solomon
Director of DevOps at Yotpo

For us, moving the needle on our largest expense in AWS was very impactful, especially from the cost of revenue perspective. 

Full Spot instance automation during Black Friday

How was the experience during Black Friday from a cost and performance perspective?

During Black Friday, AWS runs out of Spot instances because many companies decide to use them to bump up their capacity for this time. There is a basic functionality in AWS where the platform moves your workloads to on-demand instances if Spot instances aren’t available. The problem is that these workloads stay there, even if some Spot instances become available.

Prior to using CAST AI, we moved a percentage of our workload to on-demand instances. After Black Friday and Cyber Monday, these workloads were moved back to Spot instances. This was all done manually and required a lot of human effort.

After integrating CAST AI, we didn’t have to do anything during Black Friday, which is amazing. We gained not just a reduction in computing costs but also a reduction in engineer workload.

No Spot instances are available? No problem; the workload is automatically moved to on-demand instances – and CAST AI makes sure to select the cheapest instance that matches the resource consumption of that workload. And once Spots are available again, the workload is moved back there. The whole spike on Black Friday was much lower, which is cool. 

Achi Solomon
Director of DevOps at Yotpo
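The behavior described in this answer, falling back to the cheapest on-demand option when Spot is unavailable and returning to Spot once it is available again, can be sketched as a simple decision rule. This is an illustration under stated assumptions, not CAST AI's actual algorithm; the function names and prices are hypothetical.

```python
def choose_capacity(spot_available, spot_price, on_demand_prices):
    """Return (lifecycle, hourly_price) for the next placement decision.

    Prefer Spot whenever it is available; otherwise take the cheapest
    on-demand option from the supplied (illustrative) price list.
    """
    if spot_available:
        return ("spot", spot_price)
    return ("on-demand", min(on_demand_prices))


def simulate(availability_timeline, spot_price, on_demand_prices):
    """Re-evaluate the decision at every step so workloads return to Spot."""
    return [
        choose_capacity(available, spot_price, on_demand_prices)[0]
        for available in availability_timeline
    ]


# Spot disappears for two steps, then comes back.
timeline = [True, False, False, True]
placements = simulate(timeline, spot_price=0.03, on_demand_prices=[0.10, 0.08])
print(placements)  # ['spot', 'on-demand', 'on-demand', 'spot']
```

The key difference from a one-way fallback is that the decision is re-evaluated continuously, so workloads do not stay on on-demand capacity after Spot returns.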

Savings across team workloads

How did CAST AI impact the workload of teams managing the infrastructure?

Yotpo has about 50 to 60 R&D teams responsible for everything from writing and testing code to monitoring it – including the cost.

It would take one developer in each team a week to prepare for Black Friday. Afterward, moving workloads back to Spot would take about the same time. That’s quite a lot of work that CAST AI has eliminated.
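As a rough back-of-the-envelope estimate of the effort involved, the figures above can be multiplied out. The sketch assumes exactly one developer per team, one week of preparation, and one week of moving workloads back, which is a simplification of "about the same time".

```python
def developer_weeks(teams, weeks_before=1, weeks_after=1):
    """Total developer-weeks, assuming one developer per team (a simplification)."""
    return teams * (weeks_before + weeks_after)


low = developer_weeks(50)   # lower bound of "about 50 to 60" teams
high = developer_weeks(60)  # upper bound
print(f"{low}-{high} developer-weeks per Black Friday")
```

Under these assumptions, the manual process would consume on the order of 100 to 120 developer-weeks each year.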

A future-proof partnership

How was the support during integration and while running CAST AI?

We can take Black Friday as an extreme example here. Before Black Friday, we contacted CAST AI’s support team, saying that this was an important event for us. We set up a war room to be ready whenever an issue arose and to make sure it was fixed as soon as possible. But nothing like that happened and the war room was quite empty most of the time, which is good. 

What I like about CAST AI is that I don’t need the support. The product itself is quite self-explanatory. 

I think that the product input we’re giving is welcome, and CAST AI is helping us implement what we need. So it’s great from our perspective.  

What features are you looking forward to? 

So far, we’ve been using automated instance selection, bin packing, and Spot instance automation. However, CAST AI also has the Workload Autoscaler, which can rightsize deployments. We haven’t used it in production yet and are currently testing it to check its potential impact. I’m really excited about my next adventure with CAST AI.
