Caudalie automated Kubernetes and saved 40% on EC2 costs in an already optimized setup

Company

Caudalie is a French skincare company founded by Mathilde and Bertrand Thomas, with roots in the vineyards of Bordeaux at Château Smith Haut Lafitte. The brand’s story began in 1993 when the founders met Professor Joseph Vercauteren, who revealed that grape seeds contain powerful antioxidants. This discovery led to the launch of their first products in 1995, featuring grape seed polyphenols.

Over the years, Caudalie has pioneered several innovations, including patenting resveratrol for its anti-aging properties. Today, Caudalie operates over 50 boutique spas worldwide and has been a member of 1% for the Planet since 2012, dedicating a portion of global sales to reforestation projects.

Challenge

Caudalie’s team had already implemented extensive cost and performance optimizations, but increasing operational overhead led them to seek an automation solution to streamline cluster management and uncover additional efficiency gains.

They also needed autoscaling that could handle traffic fluctuations, and they wanted to reduce unnecessary multi-zone data transfer costs in the test environment. Kubernetes upgrades posed another recurring challenge: node groups had to be recreated manually across four clusters and workloads migrated, a time-consuming and error-prone process.

Solution

Cast AI’s automation features addressed Caudalie’s key operational challenges. By handling node provisioning directly, Cast removed the need for EKS node groups and significantly reduced the manual effort required during Kubernetes upgrades. Autoscaling was consolidated under Cast, enabling the team to maintain a steady baseline and scale capacity up or down based on real-time demand.

Cast also provided more control over cluster configuration. In the test environment, workloads were limited to a single availability zone to minimize inter-zone data transfer costs. The platform’s broad Spot Instance support and compatibility with ARM-based infrastructure further strengthened Caudalie’s existing optimization efforts.

Overall, Cast provided a single system to automate node lifecycle management, streamline autoscaling, and allow additional cost-saving configurations that were difficult to manage manually.

Results

  • 40% EC2 cost savings despite operating in a highly optimized environment
  • Reduced manual toil and operational risk

Cluster autoscaling

Cast’s autoscaler provisions capacity in line with demand to eliminate cloud waste. The image below presents the autoscaler’s impact around the eighth day of the visualized month, where a significant drop can be noted:

Cluster rebalancing

Rebalancing brings Caudalie’s cluster to an optimal, up-to-date state. During this process, Cast automatically replaces suboptimal nodes with more cost-efficient ones that run the latest configuration settings.

Cast is a strong solution for companies that have already invested in infrastructure optimization but want to reduce their costs even further, especially if they lack the time or in-depth knowledge required to do it on their own. 

Using Spot Instances, autoscaling, running a single zone in a test environment, or even shutting down the test environment at night when no one is working – these optimizations can make a big difference. Cast is a great match for companies looking for these capabilities.

Jonathan Ribas, Chief Technical Officer at Caudalie

Finding new avenues for cluster optimization

How did you find Cast?

It was during the AWS Summit in Paris that I saw the Cast stand. I spoke with a Cast team member who showed me how the solution works. We had a wonderful first conversation and exchanged cards. Our technology partner, ZenOps, facilitated the integration process with Cast from the outset.

What was the onboarding process for clusters to Cast like?

We started with a relatively long POC. Cast was really flexible on this part, which we greatly appreciated, as it allowed us to test the solution properly. To be honest, not every vendor offers a long POC like this, so it was good for us to see how everything worked. 

Cast support replied quite quickly whenever we encountered issues, and on one occasion, when I needed deeper support, they joined a call with me to pinpoint the issue, and we found it together. We received lots of help from support.

After that, we decided to go with Cast for our production clusters, and we’ve been working together ever since. 

What level of cost savings did Cast generate for you?

From the first months, we already saw EC2 costs decrease by about 20%. We were already heavily optimized from the beginning: we use ARM, we were already on Spot Instances, and we weren’t coming from a situation where Cast had to do all the magic. We had already done a significant amount of work before joining Cast, so the numbers may not be as impressive – but that’s because of where we started.

Our application is not very demanding in terms of resources, as we have optimized it on both the code and resource sides. We’ve done a significant amount of work on performance, and we utilize several caches, which helps keep our infrastructure as small as possible. 

We’re a passionate small team, and we’re experts in our area. We’re always thinking about how to provide the best experience to our customers and visitors. For us, it’s essential to keep costs optimized, which is why we strive to find the best partners to work with. That’s why we continue to use Cast, and we hope to reduce those costs even further by identifying additional areas for improvement.

Why is autoscaling so important for your cloud infrastructure and business operations?

We run an e-commerce business and have several offers available on our website, just like other merchants do. The idea was to optimize our infrastructure costs, which is why we needed to enable smooth autoscaling with our infrastructure.

That’s why we chose Cast – it helps us maintain a pool of servers during normal periods, and when we see a spike, we have autoscaling in place to handle new customers coming from, for example, a newsletter we just sent.

Once traffic goes back to normal, autoscaling adjusts again and brings the server pool back to the level we determined. So yes, this is the main feature we use today with Cast, and it’s very helpful. We’re already seeing tangible results in the time we’ve been working together.
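The behavior described above, a fixed baseline pool that grows during a spike and shrinks back afterwards, can be sketched in a few lines. This is only an illustration of the general idea, not Cast's actual algorithm: the function name, the load unit, the per-node capacity, and the minimum and maximum pool sizes are all hypothetical values chosen for the example.

```python
import math


def desired_nodes(load, capacity_per_node, min_nodes, max_nodes):
    """Return the node count for a given load, kept between a baseline and a ceiling.

    All parameters are hypothetical: `load` and `capacity_per_node` share an
    arbitrary unit (for example, requests per second).
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    # Nodes needed to serve the load, rounded up to whole nodes.
    needed = math.ceil(load / capacity_per_node)
    # Never drop below the baseline pool, never exceed the ceiling.
    return max(min_nodes, min(needed, max_nodes))


# Hypothetical traffic: a normal period, a newsletter spike, then back to normal.
for load in (200, 800, 250):
    print(load, desired_nodes(load, capacity_per_node=100, min_nodes=4, max_nodes=12))
```

With these made-up numbers, the pool stays at the baseline during normal periods, grows while the spike lasts, and returns to the baseline afterwards.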

Your team has also faced challenges around upgrading Kubernetes versions. How did Cast help you solve that?

Yeah, exactly. We use EKS on AWS, and when upgrading to a new version of Kubernetes, we had to manually update the node groups. We had three or four node groups per cluster, and we actually have four clusters. So it was a manual process of recreating the node groups with the new version and then moving the workloads from the old node group to the new one. It was really painful to do manually.

Today, with Cast, we no longer have any node groups on EKS. Everything is managed on the Cast side. We just upgrade our cluster on EKS, and Cast automatically places workloads on nodes running the new AMI version after the Kubernetes upgrade.
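To give a sense of the toil involved in the earlier manual process, the sketch below enumerates the steps for every node group in every cluster. It is a rough illustration under stated assumptions: the cluster names, node group naming scheme, and version label are hypothetical, and the final "delete" step is assumed rather than described in the interview.

```python
def manual_upgrade_plan(clusters, node_groups_per_cluster, new_version):
    """List the manual steps for upgrading node groups across clusters.

    Names are hypothetical placeholders; the interview describes recreating
    each node group and moving workloads, and the delete step is an assumption.
    """
    steps = []
    for cluster in clusters:
        for i in range(1, node_groups_per_cluster + 1):
            old = f"{cluster}-ng{i}"
            new = f"{old}-{new_version}"
            steps.append(f"create node group {new}")
            steps.append(f"move workloads from {old} to {new}")
            steps.append(f"delete node group {old}")
    return steps


# Four clusters with three node groups each, as a lower bound from the interview.
plan = manual_upgrade_plan(["c1", "c2", "c3", "c4"], 3, "next")
print(len(plan), "manual steps")
```

Even at the lower bound of three node groups per cluster, the list runs to dozens of manual steps per upgrade, which is the work that disappears once node groups are no longer used.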

When it comes to cloud costs, data transfer fees often become a significant cost item. Did Cast help you manage this as well?

One additional thing I recently did was attempt to optimize costs in our test environment. Currently, we follow AWS best practices, which recommend having workloads in every zone of a region. 

But on our test cluster, we don’t need that. I wanted to optimize the data transfer cost on EKS, so I configured Cast to run workloads only in a specific zone within the region.

Now we don’t have data transfer between zones A, B, and C – everything is in zone A. And yes, if there’s an issue with zone A, nothing works, but that might happen once a year, and we can live with that in a test environment.
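The reasoning behind the single-zone setup can be made concrete with a small estimate. The sketch below is a simplification, not a billing model: it assumes traffic between workloads is spread evenly across zones, and the traffic volume and per-GB price are hypothetical placeholders rather than actual AWS rates.

```python
def cross_zone_transfer_cost(total_gb, zones, price_per_gb):
    """Estimate inter-zone transfer cost for traffic between workloads.

    Assumes workloads are spread evenly over `zones`, so a fraction
    (zones - 1) / zones of the traffic crosses a zone boundary.
    `price_per_gb` is a placeholder, not an actual AWS rate.
    """
    if zones < 1:
        raise ValueError("zones must be at least 1")
    cross_zone_gb = total_gb * (zones - 1) / zones
    return cross_zone_gb * price_per_gb


# Hypothetical monthly traffic of 900 GB at a placeholder price of 0.02 per GB.
three_zones = cross_zone_transfer_cost(900, zones=3, price_per_gb=0.02)
single_zone = cross_zone_transfer_cost(900, zones=1, price_per_gb=0.02)
print(three_zones, single_zone)
```

Under this simple model, a single-zone cluster has no inter-zone traffic at all, which is why the cost drops to zero in exchange for the availability risk mentioned above.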

You’re an experienced user of Spot Instances – did Cast make a difference in your setup for Spot?

Yes, our test environment runs 100% on Spot Instances, and production runs 80% on Spot. We had already been doing this before Cast, because I knew there were real cost optimizations in that area. So it’s a best practice we’ve been following for a couple of years now.

But in the end, it worked in a similar way to what we have today with Cast, because we just defined the minimum and maximum number of Spot Instances, and the autoscaling on the EKS side worked perfectly. However, we also had to manually update the Kubernetes autoscaler plugin every time we performed upgrades.

It all works pretty well. There are always Spot Instances available because we utilize multiple instance types, so we’re not tied to a specific instance family. Our Kubernetes clusters are now running 100% on ARM with Graviton. This provides a significant cost reduction and performance increase, thanks to the architecture.

Did Cast’s automation help your team reclaim time and boost productivity?

I save a few hours during each Kubernetes upgrade – maybe two or three hours. It doesn’t happen that often, but I just like it because I don’t enjoy doing things manually, and the automated process removes that burden. It also reduces the chance of human error – creating everything manually was really not ideal.

501-5000

Retail

France

EKS
