Bede Gaming automatically optimizes K8s workloads with no risk to performance

Company

Bede Gaming (part of the Merkur Group) provides a leading digital platform for the online gaming industry, powering some of the market’s most well-known companies with a scalable and secure solution. The platform manages data for nearly 6 million players worldwide, processing over 8 billion transactions worth over £22 billion annually.

Challenge

Facing massive user traffic peaks regularly, development teams overprovisioned Kubernetes workloads to deliver a fantastic experience to end users. But that was costly, which prompted Bede Gaming to search for a solution to optimize resource utilization at the workload level without impacting performance or adding extra work to the platform team.

Solution

Bede Gaming turned to Cast AI’s Workload Autoscaler with its automated rightsizing capabilities. The solution automatically sets workload requests and limits to increase resource utilization, eliminate cloud waste, and balance cost and performance.
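To make the mechanism concrete: rightsizing means deriving a workload's requests from its observed usage rather than from a guess. The sketch below is not Cast AI's actual algorithm – it is a minimal illustration of the idea, with hypothetical function names and thresholds:

```python
# Minimal sketch of percentile-based workload rightsizing.
# Not Cast AI's algorithm; the percentile and headroom are illustrative.

def recommend_request(usage_samples_mcpu, percentile=0.95, headroom=1.10):
    """Recommend a CPU request (millicores) from observed usage samples."""
    ordered = sorted(usage_samples_mcpu)
    # Pick the sample at the chosen percentile of the observation window.
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    # Add headroom so normal variance doesn't cause throttling.
    return int(ordered[idx] * headroom)

# A workload provisioned at 2000m whose observed usage looks like this:
samples = [250, 280, 300, 310, 320, 900]  # millicores over a window
print(recommend_request(samples))  # recommends 990m instead of 2000m
```

An autoscaler applying this kind of recommendation continuously, per workload, is what replaces the manual "turn the dial down after the peak" routine described later in the interview.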

“Automation is important because in a perfect world – and I think any engineer would agree – you want to automate everything as much as possible. You want to remove the level of risk around human error. And ideally, you want something that’s going to run at a minimal cost.

Having the ability to automate what we do with rules and thresholding, I’m certain that the machine will work way faster than we possibly could with any human being, 24/7.

It’s all about striking that balance between cost and performance and, ultimately, becoming more efficient for our customers. Cast AI helps us achieve that.”

Dan Whiteley
Chief Technology Officer at Bede Gaming

Scalability and the risk of overprovisioning

What is the most prominent business use case your cloud infrastructure supports?

We provide backend platform services for iGaming companies, powering some of the sector’s biggest brands across lottery, casino, sports betting, and bingo. With those types of environments, you’re running a lot of high-availability, high-performance infrastructure that needs to be able to respond to very high traffic levels.

Imagine a scenario where you’ve got a major event coming up, like the Super Bowl. People want to make a sports bet, and suddenly, hundreds of thousands of users will potentially visit your site. Online casinos are a little more consistent in traffic, but you still get users who make a lot of transactions over a period of time. 

The infrastructure needs to respond to that demand. So, we get the scale and elasticity of the infrastructure, but without good cost control, it can get very expensive very quickly. 

E-commerce companies experience this as well during Black Friday. Maybe some of their teams are provisioning more cloud resources than necessary just to have a quieter life, but it comes at a very big cost to the business. 

Balancing cost and performance

What is your approach to cloud cost management? 

When I joined the company, one of the things I became aware of was the cost of running our platform. It’s a large and detailed system with lots of integrations, tools, and features, which added high operational running costs.

We had taken the first step by moving over to Kubernetes, but how do we ensure that we optimize that infrastructure?

My objective was twofold:

  1. Gaining better control and visibility over cost
  2. Bringing the cost down to make our platforms more cost-effective

And from the end-player perspective, all of this should make no difference. They should have a great, responsive experience. They shouldn’t care how the clock works, so to speak.

We’ve got a really powerful infrastructure, which we were scaling down manually after peak traffic periods ended, but a human being can only go so far versus a machine. It’s important to strike the right balance between maintaining a high quality of service while keeping our expenses down.

As it turns out, some algorithms can tune that and point out potential savings for us. We want to always drive efficiencies by leveraging technologies like Cast AI. 

What was your first step on the journey to optimizing your cloud costs?

When I first asked how much our cloud bill was, I instantly felt it was expensive based on my experience with the type and size of the organization. And it turned out I was right. 

Our procedure was to overprovision workloads during peak periods, ensuring good customer service by scaling resources beyond what was actually needed to cover exceptional or unexpected spikes.

I’m refreshing these procedures with a mindset shift toward FinOps. The idea is that you need to run lean without breaking the service.

That’s where Cast AI was a really good eye-opener. I’m not relying on a human being to turn a virtual dial in Azure, and we don’t risk human error in resource calculations impacting the customer experience. Instead, algorithms are making the best choice for me. It gives us visibility and the ability to rightsize our infrastructure at the workload level.

Automated rightsizing was the solution

How can a solution like Cast AI help companies that face peak times so often?

When dealing with high volumes of traffic, we need to have faith in the infrastructure and any technology we put behind it. We have to have faith that it’s not going to disrupt the service.

The experience with Cast AI was quite interesting because we could put it in a read-only mode and safely see how it could reduce our costs. Then, there are the levels of how aggressive we want to be in terms of optimizing workloads. 

For us, being less aggressive works well – we can still generate benefits by applying optimization without risking service stability. For example, I’m personally not comfortable putting our entire production environment on Spot VMs, but Cast AI’s features give us a lot of room to optimize – for example, picking the right virtual machine sizes or setting workload requests and limits that match actual resource requirements.

Overprovisioning is the easy way out. Anyone can do that. I’ll chuck some massive infrastructure at the problem, and the problem goes away. Yes, but it comes at a significant cost. So, it’s all about striking that balance between cost and performance, and, ultimately, becoming more efficient for our customers. Cast AI helps us achieve that.
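The cost of “chucking massive infrastructure at the problem” can be put in numbers by comparing what a workload requests with what it actually uses. A toy calculation, with made-up figures:

```python
# Hypothetical numbers: how much provisioned capacity sits idle
# when a workload is overprovisioned "just in case".

def reclaimable_slack(requested_mcpu, used_p95_mcpu):
    """Return (idle millicores, idle capacity as % of what was requested)."""
    slack = requested_mcpu - used_p95_mcpu
    return slack, round(100 * slack / requested_mcpu)

# A pod requesting 2000m CPU whose 95th-percentile usage is 900m:
slack, pct = reclaimable_slack(2000, 900)
print(f"{slack}m CPU ({pct}%) potentially reclaimable")
```

Multiplied across every workload in every cluster, slack of this size is the “significant cost” the quote refers to, and it is what automated rightsizing reclaims.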

Why can only automation solve this problem?

Automation is important because in a perfect world – and I think any engineer would agree – you want to automate everything as much as possible. You want to remove the level of risk around human error. And ideally, you want something that’s going to run at a minimal cost. Having the ability to automate what we do with rules and thresholding, I’m certain that the machine will work way faster than we possibly could with any human being, 24/7.

We have multiple environments for customers with full isolation between services, which is crucial for maintaining optimal stability. However, there’s considerable overhead in terms of environment management. 

So, if we can remove the people aspect from that equation and the potential mistakes they might make, that’s great. This is where Cast AI, with its level of automation, allows us to just point at these particular clusters and set the thresholds up. 

What level of savings were you able to achieve using the least aggressive settings from Cast AI?

We achieved cost savings in the 10% to 15% range, but we could easily go higher if we changed our thresholds. I’m quite happy with that order of magnitude, since 10% of a big number is still a big number.

My approach is to roll Cast AI out to our customers’ environments step by step to build our initial baseline and then, over time, start to fine-tune it.

Cast AI has been what I would call one of those low-hanging fruits to get our workloads under control, get the cost visibility, and drive efficiencies that are ultimately low-risk and low-cost for us to implement.

Anyone who’s got a fairly expensive cloud bill should consider using Cast AI.

If you’re running Kubernetes, you need to go beyond just running it from a scaling perspective because you’re probably not rightsizing the workloads and VMs. Any organization that gets high volumes of traffic across gaming, financial services, and e-commerce could use automation and avoid overprovisioning a lot of hardware. 

What was the impact of Cast AI on the engineering workload? 

We probably don’t measure that per se, but in my mind, it’s one less thing to worry about, so teams can be focused on other things of potentially higher value. 

Having Cast AI running in the background with a good level of confidence that we’re running as efficiently as we can, balancing the service we’re providing – that’s great. Then, periodically, someone can check on Cast AI to verify that it’s doing what it’s supposed to – which is something we see anyway in our cost usage profile and monthly billing.

A solid partnership

What was the implementation process of Cast AI like? 

I’m quite instinctive as a CTO. If I look at a product and think, “This looks like it can do what we need to do,” I won’t spend days or weeks looking at other options. This worked with Cast AI since we ran a great POC; that was my success measure. 

The integration took 8 weeks to set up, with Cast AI assisting us through the process. This product delivers on what it says it can do. It was a no-brainer. Additionally, it was relatively low-risk from an implementation point of view for us and relatively low-cost. 

The final rollout has been gradual. My hope is that all of our customers will opt to enable this integration for their environments, as it certainly provides benefits to the service we offer them. 

How does the Cast AI team support you during the process?

It’s quite interesting for Cast AI to work with us because we bring the team on calls with our customers. We bought the tool; we’re good at it, but we’re not experts. So, it’s a kind of joint partnership as the Cast AI team runs mini POCs for our customers. 

It’s all about bringing value back to our customers and making them run as efficiently as possible with the least amount of change and risk involved. The Cast AI team understands this goal and supports us in achieving it.

One of your architects is a genius. He knows both Kubernetes and Cast AI inside and out, which is why I’ve got a lot of confidence in this approach. The POC is a great tool to demonstrate the product’s value. When you’re pairing that with people who are competent in what they do and how they understand the platform and your business, that’s very good. 

What other Cast AI features are you planning to use?

We have just started exploring the Kubernetes security aspect of Cast AI, and our principal DevOps engineer found the vulnerability scanning and other security features useful. This was a kind of side benefit to getting the tool, but an important one.

Our primary goal was to achieve better cost control, but we were pleased to discover that there’s additional value we can take advantage of. 


Company size: 251-500

Industry: iGaming

Region: EMEA

Platform: AKS
