Company
Banking Circle is a next-generation bank providing payments and banking services via global clearing networks. It offers global cross-border payments, accounts, and liquidity management using modern banking rails and an award-winning single API, ensuring market-leading compliance and security. The company is trusted by some of the world’s largest and most successful financial institutions like Stripe, Paysafe, and PPRO.
Challenge
Banking Circle processed over 300 million B2B transactions in 2024 for around 530 financial services clients. To meet increasing demand, the department for Advanced Analytics and AI needed to scale and secure its Kubernetes infrastructure. The goal was to keep performance high and cloud costs low while making efficient use of the AIOps team’s time.
Solution
Cast AI automated the entire process of scaling applications up and down, eliminating the manual AIOps effort and saving the team a lot of time and energy. Cast’s security features enabled the team to streamline their existing security process and focus on fixing high-risk vulnerabilities with automated base image recommendations, resulting in accelerated remediation.
Results
- 50-80% Kubernetes cost savings with automated optimization
- Massive time savings enabling the AIOps team to focus on high-value tasks
- Automated K8s security thanks to continuous vulnerability scanning, configuration checks, and shift-left recommendations
Dramatic drop in daily cluster cost
Rebalancing is a feature that brings a cluster to an optimal, up-to-date state with a single click. During this process, suboptimal nodes are automatically replaced with new, cost-efficient ones.
This graph shows the reduction of the daily cluster cost after rebalancing the cluster with Cast.
Rebalancing
In one example, rebalancing the cluster resulted in 52% cost savings. Cast replaced 16 nodes and bin-packed the cluster into fewer, more cost-efficient nodes.
Thanks to the nightly scheduled rebalancing, Cast maintained the cluster at an optimal cost without impacting application performance.
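To illustrate what bin-packing a cluster into fewer nodes means, here is a minimal sketch of the classic first-fit-decreasing heuristic. It is not Cast's actual algorithm, which is not described here. The function name, the pod CPU requests, and the node capacity are all hypothetical values chosen for illustration.

```python
from typing import List


def first_fit_decreasing(pod_cpu_requests: List[float],
                         node_cpu_capacity: float) -> List[List[float]]:
    """Place pods on equally sized nodes, opening a new node only when needed.

    Illustrative heuristic only: real schedulers also weigh memory,
    affinity rules, disruption budgets, and instance pricing.
    """
    nodes: List[List[float]] = []
    # Placing the largest pods first tends to leave fewer awkward gaps.
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_cpu_capacity:
            raise ValueError(f"pod requesting {request} CPUs fits on no node")
        for node in nodes:
            if sum(node) + request <= node_cpu_capacity:
                node.append(request)
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([request])
    return nodes


# Hypothetical workload: CPU cores requested by each pod.
pods = [2.0, 1.0, 0.5, 3.0, 1.5, 0.5, 2.5, 1.0]
packed = first_fit_decreasing(pods, node_cpu_capacity=4.0)
print(f"{len(pods)} pods packed onto {len(packed)} nodes")
```

With these made-up numbers, the pods request 12 CPU cores in total and end up on three fully used 4-core nodes. Spread across more, partially idle nodes, the same workload would cost more for the same capacity.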
When we had to upgrade Kubernetes versions for all our environments manually, it took about two weeks per year of mindless drudgery work. This limited our growth because we hesitated to open new node pools and clusters. The guy who was doing it kept threatening to leave the company because he was so bored.
Now, we only need to do this manually for the production environment, meaning it’s down to one day’s work. And since it’s not so much mindless drudgery as it used to be, it’s a lot less prone to mistakes. That’s saving us two weeks of full-time employee time, on top of all the other much more significant cost savings Cast AI generated for us.
Things just get easier when you’re using Cast AI. If I asked my team, they would say that it’s totally worth it, even without the cost savings.
Anton Sörensen, Team Lead, AIOps at Banking Circle
Balancing cost and performance in a strictly regulated sector
What are the key aspects of running the infrastructure at Banking Circle?
As a payments bank, we have capabilities such as anti-money laundering, counter-terrorism financing, and sanctions screening. All of these are critical and must happen instantly, which means the system must be up and working at all times. For such services, we have 99.9+% uptime criteria.
At the same time, we need to ensure this doesn’t cost infinite amounts of money. If you have enough money, you could basically just pay your way out of any problem. But that’s not a very good way of doing business, especially if you’re looking to be profitable.
How did you approach cloud cost management?
The cost management tools in Azure are kind of bare bones. You can see how much things cost, but you don’t really know exactly what generates costs. You need to know how the Azure Kubernetes architecture works to understand exactly what’s taking all your money.
Figuring out what costs so much is the first challenge. Next comes the challenge of fixing it. We didn’t have all the tools necessary to do that.
Looking for a solution
Scaling cloud resources was a focus area for you in terms of cost reduction. Which tools did you evaluate?
We could have looked into stuff like Karpenter, but I did a cost-benefit analysis on Karpenter. Not knowing anything about Karpenter and not really knowing anything about Cast at the time, I considered the effort as roughly similar. Now, I would say that Cast is probably a lot easier to set up, especially since you helped us.
But even after that, maintaining Karpenter would be time-consuming; it wouldn’t entirely be a full-time employee’s job, but it would add a significant workload to what they do. We are a small team, and we are, in many regards, separate from the rest of the IT organization due to our requirements.
We would have to hire another full-time employee, whether or not they would actually work full-time with Karpenter’s management. This wasn’t an option because our budget didn’t allow that.
Also, all the money we would save with Karpenter would then be spent to cover the salary of that additional employee who would manage it.
This is when you turned to Cast AI?
Exactly. Today, I know that Cast requires close to zero maintenance. We have monthly or bimonthly calls with one of your engineers, during which you guys basically tell us what we’re doing wrong and how to do it right instead.
We don’t have to do research in a forum, going through posts from 2021 and wondering whether they’re actually up to date.
This saves a lot of time, allowing us to focus on feature development and higher-value tasks.
Automating Kubernetes with cost in mind
Which Cast features had the biggest impact on your costs and team productivity?
The three most impactful features of Cast for us were:
Automated scaling up and down
When it comes to our lower environments, we need to replicate production as closely as possible in those environments, and our main costs are related to being able to scale up. That costs a lot of money, and it’s crucial to be able to scale down confidently and trust that you can scale up whenever you need to.
Cast AI helps us here. We can shut everything down at a moment’s notice if we don’t need it. If we’re not doing performance tests or someone else is not putting traffic through our applications, they scale down to the minimum. The nodes then scale instantly because Cast basically just takes care of it. That is a huge bonus.
Fast autoscaler
Another issue we’ve faced was that Azure’s autoscaler for Kubernetes was too slow for us. Cast speeds this process up significantly. If we need a new VM to scale up our cluster, Cast can give it to us in seconds, whereas in some cases it took literally days for Azure to give us the VM we were asking for. (Note: this was due to a bug in Azure, which has since been fixed.)
Security
Finally, there’s Cast’s security product. When we got security scanning through Cast, we just stopped evaluating other Kubernetes security options.
Does your infrastructure experience traffic spikes? How did the Cast autoscaler perform under such conditions?
The stress about this scaling issue that we had with Azure has actually been resolved now, thanks to Cast. When it was ongoing for two or three months, the team was under huge pressure to ensure that we had what it took to handle spikes.
Since the company is still growing, those spikes have grown over time. And since we didn’t want to spend too much money, we had aimed at being right at the limits; our maximum capacity was right at the limit of those spikes.
This meant that we needed to keep an eye on this all the time because if the bank suddenly onboarded new clients and the flow grew, we would need to manually increase this quickly enough that something didn’t break.
Reaping the benefits of automation: 50-80% cost savings and massive AIOps time savings
How much have you saved on cloud costs since integrating Cast?
It’s hard to say because of the cost calculation tools in Azure, but we’re looking at somewhere between 50% and 80%.
We earned a return on investment before we started paying for the platform. The demo saved us so much money that the Cast charge paid for itself instantly.
How did Cast impact your team’s engineering workload?
Considering the instability issues we’ve experienced, not having to think about scaling up or down saves us so much heartburn. As a team manager, it’s just a huge amount of stress that I don’t experience anymore. With Cast, we could put it all on the side table and forget about it.
Let me give you an example:
Azure’s Kubernetes offering doesn’t have an automatic upgrade path from one version to the next yet.
So you need to:
- Upgrade the main Kubernetes service,
- Take every node pool,
- Create an identical node pool,
- Take all the pods running in the old node pool,
- And finally put them into the new node pool running the new version of Kubernetes.
All of this had to be done manually.
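The manual procedure above can be sketched as an ordered list of commands. This is a rough illustration under stated assumptions, not the team's actual runbook. The resource names, the "v2" pool suffix, and the use of the `agentpool` node label to select a pool's nodes are assumptions. Check the exact flags against the current Azure CLI and kubectl documentation before running anything. The function only builds the command strings and executes nothing.

```python
from typing import Dict, List


def manual_upgrade_plan(resource_group: str, cluster: str, target_version: str,
                        node_pools: List[Dict[str, str]]) -> List[str]:
    """Return, in order, the commands for a manual node pool by node pool upgrade."""
    plan = [
        # Upgrade the main Kubernetes service (control plane) first.
        f"az aks upgrade --resource-group {resource_group} --name {cluster} "
        f"--kubernetes-version {target_version} --control-plane-only",
    ]
    for pool in node_pools:
        old_name = pool["name"]
        new_name = old_name + "v2"  # hypothetical naming scheme
        # Create an identical node pool running the new version.
        plan.append(
            f"az aks nodepool add --resource-group {resource_group} "
            f"--cluster-name {cluster} --name {new_name} "
            f"--kubernetes-version {target_version} "
            f"--node-vm-size {pool['vm_size']} --node-count {pool['node_count']}"
        )
        # Stop new pods from landing on the old pool (label name is an assumption).
        plan.append(f"kubectl cordon -l agentpool={old_name}")
        # Evict pods from the old pool so they reschedule onto the new one.
        plan.append(
            f"kubectl drain -l agentpool={old_name} "
            "--ignore-daemonsets --delete-emptydir-data"
        )
    return plan


# Hypothetical cluster with two node pools.
pools = [
    {"name": "apps", "vm_size": "Standard_D4s_v5", "node_count": "3"},
    {"name": "batch", "vm_size": "Standard_D8s_v5", "node_count": "2"},
]
plan = manual_upgrade_plan("my-rg", "my-cluster", "1.30.0", pools)
for command in plan:
    print(command)
```

Even in this simplified form, each node pool adds three commands that someone has to run and watch, in every environment. That is why the effort grows with the number of pools and clusters.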
With Cast, we only need to update the main Kubernetes stuff. Then, Cast will grab that, and whenever it creates a new node pool, it will use that version, which means it’s literally a click of a button. It’s one of those things that we don’t need to think about anymore, which means that we don’t have to plan for it. It also means that it’s easier for us to stay up-to-date with Kubernetes, which is nice.
All in all, Cast has all of the little things that help us save a couple of minutes here and a couple of minutes there, or even five hours of furious googling followed by tearing out your hair. Things just get easier when you’re using Cast. If I asked my team, they would say that it’s totally worth it, even without the cost savings.
Seamless security for Kubernetes
You mentioned using Cast’s security product – how does it help your team manage Kubernetes security?
Cast’s security dashboard helped us focus our efforts because we knew security was an issue. It helped us find out where to start and showed which things we needed to set up, all in one place. This made dealing with audits and compliance much easier.
Cast AI security dashboard.
I’d like to highlight two particular use cases of Cast’s security features:
1. Scanning Docker images for vulnerabilities
We use Cast’s security features because we’re home-rolling all our Docker images for everything. Since we hadn’t set up security scanning yet as we were building the images, we had no real way of monitoring which applications experienced major security issues, critical security issues, and so on.
The Vulnerabilities report allows users to easily identify the container images currently running in the cluster and the vulnerabilities they may expose. It provides a comprehensive view of any vulnerabilities that may have been introduced, regardless of how the image was added to the cluster.
Cast gave us the information we needed to build a small pipeline that just refreshed the critical step in our Docker build. That is how we realized that only our base image was bad. All the other images that we built on top of that were perfectly fine.
Previously, we didn’t know where we needed to focus, so we did everything, which meant putting security scanning into every pipeline that built the Docker image in the entire project.
That was a lot of work, and it was really hard for us to find an entry point where we could start and feel comfortable delivering the correct amount of value for minimum time. Since we got Cast and realized that this is all the base image, we could patch the base image, and then we were done.
The Image hierarchy section displays the base image used to create the selected image, along with the layers and commands used to build each layer. This information is valuable for identifying the layer that introduced vulnerabilities and determining the necessary steps to address them. The Vulnerabilities tab lists all detected vulnerabilities in the selected image, along with available fixes. Vulnerabilities are sorted by their CVSS (Common Vulnerability Scoring System) score, highlighting the most critical issues.
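The triage described above, sorting findings by CVSS score and tracing them to the layer that introduced them, can be sketched in a few lines. This is an illustration only and does not use Cast's API or data format. The `Finding` record, the placeholder vulnerability IDs, the scores, and the layer names are all hypothetical. The 7.0 threshold follows the CVSS v3 convention that scores from 7.0 upward are rated High or Critical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    vuln_id: str         # placeholder identifier, not a real CVE
    cvss: float          # CVSS base score, 0.0 to 10.0
    layer: str           # image layer that introduced the vulnerable package
    fix_available: bool


def sort_by_severity(findings: List[Finding]) -> List[Finding]:
    """Return findings with the highest CVSS score first."""
    return sorted(findings, key=lambda f: f.cvss, reverse=True)


def layer_with_most_severe(findings: List[Finding],
                           min_cvss: float = 7.0) -> Optional[str]:
    """Name the layer that introduced the most High or Critical findings."""
    counts = Counter(f.layer for f in findings if f.cvss >= min_cvss)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical scan result for one application image.
findings = [
    Finding("VULN-1", 9.8, "base-image", True),
    Finding("VULN-2", 5.3, "app-layer", False),
    Finding("VULN-3", 7.5, "base-image", True),
    Finding("VULN-4", 8.1, "base-image", True),
]

for finding in sort_by_severity(findings):
    print(finding.vuln_id, finding.cvss, finding.layer)
print("Patch first:", layer_with_most_severe(findings))
```

When every severe finding traces back to the same layer, as in this made-up data, patching that one layer fixes all images built on top of it.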
2. Keeping track of application updates
Cast helps us keep track of applications that we might not update often. Cast reminds us that those exist, and they might have a couple of vulnerabilities, which means that we might want to rebuild them. Even though they’re still working perfectly fine, we just rebuild the image with the latest versions of all our base images and stuff like that, deploy it again, and all the vulnerabilities disappear.
A partnership built on support and trust
How did the Cast team support you throughout the integration process and later on?
Our team appreciated working with Cast. Small hiccups were addressed quickly by the Cast support team. They helped us investigate things and suggested solutions, even in cases where we didn’t end up with a fix.
They’ve taken us seriously and not just brushed us off, which, in my experience, is common, especially when you work with this kind of integration.
Other than that, we’ve actually found a couple of bugs or issues in Cast, and all of those have been fixed within the day, which is also super impressive. That’s not something I expected, but since it did happen, now I get disappointed when it doesn’t happen elsewhere, especially within our own team.
What kind of company stands to benefit from Cast?
If you’re only running a couple of node pools, you might save money, but it still takes some time and some effort. However, Cast is a no-brainer if you have a large Kubernetes presence. We’ve had a really nice experience with Cast so far, and I would absolutely recommend it to any such company.
If it’s a big company and they have a lot of stuff in Kubernetes, then I would expect they already do something like this. But if they don’t, Cast is a must-have because we save more than one full-time employee, and we’re a fairly small team.
However, for a larger company, such cost savings can affect the balance sheet. Suppose you’re a big tech company, and your Kubernetes costs are a major part of your infrastructure spend. In that case, your valuation will likely increase because everybody’s happy with you making more profit.
So, if you’re a big company and you spend millions on your Kubernetes, then not having a solution like Cast on board is irresponsible.