Engineering

The Hackathon Fix That Cut Our Storage Costs by 93%
For the second year running, Cast AI hosted an internal Hackathon during our Vilnius team…

Deploying GPU workload with Dynamic Resource Allocation
Kubernetes DRA replaces legacy GPU counts with structured, attribute-based requirements. This post demonstrates how to…

Tier Your Apps, Cut Your Costs: A Practical Framework for Spot Instances in Production
In this guide, we’ll walk through a practical approach to running Spot Instances in production…

Kubernetes Resource Management: Optimizing High-Resource Initialization Workloads
Kubernetes workloads can fail during startup even when resources look sufficient. CPU spikes in Java…

Multi Cloud Kubernetes: Reducing Cost And Complexity
Running Kubernetes within a single cloud provider is already a pretty tall order. Multi cloud…

Intelligent Spot Instance Availability: How Machine Learning Reduces Interruptions by up to 94%
Discover how Cast identifies Spot Instances with low interruption rates and prioritizes them when scaling…

Demystifying Quantizations: Guide to Quantization Methods for LLMs
Quantization is key to running large language models efficiently, balancing accuracy, memory, and cost. This…

Kubernetes Scheduling Best Practices: Mastering Topology Spread Constraints and Pod Affinity
Effective pod scheduling is key to resilient, cost-efficient Kubernetes infrastructure. This in-depth guide explores pod…

Enterprise Kubernetes Best Practices: Building a Resilient, Secure, and Cost-Optimized Kubernetes Platform
Even experienced cloud-native teams struggle with Kubernetes complexity, security, and resource waste. This guide shares…