The Kubernetes node provisioning landscape shifted when Karpenter moved to v1 stable. AWS built it, documented it in their EKS best practices guides, and ships v1.10.0 as the de facto standard for dynamic node provisioning on their platform. Azure adopted it wholesale: AKS Node Auto Provisioning is powered by karpenter-provider-azure and reached general availability in early 2026. Between the two, Karpenter now backs dynamic node provisioning on two of the three major managed Kubernetes platforms in production.
Google hasn’t moved. GKE still runs on the Cluster Autoscaler backed by Managed Instance Groups, with Node Auto-Provisioning and ComputeClasses layered on top. The kubernetes-sigs organization does not have a karpenter-provider-gcp repository as of March 2026. We’ve been watching this gap closely, both because the explanation reveals something intentional about how Google approaches its managed services and because GKE operators are the ones absorbing the operational cost.
What Karpenter Actually Solved
If you haven’t operated a cluster with Cluster Autoscaler (CAS) and genuinely heterogeneous workload requirements, the appeal of Karpenter might not be immediately obvious. The problem becomes clear when you start managing clusters with meaningfully different workload shapes, because CAS was not built for that.
Traditional Cluster Autoscaler scales within node groups. You define a node group with a specific instance type, set min and max counts, and attach labels; CAS then adds or removes nodes within those bounds. For a simple cluster with one or two workload types, this is manageable. For clusters running GPU inference workloads, memory-optimized batch jobs, and general compute pods simultaneously, this means maintaining separate node group configurations for each combination. Even moderate heterogeneity leads to ten or fifteen node group configs, each with its own scaling logic and lifecycle. The operational surface area compounds as workload diversity grows.
Karpenter replaced that model with a single NodePool CRD that defines constraints rather than a fixed configuration. You specify the instance families you’ll accept, the availability zones, whether spot or on-demand capacity is acceptable, and the architecture requirements. Karpenter watches for pending pods, evaluates their resource requests and scheduling constraints, and provisions the cheapest instance type that satisfies them. This is just-in-time provisioning: the node exists because a pod needs it, not because a group was pre-configured. Active consolidation runs continuously in the background, packing workloads onto fewer nodes and terminating underutilized ones. A single NodePool can replace a dozen or more node group configurations.
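To make that concrete, here is a minimal sketch of a NodePool against the AWS provider. The pool name, zones, and limits are illustrative, and the fields follow the karpenter.sh/v1 API shape; check the current Karpenter docs before reusing it.

```yaml
# Illustrative NodePool: a set of constraints, not a fixed node group.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose        # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]          # either capacity type is acceptable
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b"]   # illustrative zones
      nodeClassRef:
        group: karpenter.k8s.aws                 # the cloud-specific half of the model
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"                                  # cap total CPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                         # how long a node may sit underutilized
```

Everything cloud-specific hides behind the nodeClassRef; the constraints above are what Karpenter evaluates against pending pods when picking the cheapest satisfying instance.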
The Community Trajectory
Karpenter’s adoption path is readable in its numbers, and the numbers don’t look like a project with uncertain momentum. When it graduated to beta in October 2023, it had 4,900+ GitHub stars and 200+ contributors. It’s now v1.10.0 in the kubernetes-sigs repo, with a CNCF donation process underway as part of the Kubernetes Autoscaling SIG. Bi-weekly working group meetings, weekly issue triage, and active #karpenter and #karpenter-dev Slack communities all point to a project with real operational investment.
For GitOps-heavy teams, the operational fit matters as much as the functionality. NodePool and NodeClass are standard Kubernetes CRDs. You manage them the same way you manage every other cluster resource: in Git, with Flux or Argo CD, and with version history. There is no separate control plane or proprietary configuration format. Karpenter configuration slots naturally into how infrastructure teams already work, which is a large part of why adoption compounded rather than plateaued after initial interest. The trajectory is toward a CNCF-governed standard, and two of the three major cloud providers already treat it that way while the donation is still in flight.
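As a sketch of what that looks like in practice, a single Argo CD Application pointing at a directory of NodePool and NodeClass manifests is the whole integration; the repository URL and path below are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: karpenter-nodepools
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/cluster-config  # hypothetical repo
    targetRevision: main
    path: infrastructure/karpenter    # directory of NodePool/NodeClass manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
  syncPolicy:
    automated:
      prune: true      # remove NodePools deleted from Git
      selfHeal: true   # revert out-of-band edits
```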
How AWS and Azure Responded
AWS’s position is straightforward: they created Karpenter, and it’s explicitly recommended in their EKS best practices documentation. The adoption path for EKS operators runs directly through Karpenter, with AWS maintaining the EC2NodeClass and supporting the project as its primary cloud provider. This was never going to be a surprise.
Azure’s approach is more revealing. Microsoft took an open-source project created by a competitor, built karpenter-provider-azure on top of it, and shipped AKS Node Auto Provisioning, which Microsoft describes as Karpenter run as a managed add-on, as generally available in early 2026. That Microsoft was willing to build on AWS-originated infrastructure tooling says something about where the community landed on Karpenter’s value. Engineers writing NodePool configs for EKS and for AKS now use the same API shape. The NodeClass differs because it maps to cloud-specific resources, but the model is consistent across both platforms.
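The split is visible in a single field. Requirements, limits, and disruption settings carry over unchanged between the two platforms; only the nodeClassRef points at a cloud-specific resource. The group and kind values below are as each provider documents them, but both projects evolve, so verify against the current releases.

```yaml
# EKS: the cloud-specific half of a NodePool
nodeClassRef:
  group: karpenter.k8s.aws
  kind: EC2NodeClass
  name: default
---
# AKS: the same NodePool shape, a different NodeClass
nodeClassRef:
  group: karpenter.azure.com
  kind: AKSNodeClass
  name: default
```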
Google’s Different Bet
GKE’s autoscaling story involves several components, and it has evolved meaningfully over the years. Standard mode clusters use Cluster Autoscaler backed by Compute Engine Managed Instance Groups. Node Auto-Provisioning (NAP) extends CAS to automatically create new node pools when no existing pool matches a pending pod. ComputeClasses are the most recent addition: CRD-based declarative configs that let you specify preferred instance types with ordered fallback priorities, integrated with both Standard and Autopilot clusters. Autopilot sits at the other end of the spectrum, with Google managing the entire node lifecycle, scaling decisions, and node configuration on your behalf.
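The configuration model underlines how far this is from Karpenter’s per-pod matching: NAP is tuned with cluster-wide ceilings in a file passed to gcloud, not with constraints evaluated per pending pod. A sketch, with field names taken from GKE’s NAP documentation at the time of writing (verify before use):

```yaml
# nap.yaml -- applied with:
#   gcloud container clusters update CLUSTER \
#     --enable-autoprovisioning --autoprovisioning-config-file=nap.yaml
resourceLimits:
  - resourceType: cpu
    minimum: 8
    maximum: 1000          # cluster-wide ceiling, not a per-pod constraint
  - resourceType: memory
    maximum: 4000
autoprovisioningLocations:
  - us-central1-a          # illustrative zones
  - us-central1-b
management:
  autoUpgrade: true
  autoRepair: true
```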
This stack is coherent, and it does work. ComputeClasses in particular bring a declarative, GitOps-compatible layer to instance selection that’s closer to Karpenter’s model than raw CAS ever was. NAP handles the complexity of pool creation that previously required manual configuration. Autopilot genuinely simplifies operations for teams that want to step back from node management entirely. The problem isn’t that these tools don’t function. The problem is that the just-in-time, single-node optimization model that makes Karpenter efficient isn’t how any of them operate.
The strategic logic is also visible. Adopting Karpenter would make GKE more operationally interchangeable with EKS and AKS: the same NodePool abstraction, with only the NodeClass differing per cloud. If GKE offered that abstraction, the friction of switching between clouds would decrease meaningfully. Google’s proprietary tooling stack keeps GKE operationally distinct. That is consistent with how Google builds managed services across its platform: build a good product, keep it differentiated. They contributed Kubernetes to the community and are a founding member of the CNCF. On autoscaling tooling, though, they chose a proprietary path even after a community-led standard emerged. That choice appears deliberate.
What GKE Engineers Are Actually Working With
The practical reality for GKE operators is that tooling choices lie on a spectrum between full control and full abstraction, with meaningful trade-offs at each end. Understanding where the limits are matters more than picking the “correct” option, because each configuration makes different failures easy and different capabilities accessible.
CAS with static node groups works reliably but requires maintenance effort proportional to workload heterogeneity. NAP reduces that burden by automatically creating pools, but it still provisions whole pools rather than selecting a single best-fit node for each pending pod. ComputeClasses add a useful declarative layer and are worth using if you’re on Standard mode, particularly for GPU workloads where instance type fallback matters. But they don’t include consolidation logic or predictive spot management.
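A sketch of that declarative layer, with field names from GKE’s ComputeClass documentation at the time of writing; the class name and machine families are illustrative:

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: batch-optimized    # hypothetical name
spec:
  priorities:
    - machineFamily: n2
      spot: true           # first choice: Spot capacity in the preferred family
    - machineFamily: n2d
      spot: true           # second choice: adjacent family, still Spot
    - machineFamily: n2    # final fallback: on-demand
  nodePoolAutoCreation:
    enabled: true          # let GKE create matching pools via NAP
  whenUnsatisfiable: ScaleUpAnyway
```

Workloads opt in with a nodeSelector on cloud.google.com/compute-class: batch-optimized. The ordered fallback is real; what nothing in the stack does is revisit the node after it exists.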
Autopilot removes node management overhead entirely and works well for teams that can stay within its constraints. Those constraints are real. Custom sysctls, host network access, certain security contexts, and specific DaemonSet configurations either require workarounds or aren’t supported. Production engineering teams running diverse workloads regularly run into these limits. The gap isn’t that GKE autoscaling is broken. It’s that the intelligent, just-in-time optimization model that Karpenter delivers on other clouds has no native equivalent in GKE’s tooling stack.
Before evaluating external tools, it’s worth quantifying where your cluster actually stands. GKE’s node utilization dashboards, available under Observability in the Cloud Console, show request-to-allocatable ratios for each node pool. The Workload Metrics view breaks down CPU and memory requests versus actual consumption at the workload level. On most production clusters, this surfaces overprovisioning within minutes. A request-to-usage gap above 40% on CPU or memory, sustained for 24 hours or more, is a reliable signal of waste that consolidation can address; node utilization consistently below 50% during peak hours points the same way. Those numbers are the starting point for any honest GKE optimization conversation.
How Cast AI Brings Intelligent Autoscaling to GKE
Cast AI removes the manual work of managing node-level resource decisions on GKE: sizing CPU and memory requests to match actual workload consumption, packing workloads onto fewer nodes to reduce waste, and handling Spot preemption before it disrupts running pods. Where GKE’s native tooling requires ongoing operator decisions to stay efficient, Cast AI makes those decisions continuously and autonomously. Node consolidation runs in the background, moving workloads to fewer nodes and reclaiming underutilized capacity without disrupting pods. This is the operational layer that GKE’s native tooling stack doesn’t provide: a control plane that continuously manages cluster state rather than reacting to threshold breaches.
It deploys as an external controller alongside your existing node pool architecture, installing via Helm with the IAM and RBAC permissions needed to read metrics and modify node pools. No node agents or DaemonSets are required. Replacing CAS is unnecessary, and workload specs stay untouched. If Cast AI goes offline, your cluster continues to operate on its existing autoscaling setup, leaving no configuration in an inconsistent state. Cast AI works alongside HPA and VPA without conflicts because rightsizing operates on baseline requests rather than scaling triggers, and it honors PodDisruptionBudgets during node drains.
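Because drains respect PodDisruptionBudgets, the standard Kubernetes mechanism is how you bound how aggressive consolidation can be. A minimal example, with hypothetical names:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb            # hypothetical name
  namespace: production    # hypothetical namespace
spec:
  minAvailable: 2          # never drain below two ready replicas
  selector:
    matchLabels:
      app: api             # hypothetical workload label
```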
Resource sizing is one of the decision classes Cast AI handles continuously. ML-driven analysis tracks actual CPU and memory consumption and aligns resource requests and limits with observed usage. Most production clusters are significantly overprovisioned because engineers set requests conservatively when workload behavior isn’t fully characterized. At a 50-node scale, that waste is manageable. At 500 nodes, it degrades bin-packing efficiency and adds directly to your compute bill. Cast AI’s rightsizing runs continuously rather than as a one-time audit, tracking workload behavior as it changes.
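What that means at the level of a single workload, with entirely hypothetical numbers:

```yaml
# Before: requests set conservatively when the workload was first deployed
resources:
  requests:
    cpu: "2"
    memory: 4Gi
---
# After: requests aligned with observed usage (hypothetical figures)
resources:
  requests:
    cpu: 400m
    memory: 1Gi
```

The gap between the two is what consolidation converts into removed nodes.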
The practical result shows in the utilization numbers. Bud Financial reached 90%+ resource utilization across its GKE infrastructure after deployment. Project44, a supply chain visibility platform running multi-region GKE clusters with highly variable batch and real-time load patterns, reduced its GKE spend by 50% in the first month. PlayPlay, a video creation SaaS whose workloads carry significant idle capacity between render jobs, automated Spot VM usage across its cluster and achieved a 40% cost reduction.
Spot VM management includes interruption prediction: Cast AI monitors preemption signals and proactively migrates eligible workloads within GCP’s 30-second preemption window, which is considerably shorter than AWS’s 2-minute notice for EC2 Spot. For workloads that can’t complete migration within that window, Cast AI manages the fallback to on-demand capacity. Container Live Migration extends this to stateful applications backed by CSI-compatible storage, including GCP persistent disks, enabling workload movement between nodes with minimal disruption. For workloads using non-CSI storage backends, a brief interruption during node drain should be expected. Cast AI also supports cross-region GPU instance selection for GKE, useful for teams whose inference workloads need A100 or L4 availability across multiple regions when their primary zone is out of capacity.
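On the workload side, opting a fault-tolerant Deployment into GKE Spot capacity is a nodeSelector plus, where the pool carries the Spot taint, a matching toleration. The fragment below uses the label GKE documents for Spot VM nodes:

```yaml
# Pod template fragment: schedule onto GKE Spot VM nodes.
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  tolerations:
    - key: cloud.google.com/gke-spot
      operator: Equal
      value: "true"
      effect: NoSchedule   # required when the Spot pool is tainted
```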
Cast AI is available through the Google Cloud Marketplace. If the utilization data from your node dashboards shows the overprovisioning pattern common across GKE production clusters, the next concrete step is a workload-specific analysis: connect your cluster at cast.ai/environments/gcp for a breakdown of your potential GKE cost optimization. If you want to walk through the optimization model against your current node pool configuration before committing, the technical demo is the right entry point.