Optimize your first cluster for Free!

AWS Karpenter – Is the AWS Open-Source Kubernetes Autoscaler Right for You?

AWS created a new open-source tool called AWS Karpenter. I will investigate how it works and unavoidably compare it with what we built at CAST AI. What exactly is Karpenter, how does its autoscaling mechanism work, and should you use it?

AWS Karpenter

Karpenter is an AWS-centric, simple autoscaler with two reactive policies. Its biggest selling point? No node pools. 

Compared to Karpenter, CAST AI has a lot more automation built into handling spot interruptions, bin-packing, and rebalancing. Our rebalancing and bin-packing processes are designed to be risk-free and carefully consider stateful workloads. Karpenter is more aggressive in its operation as it doesn’t take downtime into account.

Let’s take a look at how Karpenter works and compare it to the autoscaling mechanisms of CAST AI.

Quick feature comparison – CAST AI vs. AWS Karpenter

FeatureCAST AI 🥇Karpenter
Vendor support
Google Cloud
Microsoft Azure
Autoscaling features
Unscheduled Pod policy
Bin packing pods
Delete Empty Node policy
Optimized pod grouping
Spot instance support
Cost monitoring and optimization
Security and configuration checks
SLAs and customer support

Installation and onboarding

After trying Karpenter out, I’d say that onboarding isn’t trivial. Still, it’s not as hard as the Kubernetes Cluster Autoscaler.  

Karpenter onboarding requires you to have eksctl CLI installed and configured, even if you created an EKS cluster without the help of eksctl. It takes around 6 to 7 steps to get started with Karpenter.

CAST AI doesn’t have an eksctl dependency, and you’re looking at a 2-step process your IT manager could complete on their own in a minute or two.

Autoscaling policies

Karpenter has only two most basic autoscaling policies. At first glance, they are quite similar to two CAST AI policies (Unscheduled Pod Policy and Node Deletion policy). Let’s compare the two.

1. Provisioner 

In its simplest form (without sub features), CAST AI’s Unscheduled Pod Policy works in a similar way as Karpenter’s Provisioner. 

If there are any pods that can’t start because no suitable node has been found by the Kubernetes vanilla scheduler, a new node will be created based on those pending pod constraints like zone, subnet, PodAntiAffinity, capacity preference (spot or on-demand).

Karpenter has these constraints available for NodeSelector, NodeAffinity, TopologySpreadConstraints:

    - key: node.k8s.aws/instance-type #If not included, all instance types are considered
      operator: In
      values: ["m5.large", "m5.2xlarge"]
    - key: "topology.kubernetes.io/zone" #If not included, all zones are considered
      operator: In
      values: ["us-east-1a", "us-east-1b"]
    - key: "kubernetes.io/arch" #If not included, all architectures are considered
      values: ["arm64", "amd64"]
    - key: " karpenter.sh/capacity-type" #If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["spot", "on-demand"]

Karpenter supports GPU and ARM with the following tolerations: nvidia.com/gpu, amd.com/gpu, and aws.amazon.com/neuron. To add a GPU node, one needs to define a specific list of nodes (and we’re back to the node pool idea).

What about CAST AI?

CAST AI allows you to specify more pod constraints, like get me instances with fast local storage for Spark jobs but doesn’t require you to specify the instance types on your own. In case you haven’t checked lately, there are 400+ available types on AWS, which makes it easy to get back to implicit Node Pools by accident.

You can also specify the spot interruption rate per cluster and override per workload in CAST AI. For example, you may want your casual spot workload in a cluster to be on more cost-efficient spot instances – even if that means these instances could be interrupted once in a while.

But those huge Jupyter notebook pods with a ton of resources? You want them to run spot instances so unpopular that they barely get interrupted.

Back to the Node Pools against one’s will

Running your Kubernetes workloads on burstable T3a instance types is a crime to your business and DevOps. Unfortunately, this is what you will get by default with Karpenter as they are the most affordable instances.

Not excited about burstable instances? You’ll have to list all the acceptable instance types that you want your Kubernetes cluster to have in the Karpenter Provisioner CRD. And with that, you have just specified a cherry-picked node list and are implicitly back to Node Pools.

And what if these cherry-picked node lists are not available during Black Friday or Christmas holidays? You will need to extend the list with even more alternatives.

2. Delete Empty Node policy

As the name implies, Karpenter can and will delete empty nodes. This policy allows you to set the Time To Live for empty nodes (ttlSecondsAfterEmpty with a default of 30). 

CAST AI’s Delete Node policy works in the same fashion. The CAST AI engine will not remove nodes if there are workloads running on them (excluding DaemonSets, of course).

The Kubernetes scheduler, by default, acts fairly and will always distribute workloads for maximum performance and availability. And if you’re running EKS, you can’t change the Kubernetes scheduler profile.

Let’s look at this with an analogy

Imagine a train (Kubernetes cluster) with 10 train cars (nodes); each train car can seat 100 passengers (pods), and the train has the capacity to transport 1000 passengers at peak.

The train moves from the central station fully utilized with 1000 seats taken, and with every stop, more people get off the train than on it. Even a few stops before the end of the train journey, if the train is carrying only 15 passengers out of 1000, someone needs to ensure that there are empty train cars. It’s likely that there are 1 to 2 passengers in a train car enjoying their privacy, and the Delete Empty Node train car can’t find anything to remove.

So how does Karpenter help with over-provisioned resources for off-peak times when the load is lower?

Enter Node Expiry

Karpenter offers a Node Expiry feature. If a node expiry time-to-live value (ttlSecondsUntilExpired) is reached, that node is drained of pods, and the node is deleted.

Great! But wait a minute, what about your workloads that are running on these nodes? Sorry, end of the shift. Nodes are going down.

Alright, but how would you know what the expiration time should be? If you get it too short, your pods are going down, and you then have to wait several minutes for the Karpenter provisioner to add new nodes with refreshed ttlSecondsUntilExpired.

If you set the ttlSecondsUntilExpired too long, you’re wasting money. So what we see here is that Karpenter reacts in time to add new capacity but leaves you hanging by reducing any excess capacity. 

CAST AI, on the other hand, offers Evictor that shrinks the cluster to the minimum number of nodes by bin packing pods (simulating evictions, honoring Pod Disruption Budgets, etc). Once Evictor makes a node empty, it’s deleted automatically.

Cloud vendor support

At the moment, Karpenter only supports AWS. The code has a lot of AWS-specific hardcoded logic. For instance, using EC2 Fleet and Spot Fleet, without a way forward for other clouds. 

On the other hand, Karpenter has empty interfaces created in the code, so the project looks pluggable and open enough if someone were to take the initiative of adding other cloud provider support.

CAST AI supports AWS (EKS, kOps), Google Cloud Platform (GKE), and Microsoft Azure (AKS).

No node pools

The promise of being free of node pools sounds great. But the way it is now, AWS Karpenter doesn’t offer users almost any control over the scheduling decisions when using a “naked” Provisioner. 

Provisioner is Karpenter’s Kubernetes CRD which defines the node creation constraints. For example, if you had workloads that needed a GPU and you wanted the GPU nodes to have a taint (so that only the pods that actually need it are placed on those nodes), you would create a Provisioner like that:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
  name: gpu
  - key: node.kubernetes.io/instance-type
    operator: In
      - p3.2xlarge
      - p3.8xlarge
      - p3.16xlarge
  - key: nvidia.com/gpu
    value: true
    effect: “NoSchedule”

In Karpenter, the engineer still needs to have deep knowledge of the available instance types with all of their nuances. 

To use the SSDs in a K8s cluster, you would need to create a user-data script that makes a single logical disk out of those local SSDs and creates a symbolic link for the kubelet to use. That requires creating an EC2 Launch Template and referencing it in the Provisioner resource. And, most importantly, you need to keep it up to date if you want to achieve cost savings.

Conclusion: In more complicated clusters, you’d have to create many different Provisioners, i.e. node pools: one for GPUs, another one for storage optimized, yet another one for infrequently interrupted Spot instances, etc.

Provisioner conflicts

When multiple Provisioner resources are defined and an unschedulable pod can tolerate/select any of them, the result will be non-deterministic. 

Here’s an example:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
  name: team-a
    team: a
    - key: "node.kubernetes.io/instance-type" 
      operator: In
      values: ["m5.large"]
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
  name: team-b
    team: b
    - key: "node.kubernetes.io/instance-type" 
      operator: In
      values: ["m5.2xlarge"]
apiVersion: v1
kind: Pod
  name: pod
  - image: nginx
    name: pod

The nginx pod would be scheduled on either the m5.large or the m5.2xlarge node (chosen based on the sequence the API-Server returns for the records). This is another dimension that the DevOps engineer would need to think about. 

The Provisioners should clearly define constraints, and all pods should clearly define the selectors to reduce or eliminate randomness. 

In CAST AI, the selectors/tolerations are predefined, and engineers don’t have to come up with them on their own.

Pod grouping (a.k.a. bin packing on provisioning)

Karpenter’s GitHub states that the solution uses bin packing but only in the nodePlacer sense. That is, it groups pods to be placed on nodes to avoid creating a separate node for each pod. 

In CAST AI, we use Permutations and Clique functions. Their purpose is to find out the most optimal pod groupings. 

Karpenter’s algorithm is basically creating a hash of the pod constraints. This means that otherwise compatible pods will end up on different nodes (Karpenter will create multiple nodes). 

Consider this example:

apiVersion: v1
kind: Pod
  name: pod1
    topology.kubernetes.io/zone: "eu-central-1a"
    - image: nginx
      name: pod
apiVersion: v1
kind: Pod
  name: pod1
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                  - "eu-central-1a"
                  - "eu-central-1b"
    - image: nginx
      name: pod

Even though the pods could fit into a single node placed in the eu-central-1a zone, Karpenter will create two nodes (that might even end up in the same zone) because the hash of the pod constraints is different.

Cost optimization

When Karpenter bin-packs pods, it produces a list of viable instance types per each pod group. For example, if there’s a pod group that needs 2 CPU and 8 GiB, then a list of all instance types that have at least 2 CPU and 8 GiB is retrieved. The list is sorted by resource weight.

Karpenter sorts those instance types using a weight function where 1 CPU = 1 GiB, favoring nodes with higher CPU and lower memory:

// weightOf uses a euclidean distance function to compare the instance types.
// Units are normalized such that 1cpu = 1gb mem. Additionally, accelerators
// carry an arbitrarily large weight such that they will dominate the priority,
// but if equal, will still fall back to the weight of other dimensions.
func weightOf(instanceType cloudprovider.InstanceType) float64 {
	return euclidean(
		float64(instanceType.Memory().ScaledValue(resource.Giga)), // 1 gb = 1 cpu
		float64(instanceType.NvidiaGPUs().Value())*1000,           // Heavily weigh gpus x 1000
		float64(instanceType.AMDGPUs().Value())*1000,              // Heavily weigh gpus x 1000
		float64(instanceType.AWSNeurons().Value())*1000,           // Heavily weigh neurons x 1000

After sorting, only the first 20 nodes are picked.

This list might not contain the cheapest instance types because of Karpenter’s particular weighing system. There is no inventory database that can tell exactly how much something costs.

After the bin-packing and sorting, the list is passed to the EC2 Fleet API. The mode of choosing the instance type depends on whether it’s an on-demand node or spot instance.

On-demand instances

In this case, Karpenter passes the “lowest cost” allocation method that makes the EC2 Fleet API choose the cheapest instance type out of the 20 received viable instance types. There might be cheaper instance types, but the API works with what it was given.

Spot instances

AWS Karpenter passes the “capacity-optimized prioritized” allocation method that makes the EC2 Fleet API choose the instance types that have the highest availability, prioritizing the instance types at the top of the list. This has nothing to do with cost optimization and could be considered something similar to the CAST AI “Least interrupted” Spot reliability setting.

CAST AI offers an entirely different approach to cost optimization. The platform selects the most cost-efficient instances that match the workload requirements and automatically rightsizes instances to avoid overallocation.

CAST AI also comes with features such as cost reporting on any level, information about cluster cost, or potential savings. Karpenter doesn’t have any of those.

AWS Karpenter vs. CAST AI – which autoscaler is a better pick for you?

CAST AIKarpenter
Best forBest for
Efficient teams that want to automate Kubernetes cost optimization and gain insights with cost monitoring, fully automating spot instances.Teams running specific workloads like job processing clusters with HPA or other edge cases with low requirements.

As you can see, AWS Karpenter offers a limited set of functionalities right now.

AWS Karpenter is an option if you’re on AWS and considering the native Kubernetes Cluster Autoscaler. Karpenter definitely wins the Karpenter vs. Cluster Autoscaler battle. But today, AWS offers 400+ instances, so node pools on which the Kubernetes Cluster Autoscaler works become unmanageable, considering the exhaustible instance inventory. The same is true for the AWS Cluster Autoscaler found in its Kubernetes managed service, Amazon EKS. This is essentially what Karpenter kind of tries to solve.

If you’re looking for an autoscaler that can significantly reduce your cloud bill for production workloads without impacting their availability, you need to look elsewhere than Karpenter.

CAST AI offers a substantially more robust set of features, making it a great fit for reliable automated cost reduction on production and development workloads. It also works with all the three major cloud providers: AWS, Google Cloud, and Azure.

The platform is also self-healing and self-patching, while Karpenter relies almost entirely on a manual setup. Karpenter needs to be installed and set up manually cluster by cluster, and does no consolidation and optimization at the organization level like CAST AI does. Finally, with CAST AI you get 24/7 support.

To see what results CAST AI can bring you, register and start with a free cluster analysis.

Learn how much you can save

Run a free Savings Report to get savings recommendations

CAST AI Platform
  • Blog
  • AWS Karpenter – Is the AWS Open-Source Kubernetes Autoscaler Right for You?

Leave a reply

Notify of
1 Comment
Inline Feedbacks
View all comments
2021-12-14 6:41 AM

Great article!

Recent posts