Karpenter became the default autoscaler on EKS because it provisions the right instance in seconds. This guide takes you from a working install through the production patterns that determine how Karpenter performs at scale, with working YAML in every section.
Kubernetes clusters are not static. Pods come and go. Traffic shifts. New deployments arrive. The provisioning layer needs to keep up, fast and accurately, without forcing teams to manage instance groups by hand.
Karpenter is the open-source autoscaler that solved this on EKS. It watches pending pods, evaluates NodePool constraints, and launches the right instance type in seconds. AWS ships it as the default scaler in EKS Auto Mode, and roughly 60% of new EKS clusters now provision through it.
Getting Karpenter installed is the easy part. The harder part is what comes after: which instance families to allow, how aggressively to consolidate, when to use Spot, how to keep disruption under control, and how to debug a cluster that scales fast but sometimes surprises you with the bill.
This guide takes you from a working Karpenter install on EKS through the advanced patterns that decide whether Karpenter holds up in production. Expect working YAML in every section.
What is Karpenter?
Karpenter is an open-source Kubernetes node provisioner originally built by AWS and donated to the CNCF. It replaces Cluster Autoscaler with a different model. Instead of pre-defining node groups and asking the scheduler to fit pods into them, Karpenter watches unschedulable pods and provisions the optimal node directly from the cloud provider’s instance catalog.
The provisioning loop:
- A pod is created, and the Kubernetes scheduler cannot place it on existing capacity.
- Karpenter sees the pending pod and reads its resource requests, taints, tolerations, and node selector terms.
- Karpenter evaluates the constraints in your NodePool definitions and the available EC2 instance types.
- Karpenter launches the most efficient instance that satisfies the pod’s needs and your NodePool rules.
- The pod is scheduled to the new node.
That entire loop typically takes 30 to 90 seconds on AWS, compared with several minutes when using Cluster Autoscaler with pre-warmed node groups.
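To make the inputs to that loop concrete, here is a minimal sketch of a pod carrying the fields Karpenter reads: resource requests, a toleration, and a node selector. The pod name, image, and the example.com/dedicated taint key are hypothetical placeholders, not values from any particular cluster.

```yaml
# Hypothetical pod; name, image, and taint key are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-api
spec:
  # Node selector terms narrow which nodes (and instance types) qualify.
  nodeSelector:
    kubernetes.io/arch: amd64
  # Tolerations let the pod land on nodes from a tainted NodePool.
  tolerations:
  - key: example.com/dedicated
    operator: Equal
    value: api
    effect: NoSchedule
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      # Requests drive the sizing decision for the new node.
      requests:
        cpu: "2"
        memory: 4Gi
```

If no existing node has 2 vCPU and 4 GiB free, this pod stays Pending and becomes a candidate for provisioning.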
The control surface is two custom resources. NodePool specifies which node types are allowed. EC2NodeClass (formerly AWSNodeTemplate) specifies how Karpenter should launch them on AWS.

Karpenter vs Cluster Autoscaler
The short answer: Karpenter does not require pre-defined Auto Scaling Groups. Cluster Autoscaler scales node groups up and down within the sizes and instance types you defined ahead of time. Karpenter scales individual nodes and selects instance types at provisioning time, drawing from the full EC2 catalog.
| Dimension | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning model | Scales pre-defined Auto Scaling Groups | Provisions individual nodes from the cloud API |
| Typical scale-up time | 2-5 minutes | 30-90 seconds |
| Instance flexibility | One or a few instance types per node group | Hundreds of instance types evaluated per request |
| Spot diversification | Coarse, via ASG mixed instance policies | Native, across many Spot pools simultaneously |
| Consolidation | Requires separate tooling | First-class, with budgets and policies |
| Drift detection | Not supported | Detects and replaces nodes that no longer match spec |
| Node expiration | Not supported | First-class, configurable per NodePool |
| Configuration unit | Node group (one per workload tier) | NodePool + EC2NodeClass (one set per workload tier) |
| Cloud support | All major clouds, mature | AWS most mature; Azure and Alibaba in active development |

Migration considerations
If you are running Cluster Autoscaler today, the migration to Karpenter is not a flag flip. Existing node groups continue to run; you add Karpenter alongside, route a subset of workloads to it via NodePool requirements and pod-level node selectors, and decommission the node groups gradually. Most teams take 4-8 weeks to fully migrate a production cluster, with the first 2 weeks spent validating Karpenter’s behavior on non-critical workloads.
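As a rough sketch of the routing step, a workload can be pinned to Karpenter-managed capacity with a node selector on the karpenter.sh/nodepool label, which Karpenter applies to the nodes it launches. The Deployment name, labels, image, and the NodePool name "default" are assumptions for illustration.

```yaml
# Hypothetical non-critical workload moved to Karpenter first.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-batch-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-batch-worker
  template:
    metadata:
      labels:
        app: example-batch-worker
    spec:
      # Only nodes launched by the NodePool named "default" qualify,
      # so this workload no longer lands on the legacy node groups.
      nodeSelector:
        karpenter.sh/nodepool: default
      containers:
      - name: worker
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
```

Workloads without the selector keep running on the existing node groups, which is what lets the migration proceed tier by tier.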
The migration is real work, but the operational dividends compound. Faster scale-up means tighter HPA loops. Better Spot diversification means higher Spot ratios in production. First-class consolidation means clusters stay right-sized without manual cleanup.
For most modern EKS clusters, Karpenter is the right choice. The exception is environments with strict instance-type requirements (specific compliance configurations, custom-licensed software tied to specific SKUs) where the flexibility Karpenter offers is more constraint than benefit.
Setting up Karpenter on EKS
The setup has three parts: prerequisites, install, and verification.
Prerequisites
You need an EKS cluster with an OIDC provider associated. Karpenter uses IRSA (IAM Roles for Service Accounts) to call the EC2 API. You also need an IAM role for the Karpenter controller and a separate IAM role that the launched nodes will assume.
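With IRSA, the link between the controller and its IAM role is an annotation on the controller's service account. A minimal sketch of a Helm values file that sets it is below, assuming the chart's serviceAccount.annotations value; the account ID and role name are placeholders you would replace with your own.

```yaml
# values-irsa.yaml (hypothetical file name)
# The role ARN is a placeholder; substitute your account ID and
# the controller role you created for Karpenter.
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/KarpenterControllerRole-my-cluster
```

Passing this file with -f to helm is one option; check the chart's values for your pinned version before relying on it.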
The fastest path is the official getting-started script in the Karpenter docs, which provisions both roles, the SQS queue for Spot interruption notifications, and the EventBridge rules. If you prefer infrastructure-as-code, the same setup is available as a Terraform module: terraform-aws-modules/eks/aws//modules/karpenter.
Tag your subnets and security groups so Karpenter can discover them:
aws ec2 create-tags \
  --resources $SUBNET_IDS $SECURITY_GROUP_IDS \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
Install via Helm
Karpenter ships as a Helm chart in a public ECR registry. Check the latest stable release before pinning a version:
# --password-stdin reads the token from stdin, so pipe it in
aws ecr-public get-login-password --region us-east-1 \
  | helm registry login --username AWS --password-stdin public.ecr.aws
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.0.5" \
  --namespace kube-system \
  --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --wait
Pin the version. Karpenter follows semver and breaking changes between minor versions are real. Track the release notes and bump deliberately.
Verification
kubectl get pods -n kube-system | grep karpenter
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter
You should see two karpenter controller pods running and clean logs. If the controller is crash-looping, the most common cause is missing IAM permissions on the controller role. Compare against the recommended policy in the Karpenter getting-started docs.
Your first NodePool and EC2NodeClass
Karpenter does nothing until you define a NodePool and an EC2NodeClass. Start with the minimum that works, then add constraints.
EC2NodeClass
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023
  role: KarpenterNodeRole-${CLUSTER_NAME}
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}
  amiSelectorTerms:
  - alias: al2023@latest
What it does: tells Karpenter to launch nodes with Amazon Linux 2023, into any subnet tagged for discovery, attached to any security group tagged for discovery, using whatever IAM role you set up for nodes.
NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.large", "m5.xlarge", "m5.2xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
Apply both, then deploy a workload that has resource requests but no existing capacity. Karpenter should provision a node within a minute. Confirm with kubectl get nodes -L karpenter.sh/nodepool and you should see the new node labeled with the NodePool name.
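One way to generate that test load is a Deployment of placeholder containers whose only job is to request CPU. This is a sketch under assumptions: the name "inflate", the replica count, and the 1 vCPU request are arbitrary, and the pause image does nothing except hold the reservation.

```yaml
# Hypothetical test workload: pods that request CPU but do no work.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
      - name: inflate
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            # 5 replicas x 1 vCPU should exceed spare capacity on a
            # small cluster and leave some pods Pending.
            cpu: "1"
```

Scale the Deployment back to zero afterwards to watch consolidation remove the node again.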
This is the smallest production-safe configuration. It works. It is also restrictive in ways that limit Karpenter’s actual value, which is what the next section addresses.
Tuning NodePools for production
The minimal NodePool above limits Karpenter to three specific instance types. Real production clusters benefit from much more flexibility.
Open up instance flexibility
Replace the explicit instance-type list with category and generation filters:
requirements:
- key: kubernetes.io/arch
  operator: In
  values: ["amd64"]
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot", "on-demand"]
- key: karpenter.k8s.aws/instance-category
  operator: In
  values: ["c", "m", "r"]
- key: karpenter.k8s.aws/instance-generation
  operator: Gt
  values: ["4"]
- key: karpenter.k8s.aws/instance-cpu
  operator: Lt
  values: ["33"]
- key: topology.kubernetes.io/zone
  operator: In
  values: ["us-east-1a", "us-east-1b", "us-east-1c"]
What changed:
- The capacity type now supports Spot first, with on-demand as a fallback when Spot is unavailable.
- Instance category includes compute (c), general (m), and memory (r) families.
- Instance generation must be greater than 4, so Karpenter picks current-gen Intel and AMD options. Graviton stays excluded as long as the architecture requirement is limited to amd64.
- CPU count capped at 32 to prevent Karpenter from provisioning a giant instance for a tiny workload.
- All three AZs allowed for resilience.
This single set of requirements gives Karpenter access to well over 100 instance types across multiple Spot pools.
Set realistic limits
limits:
  cpu: 1000
  memory: 1000Gi
Limits are the hard ceiling for the NodePool. Set them based on your cost or capacity envelope. A NodePool with no limits will scale until something else stops it.
Use multiple NodePools for different workload tiers
A single NodePool is rarely enough. Most production clusters use two to four NodePools, each with different rules:
- A general-purpose pool for stateless workloads (allows Spot, broad instance flexibility).
- A reserved pool for stateful or latency-sensitive workloads (on-demand only, narrower instance set, taints for isolation).
- A GPU pool for ML or rendering workloads (GPU instance families, taints).
- A burst pool for nightly batch jobs (Spot only, large instances, expires aggressively).
Karpenter selects which NodePool to use for a given pod based on which pool’s requirements the pod can satisfy and which has the highest weight. We will get to weights in the advanced section.
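As an illustration of the second tier in that list, here is a sketch of an on-demand pool isolated by a taint. The pool name, the example.com/workload-tier taint key, the instance categories, and the CPU limit are all assumptions chosen for the example.

```yaml
# Hypothetical pool for stateful or latency-sensitive workloads.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: stateful
spec:
  template:
    spec:
      # Only pods that tolerate this taint can schedule here.
      taints:
      - key: example.com/workload-tier
        value: stateful
        effect: NoSchedule
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["m", "r"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 200
  disruption:
    # Conservative: only remove nodes that are already empty.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
```

Pods meant for this pool need a matching toleration; everything else falls through to the general-purpose pool.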

Disruption and consolidation
Disruption is what makes Karpenter feel alive, and it is also what makes platform teams nervous. The four disruption events to know:
Consolidation. Karpenter notices that a workload could fit on a smaller or cheaper node and replaces the existing one. This is where most ongoing optimization happens.
Expiration. Nodes get retired after a fixed duration via expireAfter. Useful for forcing AMI rotation and preventing long-lived nodes from drifting.
Drift. Karpenter detects when a node no longer matches its NodePool spec (for example, after you change the AMI family) and replaces it.
Emptiness. Nodes with no workloads get terminated quickly.

Each can be controlled with disruption budgets:
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 1m
  budgets:
  - nodes: "20%"
  - nodes: "0"
    schedule: "0 9 * * mon-fri"
    duration: 8h
    reasons:
    - Drifted
    - Underutilized
What this does:
- Consolidation runs when nodes are empty or underutilized, with a 1 minute hold before any action.
- At any given time, no more than 20% of the NodePool’s nodes can be disrupted simultaneously.
- During business hours (9 AM Monday through Friday, lasting 8 hours), drift and consolidation are blocked entirely. Empty node termination still applies.
Disruption budgets are the most important production guardrail Karpenter offers. Configure them deliberately. If you do not set budgets, Karpenter defaults to disrupting up to 10% of nodes at a time, which is reasonable for general workloads but too aggressive for clusters with single-replica services or strict PodDisruptionBudgets.
Consolidation policies
Two consolidation policies are available:
- WhenEmpty: Karpenter only terminates nodes that are completely empty. Safe and conservative.
- WhenEmptyOrUnderutilized: Karpenter actively replaces underutilized nodes with smaller ones. Higher savings, more disruption.
Most teams start with WhenEmpty and graduate to WhenEmptyOrUnderutilized once they trust their PodDisruptionBudgets and have validated that critical workloads handle eviction gracefully.
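Since the safer policy choice depends on PodDisruptionBudgets, a minimal sketch of one is shown here. The name, label selector, and minAvailable value are placeholders; the right number depends on how many replicas the service actually runs.

```yaml
# Hypothetical PDB: keep at least 2 pods of this app running
# while nodes are drained for consolidation or drift.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example-api
```

Note that a PDB requiring every replica to stay available (for example minAvailable: 1 on a single-replica service) blocks voluntary drains of that pod's node entirely, so pair PDBs with enough replicas to leave room for eviction.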
Spot instance strategy
Spot is the largest single cost lever Karpenter unlocks. It is also the largest source of operational risk if you handle it reactively.
Diversify across Spot pools
Karpenter natively diversifies across Spot pools when you allow multiple instance families and zones in a NodePool:
requirements:
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot"]
- key: karpenter.k8s.aws/instance-category
  operator: In
  values: ["c", "m", "r"]
- key: karpenter.k8s.aws/instance-generation
  operator: Gt
  values: ["4"]
- key: topology.kubernetes.io/zone
  operator: In
  values: ["us-east-1a", "us-east-1b", "us-east-1c"]
The more instance types and zones a NodePool can use, the more Spot pools Karpenter draws from, and the lower the chance of a correlated mass interruption.
Fall back to on-demand
A safer pattern is to allow both capacity types in the same NodePool:
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot", "on-demand"]
Karpenter prefers Spot when available and falls back to on-demand when Spot is constrained.
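In a mixed pool, individual workloads can still opt out of Spot. A sketch, assuming a hypothetical Deployment named example-payments: a node selector on the karpenter.sh/capacity-type label restricts its pods to on-demand nodes while the rest of the pool keeps using Spot.

```yaml
# Hypothetical interruption-sensitive workload in a mixed pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-payments
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-payments
  template:
    metadata:
      labels:
        app: example-payments
    spec:
      # Restrict these pods to on-demand capacity only.
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
```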
Handle interruption notices
Karpenter listens for AWS Spot interruption notices via the SQS queue you configure in the controller settings. When AWS sends a 2-minute interruption warning, Karpenter cordons the node, drains the workloads, and provisions a replacement.
That 2-minute window is short. Workloads that take longer than 2 minutes to terminate gracefully will be killed mid-shutdown. Mitigate by:
- Setting terminationGracePeriodSeconds on stateful pods.
- Using PodDisruptionBudgets to prevent simultaneous interruptions.
- Avoiding Spot for workloads that cannot tolerate interruption.
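The first mitigation can be sketched as a pod spec whose shutdown budget fits inside the 2-minute window. The 90-second grace period, the 10-second preStop delay, and the image name are assumptions for illustration; the container image would need a shell for the preStop command to run.

```yaml
# Hypothetical pod tuned to finish shutdown within the Spot notice.
apiVersion: v1
kind: Pod
metadata:
  name: example-stateful-worker
spec:
  # Total time allowed for preStop plus SIGTERM handling,
  # kept under the 120-second interruption window.
  terminationGracePeriodSeconds: 90
  containers:
  - name: worker
    image: example.com/stateful-worker:1.0  # placeholder image
    lifecycle:
      preStop:
        exec:
          # Brief pause so load balancers stop sending traffic
          # before the process receives SIGTERM.
          command: ["sh", "-c", "sleep 10"]
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
```

Leaving headroom below 120 seconds accounts for the time Karpenter needs to notice the event and begin draining.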
This is also where Karpenter alone has a ceiling. Karpenter reacts to AWS’s 2-minute warning. It does not predict interruptions earlier.
Advanced patterns
A few configurations that show up in mature Karpenter deployments.
Weighted NodePools for tier prioritization
When multiple NodePools could host a pod, Karpenter picks the one with the highest weight. This is how teams express preference for reserved capacity over Spot, or for one instance family over another:
# Pool A: prefer reserved capacity
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: reserved
spec:
  weight: 100
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["reserved"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
---
# Pool B: fall back to general spot/on-demand
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  weight: 10
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
Pods land on reserved first if it has capacity, then fall through to general.
GPU NodePools with taints
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      taints:
      - key: nvidia.com/gpu
        effect: NoSchedule
      requirements:
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: ["g5", "g6"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu-nodeclass
  limits:
    "nvidia.com/gpu": 100
The taint forces GPU workloads to explicitly tolerate the pool, so non-GPU workloads do not accidentally land on expensive GPU nodes.
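On the workload side, a GPU pod then needs both a matching toleration and a GPU resource request. The sketch below assumes the NVIDIA device plugin is running on the nodes so that nvidia.com/gpu is an allocatable resource; the pod name and image are placeholders.

```yaml
# Hypothetical GPU workload targeting the tainted pool.
apiVersion: v1
kind: Pod
metadata:
  name: example-training-job
spec:
  # Matches the nvidia.com/gpu NoSchedule taint on the pool.
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  containers:
  - name: trainer
    image: example.com/trainer:1.0  # placeholder image
    resources:
      limits:
        # Extended resources like GPUs are requested via limits.
        nvidia.com/gpu: 1
```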
Custom AMIs
For organizations with hardened AMIs or custom user data, point EC2NodeClass at your AMI:
spec:
  amiFamily: Custom
  amiSelectorTerms:
  - tags:
      Name: my-org-eks-worker-ami
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh ${CLUSTER_NAME}
    # custom hardening here
Use the Custom AMI family when you need user-data control beyond what the managed AMI families allow.
Day-2 operations
What changes once Karpenter is provisioning real production traffic.
Observability
Karpenter exposes metrics on :8080/metrics in Prometheus format. The metrics worth alerting on:
- karpenter_nodes_terminated: spike indicates aggressive consolidation or interruption events.
- karpenter_pods_state{phase="pending"}: pods Karpenter could not provision for. Indicates NodePool constraints are too tight.
- karpenter_disruption_evaluation_duration_seconds: how long disruption decisions take. Slow numbers indicate scaling problems.
Pair with logs. Karpenter logs every provisioning decision, every consolidation event, and every disruption with the workload that triggered it.
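As a starting point, those signals could be wired into a Prometheus rules file along the following lines. This is a sketch, not a tested alert set: the metric names and the phase label are taken from the list above and can differ between Karpenter versions, so check them against the controller's /metrics output first. The thresholds and alert names are arbitrary placeholders.

```yaml
# Hypothetical Prometheus alerting rules; verify metric names
# against your Karpenter version before use.
groups:
- name: karpenter-sketch
  rules:
  - alert: KarpenterNodeTerminationSpike
    # More than 20 terminations in 15 minutes (placeholder threshold).
    expr: sum(increase(karpenter_nodes_terminated[15m])) > 20
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Karpenter terminated an unusual number of nodes
  - alert: KarpenterPodsStuckPending
    # Pods have remained pending for 10 minutes.
    expr: sum(karpenter_pods_state{phase="pending"}) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: Pods are pending and Karpenter has not provisioned for them
```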
Troubleshooting
Two commands solve most issues:
kubectl describe pod <pending-pod>
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=200
The pod’s events tell you why scheduling failed. Karpenter’s logs tell you why provisioning failed. The most common production issues are NodePool requirement mismatches, exhausted limits, and IAM permission errors during instance launch.
AMI rollouts
When a new AL2023 AMI ships and you want to roll it out, update the EC2NodeClass alias and let drift detection do the work:
amiSelectorTerms:
- alias: al2023@v20251201
Karpenter detects the drift and replaces existing nodes within the disruption budgets you have configured. For staged rollouts, use a separate EC2NodeClass and migrate one NodePool at a time.
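The staged variant might look like the sketch below: a second EC2NodeClass pinned to the new AMI, which individual NodePools adopt one at a time. The name al2023-next is hypothetical; the remaining fields mirror the default EC2NodeClass from earlier in this guide.

```yaml
# Hypothetical second node class carrying the new AMI version.
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: al2023-next
spec:
  amiFamily: AL2023
  role: KarpenterNodeRole-${CLUSTER_NAME}
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME}
  amiSelectorTerms:
  - alias: al2023@v20251201
# To migrate a pool, change its nodeClassRef.name from "default"
# to "al2023-next"; drift detection then replaces that pool's nodes.
```

Starting with the least critical NodePool gives the new AMI time to prove itself before the stateful pools move.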
Karpenter on Azure and GCP
Karpenter started on AWS and is most mature there. The CNCF project includes a provider abstraction, and active providers exist for Azure (AKS) and Alibaba Cloud. GKE has its own auto-provisioning that is conceptually similar but is not Karpenter.
If you are running on AKS, the Karpenter provider for Azure is in active development. Expect feature parity with the AWS provider to lag by a release or two.
For GKE, GCP’s Node Auto Provisioning (NAP) covers similar ground with a different control plane.
The patterns in this guide apply broadly, but the specifics (provider configuration, IAM equivalents, AMI vs image references) need to be translated per cloud.
Where Karpenter ends, and the operational layer begins
Karpenter’s job is provisioning. It does that well. What it does not do, by design, is the operational layer above provisioning:
- It does not show cost attribution by workload, namespace, or team.
- It does not adjust pod resource requests as workloads evolve, so it provisions against whatever requests are set.
- It does not predict Spot interruptions earlier than AWS’s 2-minute warning.
- It does not surface why the cluster scaled at 2:14 AM in a single view.
- It does not consolidate stateful workloads without restarts.
Most platform teams running Karpenter at scale eventually build that layer themselves: a Grafana stack for visibility, custom Prometheus queries for Spot ratio, a side process for resource-request tuning, runbooks for AMI rotations.
That layer can also be bought.
Cast AI for Karpenter runs alongside Karpenter and adds the parts Karpenter does not ship. Cost attribution by workload and team. Continuous workload rightsizing. Spot interruption prediction earlier than the standard 2-minute notice. Container Live Migration for moving stateful workloads between nodes without restart. Karpenter continues to provision nodes; Cast AI handles the operational layer above it.
If you have spent the time tuning NodePools, configuring disruption budgets, and building dashboards, the next step is the operational layer that turns Karpenter from a powerful provisioner into a fully operated platform.
Frequently Asked Questions About Karpenter
EKS Auto Mode is built on Karpenter. AWS runs the controller for you and ships a default NodePool configuration. If you are running Auto Mode, you are already running Karpenter, just without direct access to the controller pod or the NodePool definitions. Self-managed Karpenter on a standard EKS cluster gives you full control over both.
Karpenter is a CNCF project with provider abstraction. The AWS provider is the most mature. The Azure provider for AKS is in active development with feature parity lagging by a release or two. Alibaba Cloud also has an active provider. GKE has its own auto-provisioning that is conceptually similar but is not Karpenter.
WhenEmpty terminates only nodes with no running workloads. WhenEmptyOrUnderutilized actively replaces underutilized nodes with smaller ones to consolidate capacity. The first is conservative and safe by default. The second is more aggressive and saves more, but only after you have validated that workloads handle eviction gracefully and PodDisruptionBudgets are configured correctly.
Yes, Karpenter supports Graviton. Set kubernetes.io/arch in the NodePool requirements to include arm64, and Karpenter will provision Graviton instances when they are the best fit. For mixed clusters, allow both amd64 and arm64. Workloads that require a specific architecture should be pinned via a node selector.
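A minimal sketch of that mixed-architecture setup follows: a NodePool that allows both architectures, and a pod pinned to arm64. The pool name, pod name, and image are hypothetical, and the image would need to be published for arm64.

```yaml
# Hypothetical pool allowing both architectures.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: mixed-arch
spec:
  template:
    spec:
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64", "arm64"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
---
# Hypothetical workload that must run on Graviton.
apiVersion: v1
kind: Pod
metadata:
  name: example-arm-service
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
  containers:
  - name: app
    image: example.com/arm-service:1.0  # placeholder, arm64 build
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
```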
Yes, Karpenter and Cluster Autoscaler can run side by side during migration. Each manages its own node groups or NodePools, and they do not interfere as long as workloads route to one or the other via labels and selectors. This is the recommended pattern for production migration: stand up Karpenter alongside CA, move workloads tier by tier, and decommission node groups as they drain.
The Karpenter controller needs permission to launch and terminate EC2 instances, describe instance types, manage SSM parameters for AMI lookup, and read the SQS queue for Spot interruption notices. The launched nodes need a separate IAM role with the standard worker-node permissions (ECR pull, CNI, EBS volume management). The official getting-started guide ships both policies.
Karpenter respects Kubernetes pod priority and preemption. Higher-priority pods can preempt lower-priority pods on existing nodes. For provisioning, Karpenter does not currently prioritize provisioning for higher-priority pending pods over lower-priority ones in the same scheduling pass; both are evaluated together.
Karpenter can either replace Cluster Autoscaler or run alongside it. Most production clusters running Karpenter at scale eventually decommission the Cluster Autoscaler entirely because Karpenter handles the same scale-up, consolidation, drift, and node expiration. During migration, the two run side by side; in the long term, Karpenter is enough.



