Kubernetes Cordon: How It Works And When To Use It 

Kubernetes cordon lets you mark a node as unschedulable, preventing new pods from being placed on it while keeping existing workloads running. In this guide, we walk through when and how to use cordon and drain—plus how to automate the process to save time and cut cloud costs.


Kubernetes allows you to administer and maintain nodes manually using kubectl. With this command-line tool, you can create node objects and modify existing nodes, for example, to set a label or mark a node unschedulable. This is where the Kubernetes cordon command comes in.

What is the Kubernetes cordon command, and how do you use it? This article dives into Kubernetes cordon with practical examples to show you how node cordoning and draining work, and how you can do both automatically to score cost savings.

What is Kubernetes cordon?

Kubernetes cordon is an operation that marks a node in your existing node pool as unschedulable. By cordoning a node, you ensure that no new pods will be scheduled onto it. The command prevents the Kubernetes scheduler from placing new pods on that node, but it doesn’t affect pods already running there.

To mark a node unschedulable, all it takes is running this command:

kubectl cordon $NODENAME

Note that you can run pods that are part of a DaemonSet on an unschedulable node. That’s because DaemonSets usually provide node-local services that should be running on the node, even if it’s marked as unschedulable and drained of workloads.
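Under the hood, cordon simply sets the node’s spec.unschedulable field to true. If you prefer, you can achieve the same effect with kubectl patch (node-12 is an example node name):

```shell
# Equivalent to "kubectl cordon node-12": set spec.unschedulable directly
kubectl patch node node-12 -p '{"spec":{"unschedulable":true}}'

# Inspect the field to confirm the node is cordoned (prints "true")
kubectl get node node-12 -o jsonpath='{.spec.unschedulable}'
```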

When do you use the Kubernetes cordon command?

Nodes in a Kubernetes cluster may require maintenance—replacing hardware components in a self-hosted installation, updating the node’s kernel, or resizing its compute resources from a cloud provider.

Kubernetes cordon and drain prepare your application for node downtime by letting workloads running on a target node get rescheduled to other nodes. You can then safely shut the node down and remove it from your cluster, confident that the action doesn’t impact service availability in any way.
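Put together, a typical node maintenance workflow looks like this sketch (node-12 is a placeholder name):

```shell
# 1. Stop new pods from landing on the node
kubectl cordon node-12

# 2. Evict existing pods; DaemonSet pods stay on the node, so ignore them
kubectl drain node-12 --ignore-daemonsets

# 3. Perform the maintenance: kernel update, resize, hardware swap...

# 4. Make the node schedulable again once it's back
kubectl uncordon node-12
```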

How does Kubernetes cordon work: step-by-step guide

Step 1: Cordoning a node

By cordoning a node, you mark it as unavailable to the scheduler. This action makes the node not eligible to host any new pods that will be added to your cluster.

To place a cordon around a named node, use this command:

$ kubectl cordon node-12

The existing pods on this node won’t be affected by this and will remain accessible. 

Want to check which nodes are cordoned off at the moment? 

Use the get nodes command:

$ kubectl get nodes

NAME      STATUS                     ROLES                  AGE   VERSION
node-12   Ready,SchedulingDisabled   control-plane,master   22m   v1.23.3

All the cordoned nodes will appear with the status SchedulingDisabled.
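In larger clusters, you can also filter for cordoned nodes directly, since cordon sets spec.unschedulable to true:

```shell
# List only nodes that are currently cordoned
kubectl get nodes --field-selector spec.unschedulable=true
```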

Step 2: Draining a node

To empty the node of the remaining pods, you need to drain it. This process evicts the pods so they can be rescheduled to other nodes in your cluster. You can gracefully terminate pods before forcing them to be removed from the node you’re planning to drain.

To start the drain procedure, run the kubectl drain command, specifying the name of the node you’re draining:

$ kubectl drain node-12
node/node-12 already cordoned
evicting pod kube-system/storage-provisioner
evicting pod default/nginx-7c658794b9-zszdd
evicting pod kube-system/coredns-64897985d-dp6lx
pod/storage-provisioner evicted
pod/nginx-7c658794b9-zszdd evicted
pod/coredns-64897985d-dp6lx evicted
node/node-12 evicted

Note: in practice, you’ll rarely run a drain like the one above without ignoring DaemonSets. If DaemonSet-managed pods are present on the node, drain refuses to proceed unless you pass the --ignore-daemonsets flag.
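A more realistic drain command therefore includes --ignore-daemonsets, and often --delete-emptydir-data as well if any pods use emptyDir volumes:

```shell
# Drain while leaving DaemonSet pods in place, and allow eviction of
# pods that use emptyDir volumes (their local data is deleted)
kubectl drain node-12 --ignore-daemonsets --delete-emptydir-data
```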

How does the drain procedure work? 

The drain command first cordons the node for you if you haven’t done that manually. It then evicts the running pods, even if there isn’t room for them to be rescheduled elsewhere. Once the node is empty, you can shut it down or destroy it. Because the node stays cordoned, no new workloads can be scheduled on it after the drain completes.

What happens if your pods have long grace periods?

Kubernetes cordon and the resulting node draining might take a while if your pods happen to have long grace periods. This might become a problem if you’re looking to take a node offline quickly. 

You can always use the --grace-period flag to override your pod’s termination period and force an instant eviction:

$ kubectl drain node-12 --grace-period 0

Use this command carefully. Some workloads might not respond well when you force them to stop without giving them a chance to clean up.
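To estimate how long a drain will take, you can check each pod’s configured terminationGracePeriodSeconds before draining (node-12 is an example name):

```shell
# Show the configured grace period (in seconds) for each pod on the node
kubectl get pods --all-namespaces \
  --field-selector spec.nodeName=node-12 \
  -o custom-columns=NAME:.metadata.name,GRACE:.spec.terminationGracePeriodSeconds
```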

Another command worth knowing alongside Kubernetes cordon lets you drain a node even if it hosts pods that aren’t managed by a ReplicationController, ReplicaSet, Job, DaemonSet, or StatefulSet (such pods won’t be rescheduled automatically):

$ kubectl drain node-12 --force

How do you undrain or uncordon a node in Kubernetes?

It’s possible to undrain or uncordon a node in Kubernetes by marking the node as schedulable again.

Use the following command:

kubectl uncordon NODE
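For example, to make node-12 schedulable again and confirm the change:

```shell
# Mark the node as schedulable again
kubectl uncordon node-12

# Verify: the SchedulingDisabled status should be gone from the node's row
kubectl get nodes
```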

Using Kubernetes cordon for cost optimization

Suppose you analyze your cluster with a third-party optimization solution and discover a tremendous potential for cost savings. But to get there, you need to change the compute resources your cluster is running on. This is where Kubernetes cordon and drain come into play.

Once you install Cast AI using this guide and connect your cluster, you can set the Evictor policy.

Your next step is to remove your existing nodes and let Cast AI create an optimized cluster. Note that, depending on your workload configuration, this might cause some downtime. Evictor is designed to avoid downtime, but some settings can be overridden to make it more aggressive.

This is what Evictor looks like in action:

Nodes status
  1. One node (in red below) is identified as a candidate for eviction.
  2. Evictor automatically moves the pods to other nodes – this is called bin-packing.
  3. Once the node becomes empty, it’s deleted from the cluster.
  4. Go back to step 1.

One node is deleted:

Reducing number of nodes

At the end of the process, three nodes remain:

Active nodes after optimization

In 10 minutes, Evictor deleted three nodes and left three nodes running to optimize the setup and cut costs.

Automate Kubernetes node cordoning and draining

Manually changing the compute resources your nodes run on quickly becomes time-consuming if you have many nodes. Cast AI can do the job for you, handling node cordoning and draining as part of its automated optimization mechanisms.

Connect your cluster in a read-only mode to find out how much you could save and how to get there with free Kubernetes cost monitoring.
