The Kubernetes scheduler decides on its own where to place your pods, and its decisions may not align with yours. Sometimes you can live with that. In other cases, this uncertainty can lead to degraded (or suboptimal) performance. And that’s not the end of it: you might also see increased costs and availability problems.
Let’s dive into all those problems and find out how to solve them using inter-pod affinity and anti-affinity!
Before we begin, let’s briefly look at what inter-pod affinity and anti-affinity are and how they are structured.
Inter-pod affinity
Affinity refers to attraction. So, inter-pod affinity means that a pod wants to be placed in the same topology domain (for example, the same node or zone) as the pods it matches.
Here’s the pod spec for affinity:
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - web
          topologyKey: topology.kubernetes.io/zone
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - backend
            topologyKey: topology.kubernetes.io/zone
Now let’s go over the details of the inter-pod affinity structure and how it impacts the Kubernetes scheduler:
- requiredDuringSchedulingIgnoredDuringExecution – conditions must be satisfied for the pod to be scheduled. This is also called a hard requirement.
- preferredDuringSchedulingIgnoredDuringExecution – if a condition can be satisfied, it will be. But if not, it will be ignored. This is also called a soft requirement.
- podAffinityTerm – the pod affinity term defines which pods we select with a label selector and which node topology key we target.
- A soft requirement has podAffinityTerm as a separate property with an additional weight parameter that defines which term is more important.
- A hard requirement has the affinity term as a root list item object. For a hard affinity rule, all affinity terms and all expressions must be satisfied for the pod to be scheduled.
Note that for inter-pod affinity, requiredDuringSchedulingIgnoredDuringExecution terms and expressions are ANDed. Everything must be satisfied for the pod to be scheduled, which makes the rule very strict when several conditions are used. It’s easy to mix this up with node affinity, which is less strict: there, terms are ORed and expressions are ANDed, so only one of the terms needs to be satisfied for the pod to be scheduled.
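For illustration, here’s a minimal sketch with two hard affinity terms (the backend and cache labels are hypothetical). Because the terms are ANDed, the scheduler only places this pod in a zone that already runs both a matching backend pod and a matching cache pod:
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # First term: the target zone must already run a pod labeled app=backend
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - backend
          topologyKey: topology.kubernetes.io/zone
        # Second term: the same zone must also run a pod labeled app=cache
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - cache
          topologyKey: topology.kubernetes.io/zone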
Inter-pod anti-affinity
Inter-pod anti-affinity is the opposite of affinity: pods don’t want to be placed in the same topology domain as the pods they match.
Here’s what the pod spec looks like:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - web
          topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - backend
            topologyKey: kubernetes.io/hostname
Let’s dive into the inter-pod anti-affinity structure:
- requiredDuringSchedulingIgnoredDuringExecution – conditions must be satisfied for the pod to be scheduled (hard requirement).
- preferredDuringSchedulingIgnoredDuringExecution – if a condition can be satisfied, it will be. But if not, it will be ignored (soft requirement).
- podAffinityTerm – the pod affinity term defines which pods we select with a label selector and which node topology key we target.
- A soft requirement has podAffinityTerm as a separate property with an additional weight parameter that defines which term is more important.
- A hard requirement has the affinity term as a root list item object. For a hard anti-affinity rule, all affinity terms and all expressions must be satisfied for the pod to be scheduled.
Degraded performance problem – and how to solve it with an anti-affinity rule
When running various workloads on your Kubernetes cluster, different workloads may depend on different resources. Some workloads might be CPU-heavy or memory-heavy; those can be controlled easily by specifying the right container resource requests and limits.
However, workloads can also heavily use the network or attached disks, which you can’t control directly through resource specs. If you don’t add restrictions for the Kubernetes scheduler, several disk- or network-heavy workloads might land on the same node and overload its disk or network.
As a result, you’ll see degraded performance on disk- or network-dependent workloads once the node’s disk or network bandwidth limits are reached. You can control this, or at least minimize the risk of hitting these limits, by labeling pods and adding pod anti-affinity.
Let’s say we have several workloads that use the network heavily. We could mark all of them with a custom label like network-usage: high and define a pod anti-affinity rule on those workloads:
apiVersion: v1
kind: Pod
metadata:
  name: network-heavy-app
  labels:
    network-usage: high
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: network-usage
                operator: In
                values:
                  - high
          topologyKey: kubernetes.io/hostname
  containers:
    - name: network-heavy-app
      image: registry.k8s.io/pause:2.0
By defining the label on a pod, we mark it as a heavy network user, and we can identify all pods that heavily use the network by that label.
The pod anti-affinity rule is used here to prevent pods with the network-usage: high label from being scheduled on the same node (topologyKey: kubernetes.io/hostname), isolating these pods from each other on different nodes.
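If a strict rule would leave some pods unschedulable (for example, more network-heavy pods than nodes), a softer variant is an option. Here’s a sketch of the same rule using the preferred form, still based on the hypothetical network-usage: high label:
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        # The scheduler tries to keep network-heavy pods apart, but still
        # schedules them together when no other node fits.
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: network-usage
                  operator: In
                  values:
                    - high
            topologyKey: kubernetes.io/hostname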
High availability problem – and how to solve it with anti-affinity
Sometimes, the Kubernetes scheduler might schedule the same workload replicas on the same node.
That creates a high availability problem: if a node goes down, all or a portion of the workload’s replicas go down with it, which can cause partial or full downtime of the application.
You can solve this problem using pod anti-affinity by targeting the application name and using the hostname topology key:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: highly-available-app
  labels:
    app: highly-available-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: highly-available-app
  template:
    metadata:
      labels:
        app: highly-available-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - highly-available-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: highly-available-app
          image: registry.k8s.io/pause:2.0
The example above defines a deployment in which each highly-available-app replica can only be scheduled on a separate node. Keep in mind that a hard anti-affinity rule on the hostname topology key needs at least as many schedulable nodes as replicas (here, ten); otherwise, the extra replicas stay Pending.
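A related pattern, not taken from the example above, is spreading replicas across availability zones rather than nodes, so a whole zone going down doesn’t take out every replica. Here’s a sketch of the affinity section you would put in the deployment’s pod template, using the soft form because there are usually far fewer zones than replicas:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Prefer zones that don't already run a replica of this app
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - highly-available-app
          topologyKey: topology.kubernetes.io/zone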
Cost problem – reduce your network costs using an affinity rule
A Kubernetes cluster cost consists of the VM (CPU and RAM) price, storage price, network price, and Kubernetes-as-a-service price.
Let’s talk about the network price. Usually, cloud providers charge for network traffic that leaves an availability zone. This means that network traffic between pods running in different availability zones costs you money!
You can’t eliminate cross-zone traffic completely, but you can still reduce costs significantly by placing heavily communicating pods in the same availability zone. You can use inter-pod affinity with the zone topology key to achieve that:
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - backend
          topologyKey: topology.kubernetes.io/zone
  containers:
    - name: web
      image: registry.k8s.io/pause:2.0
The example above defines a pod that can only be scheduled in the same zone as a pod matching app=backend. This affinity requirement can reduce cross-zone network costs between the web and backend pods.
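For completeness, here’s a minimal sketch of a matching backend pod; the name and container are placeholders, and the only part the affinity rule above actually selects on is the app: backend label:
apiVersion: v1
kind: Pod
metadata:
  name: backend
  labels:
    app: backend  # the label the web pod's affinity rule matches
spec:
  containers:
    - name: backend
      image: registry.k8s.io/pause:2.0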
Wrap up
Modifying the decision-making process of the Kubernetes scheduler with affinity and anti-affinity is a smart move. You can use such rules to avoid the kinds of problems that arise when pods are left to the default scheduling behavior alone.
Affinities use labels for selecting targets, and it’s important to create a good labeling strategy for your Kubernetes ecosystem. Check out this post for essential labeling best practices: Kubernetes Labels: Expert Guide with 10 Best Practices