Multi-Node Clusters
A multi-node Kubernetes cluster is a cluster that consists of one or more control plane nodes and one or more worker nodes. The control plane node(s) are responsible for managing the cluster’s lifecycle and scheduling workloads, while the worker nodes run the actual applications (Pods).
Source: https://kubernetes.io/docs/concepts/architecture/
Workload Scheduling
By default, Kubernetes does not schedule regular workloads on control plane nodes. This ensures the control plane remains dedicated to managing the cluster and is not burdened with running application workloads. This only holds true if the node carries the following taint.
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Let's try it with minikube. First, we need to start a new cluster with 3 nodes using the command below.
➜ minikube start --nodes 3 -p multinode
😄 [multinode] minikube v1.34.0 on Darwin 15.3.1 (arm64)
✨ Automatically selected the docker driver
📌 Using Docker Desktop driver with root privileges
👍 Starting "multinode" primary control-plane node in "multinode" cluster
🚜 Pulling base image v0.0.45 ...
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🐳 Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔗 Configuring CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
👍 Starting "multinode-m02" worker node in "multinode" cluster
🚜 Pulling base image v0.0.45 ...
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🌐 Found network options:
▪ NO_PROXY=192.168.49.2
🐳 Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
▪ env NO_PROXY=192.168.49.2
🔎 Verifying Kubernetes components...
👍 Starting "multinode-m03" worker node in "multinode" cluster
🚜 Pulling base image v0.0.45 ...
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🌐 Found network options:
▪ NO_PROXY=192.168.49.2,192.168.49.3
🐳 Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
▪ env NO_PROXY=192.168.49.2
▪ env NO_PROXY=192.168.49.2,192.168.49.3
🔎 Verifying Kubernetes components...
🏄 Done! kubectl is now configured to use "multinode" cluster and "default" namespace by default
This will start a Kubernetes cluster with 3 nodes, 1 control plane node and 2 worker nodes.
➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
multinode Ready control-plane 48s v1.31.0
multinode-m02 Ready <none> 35s v1.31.0
multinode-m03 Ready <none> 25s v1.31.0
Then let's create a simple Deployment using the nginx image with 3 replicas and apply it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
➜ kubectl apply -f deployment.yaml
deployment.apps/nginx-deployment created
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-54b9c68f67-fq928 1/1 Running 0 4m56s 10.244.1.2 multinode-m02 <none> <none>
nginx-deployment-54b9c68f67-t85zh 1/1 Running 0 4m56s 10.244.2.2 multinode-m03 <none> <none>
nginx-deployment-54b9c68f67-wvzng 1/1 Running 0 4m56s 10.244.0.3 multinode <none> <none>
As we can see above, the Pods are spread evenly across all nodes, including the control plane node, which should not happen. This is because minikube doesn't add the NoSchedule taint by default.
➜ kubectl describe nodes | grep Taint
Taints: <none>
Taints: <none>
Taints: <none>
If no NoSchedule taint exists on the control plane node, it can accept Pods. With 3 replicas and 3 nodes, the scheduler distributes the Pods evenly. To prevent this, we can manually add the taint using the command below.
➜ kubectl taint nodes multinode node-role.kubernetes.io/control-plane:NoSchedule
node/multinode tainted
➜ kubectl describe nodes multinode | grep Taint
Taints: node-role.kubernetes.io/control-plane:NoSchedule
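If you later want to allow regular workloads on the control plane node again, the taint can be removed with kubectl by appending a trailing - to the same taint specification, for example:
➜ kubectl taint nodes multinode node-role.kubernetes.io/control-plane:NoSchedule-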
Then delete the Pod that was scheduled on the control plane node.
➜ kubectl delete pod nginx-deployment-54b9c68f67-wvzng
pod "nginx-deployment-54b9c68f67-wvzng" deleted
Check the Pod list again and you should see that the replacement Pod is not scheduled on the control plane node.
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-54b9c68f67-fq928 1/1 Running 0 22m 10.244.1.2 multinode-m02 <none> <none>
nginx-deployment-54b9c68f67-hd6bq 1/1 Running 0 14s 10.244.2.3 multinode-m03 <none> <none>
nginx-deployment-54b9c68f67-t85zh 1/1 Running 0 22m 10.244.2.2 multinode-m03 <none> <none>
Node-Specific Deployment
Eventually we may want to place a Pod on a specific node for several reasons. For example, we might want to deploy a database service in the EU region for GDPR compliance, put observability services on separate nodes to improve reliability, or isolate a CPU-intensive workload on its own node.
Kubernetes provides mechanisms like nodeSelector, nodeAffinity, and taints and tolerations to control Pod placement.
nodeSelector
nodeSelector is the simplest recommended way to place a Pod on a specific node. We add a nodeSelector field to the Pod specification and specify the node labels we want to target. Make sure the node is labeled properly; Kubernetes only schedules the Pod onto nodes that have each of the labels we specify.
First, let's add a label to node multinode-m02 so we can use it as a node selector.
➜ kubectl label nodes multinode-m02 node-type=infra
node/multinode-m02 labeled
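To double-check the label before relying on it, we can list the nodes with that label shown as an extra column (kubectl's -L flag):
➜ kubectl get nodes -L node-type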
Then update our Deployment file to add a nodeSelector field inside template.spec. This tells the Kubernetes scheduler to place the Pods only on nodes that carry the label specified in the node selector.
      nodeSelector:
        node-type: infra
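For reference, after this change the Pod template section of the Deployment above would look roughly like this:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        node-type: infra
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80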
Re-apply the YAML file and check the Pod list again. You should see that all the Pods run on the multinode-m02 node.
➜ kubectl apply -f deployment.yaml
deployment.apps/nginx-deployment configured
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-5579449f87-8c26w 1/1 Running 0 23s 10.244.1.3 multinode-m02 <none> <none>
nginx-deployment-5579449f87-cjbbf 1/1 Running 0 16s 10.244.1.5 multinode-m02 <none> <none>
nginx-deployment-5579449f87-kdk45 1/1 Running 0 19s 10.244.1.4 multinode-m02 <none> <none>
nodeAffinity
Similar to a node selector, nodeAffinity also constrains which nodes a Pod can be scheduled on. The difference is that node affinity supports more complex rules, which come in two forms:
requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met (hard requirement).
preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod (soft requirement); see the sketch right after this list.
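A minimal sketch of the preferred (soft) form, reusing the node-type label from this walkthrough; the weight (1 to 100) only ranks nodes that match the preference:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - product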
This time, let's add a label to node multinode-m03 so we can use it in an affinity rule.
➜ kubectl label nodes multinode-m03 node-type=product
node/multinode-m03 labeled
Add the following node affinity configuration to the template.spec field. Don't forget to remove the previously added node selector.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - product
Re-apply the YAML file and check the Pod list again. You should see that all the Pods run on the multinode-m03 node.
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-78b49dbd66-6nkgj 1/1 Running 0 13s 10.244.2.4 multinode-m03 <none> <none>
nginx-deployment-78b49dbd66-brdvt 1/1 Running 0 5s 10.244.2.6 multinode-m03 <none> <none>
nginx-deployment-78b49dbd66-j27td 1/1 Running 0 9s 10.244.2.5 multinode-m03 <none> <none>
When both nodeSelector and a required nodeAffinity rule are present, the Kubernetes scheduler must satisfy both. If no node matches both requirements, the Pod gets stuck in Pending status. If we check the events, there will be a FailedScheduling warning with a message like the one below.
32s Warning FailedScheduling Pod/nginx-deployment-785475985d-vdv8q 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
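Finally, if we do want specific Pods to land on the tainted control plane node (for example, cluster add-ons), the taints and tolerations mechanism mentioned earlier covers that case. A minimal sketch of a toleration for the control-plane taint used above, added under template.spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
Note that a toleration only allows scheduling onto the tainted node; it does not force it, so it is usually combined with a nodeSelector or node affinity rule to actually target that node.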