Multi-Node Clusters
A multi-node Kubernetes cluster is a cluster that consists of one or more control plane nodes and one or more worker nodes. The control plane node(s) are responsible for managing the cluster’s lifecycle and scheduling workloads, while the worker nodes run the actual applications (Pods).
Source: https://kubernetes.io/docs/concepts/architecture/
Workload Scheduling
By default, Kubernetes does not schedule regular workloads on control plane nodes. This ensures the control plane remains dedicated to managing the cluster and is not burdened with running application workloads. This only holds true if the node carries the following taint.
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Let's try it with minikube. First, we need to start a new cluster with 3 nodes using the command below.
➜ minikube start --nodes 3 -p multinode
😄 [multinode] minikube v1.34.0 on Darwin 15.3.1 (arm64)
✨ Automatically selected the docker driver
📌 Using Docker Desktop driver with root privileges
👍 Starting "multinode" primary control-plane node in "multinode" cluster
🚜 Pulling base image v0.0.45 ...
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🐳 Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔗 Configuring CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
👍 Starting "multinode-m02" worker node in "multinode" cluster
🚜 Pulling base image v0.0.45 ...
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🌐 Found network options:
▪ NO_PROXY=192.168.49.2
🐳 Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
▪ env NO_PROXY=192.168.49.2
🔎 Verifying Kubernetes components...
👍 Starting "multinode-m03" worker node in "multinode" cluster
🚜 Pulling base image v0.0.45 ...
🔥 Creating docker container (CPUs=2, Memory=2200MB) ...
🌐 Found network options:
▪ NO_PROXY=192.168.49.2,192.168.49.3
🐳 Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
▪ env NO_PROXY=192.168.49.2
▪ env NO_PROXY=192.168.49.2,192.168.49.3
🔎 Verifying Kubernetes components...
🏄 Done! kubectl is now configured to use "multinode" cluster and "default" namespace by default
This will start a Kubernetes cluster with 3 nodes, 1 control plane node and 2 worker nodes.
➜ kubectl get nodes
NAME STATUS ROLES AGE VERSION
multinode Ready control-plane 48s v1.31.0
multinode-m02 Ready <none> 35s v1.31.0
multinode-m03 Ready <none> 25s v1.31.0
Then let's create a simple Deployment using the nginx image with 3 replicas and apply it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
➜ kubectl apply -f deployment.yaml
deployment.apps/nginx-deployment created
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-54b9c68f67-fq928 1/1 Running 0 4m56s 10.244.1.2 multinode-m02 <none> <none>
nginx-deployment-54b9c68f67-t85zh 1/1 Running 0 4m56s 10.244.2.2 multinode-m03 <none> <none>
nginx-deployment-54b9c68f67-wvzng 1/1 Running 0 4m56s 10.244.0.3 multinode <none> <none>
As we can see above, the Pods are spread evenly across all nodes, including the control plane node, which should not happen. This is because minikube doesn't add the NoSchedule taint by default.
➜ kubectl describe nodes | grep Taint
Taints: <none>
Taints: <none>
Taints: <none>
If no NoSchedule taint exists on the control plane node, it can accept Pods. With 3 replicas and 3 nodes, the scheduler distributes the Pods evenly. To prevent this, we can manually add the taint using the command below.
➜ kubectl taint nodes multinode node-role.kubernetes.io/control-plane:NoSchedule
node/multinode tainted
➜ kubectl describe nodes multinode | grep Taint
Taints: node-role.kubernetes.io/control-plane:NoSchedule
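If you later want to allow regular workloads on the control plane node again, the taint can be removed with kubectl by appending a trailing - to the same taint specification, for example:
➜ kubectl taint nodes multinode node-role.kubernetes.io/control-plane:NoSchedule-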
Then delete the Pod that was scheduled on the control plane node.
➜ kubectl delete pod nginx-deployment-54b9c68f67-wvzng
pod "nginx-deployment-54b9c68f67-wvzng" deleted
Check the Pod list again and you should see that the replacement Pod is not scheduled on the control plane node.
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-54b9c68f67-fq928 1/1 Running 0 22m 10.244.1.2 multinode-m02 <none> <none>
nginx-deployment-54b9c68f67-hd6bq 1/1 Running 0 14s 10.244.2.3 multinode-m03 <none> <none>
nginx-deployment-54b9c68f67-t85zh 1/1 Running 0 22m 10.244.2.2 multinode-m03 <none> <none>
Node-Specific Deployment
Eventually we may want to place a Pod on a specific node for several reasons. For example, we might want to deploy a database service in the EU region for GDPR compliance, put observability services on separate nodes to improve reliability, or isolate a CPU-intensive workload on its own node.
Kubernetes provides mechanisms like nodeSelector, nodeAffinity, and taints and tolerations to control Pod placement.
nodeSelector
nodeSelector is the simplest recommended way to place a Pod on a specific node. We add a nodeSelector field to the Pod specification and specify the node labels we want to target. Make sure the node is labeled properly; Kubernetes only schedules the Pod onto nodes that have each of the labels we specify.
First, let's add a label to node multinode-m02 so we can use it as a node selector.
➜ kubectl label nodes multinode-m02 node-type=infra
node/multinode-m02 labeled
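To double-check the label before relying on it, we can list the nodes with that label shown as an extra column (kubectl's -L flag):
➜ kubectl get nodes -L node-type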
Then update our Deployment file to add a nodeSelector field inside template.spec. This tells the Kubernetes scheduler to place the Pods only on nodes that carry the label specified in the node selector.
      nodeSelector:
        node-type: infra
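For reference, after this change the Pod template section of the Deployment above would look roughly like this:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        node-type: infra
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80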
Re-apply the YAML file and check the Pod list again. You should see that all the Pods run on the multinode-m02 node.
➜ kubectl apply -f deployment.yaml
deployment.apps/nginx-deployment configured
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-5579449f87-8c26w 1/1 Running 0 23s 10.244.1.3 multinode-m02 <none> <none>
nginx-deployment-5579449f87-cjbbf 1/1 Running 0 16s 10.244.1.5 multinode-m02 <none> <none>
nginx-deployment-5579449f87-kdk45 1/1 Running 0 19s 10.244.1.4 multinode-m02 <none> <none>
nodeAffinity
Similar to a node selector, nodeAffinity also constrains which nodes a Pod can be scheduled on. The difference is that node affinity supports more complex rules, which come in two forms:
requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met (hard requirement).
preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod (soft requirement); see the sketch right after this list.
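A minimal sketch of the preferred (soft) form, reusing the node-type label from this walkthrough; the weight (1 to 100) only ranks nodes that match the preference:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - product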
This time, let's add a label to node multinode-m03 so we can use it in an affinity rule.
➜ kubectl label nodes multinode-m03 node-type=product
node/multinode-m03 labeled
Add the following node affinity configuration to the template.spec field. Don't forget to remove the previously added node selector.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - product
Re-apply the YAML file and check the Pod list again. You should see that all the Pods run on the multinode-m03 node.
➜ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-78b49dbd66-6nkgj 1/1 Running 0 13s 10.244.2.4 multinode-m03 <none> <none>
nginx-deployment-78b49dbd66-brdvt 1/1 Running 0 5s 10.244.2.6 multinode-m03 <none> <none>
nginx-deployment-78b49dbd66-j27td 1/1 Running 0 9s 10.244.2.5 multinode-m03 <none> <none>
When both nodeSelector and a required nodeAffinity rule are present, the Kubernetes scheduler must satisfy both. If no node matches both requirements, the Pod gets stuck in Pending status. If we check the events, there will be a FailedScheduling warning with a message like the one below.
32s Warning FailedScheduling Pod/nginx-deployment-785475985d-vdv8q 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
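Finally, if we do want specific Pods to land on the tainted control plane node (for example, cluster add-ons), the taints and tolerations mechanism mentioned earlier covers that case. A minimal sketch of a toleration for the control-plane taint used above, added under template.spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
Note that a toleration only allows scheduling onto the tainted node; it does not force it, so it is usually combined with a nodeSelector or node affinity rule to actually target that node.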