
Scaling: Horizontal Pod Autoscaler

In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.

If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.

Update our App

Let's first update our app to return a SHA-256 hash of the message, so each request generates some additional CPU load. Add a new field called Hash to our response definition.

type Response struct {
    Message string `json:"message,omitempty"`
    Hash    string `json:"hash,omitempty"`
}

Add the hashed message to the Hash response field and then rebuild the app.

hasher := sha256.New()
hasher.Write([]byte(message))
hash := fmt.Sprintf("%x", hasher.Sum(nil))

res := Response{
    Message: message,
    Hash:    hash,
}
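Putting it together, the handler might look roughly like this. This is only a sketch: the route, the message source, and the port are assumptions for illustration, not the original code.

package main

import (
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "net/http"
)

type Response struct {
    Message string `json:"message,omitempty"`
    Hash    string `json:"hash,omitempty"`
}

func handler(w http.ResponseWriter, r *http.Request) {
    message := "Hello, World!" // assumption: a static message; the real app may build it differently

    // Hash the message with SHA-256 to add some CPU work per request.
    hasher := sha256.New()
    hasher.Write([]byte(message))
    hash := fmt.Sprintf("%x", hasher.Sum(nil))

    res := Response{
        Message: message,
        Hash:    hash,
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(res)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil) // assumption: the original app's port may differ
}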

Let's deploy the newly built app into our Kubernetes cluster. We can do this using the rollout restart command. Kubernetes will spin up new pods and remove the old ones as the new ones become ready to receive traffic. This ensures there is no downtime for the service when we deploy a new version of the app.

➜ kubectl rollout restart deployment/simple-go
deployment.apps/simple-go restarted

We can check using the get pods command, and we should see a new set of pods running.
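For example, we can wait for the rollout to finish and then list the pods; the replacement pods will have a very small AGE:

➜ kubectl rollout status deployment/simple-go
➜ kubectl get pods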

Define HPA

Next, let's define the horizontal pod autoscaler. Create a new file called hpa.yaml and put the definition below in it.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-go-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-go
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
      - type: Pods
        value: 1
        periodSeconds: 15

spec.scaleTargetRef

In this section we define which object the autoscaler should target. We choose the Deployment called simple-go as the target. The controller manager then selects the pods based on the target resource's .spec.selector labels (which in our case is app: simple-go) and obtains the metrics from the resource metrics API (or the custom metrics API, if one is configured).
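For reference, the relevant part of the simple-go Deployment might look like this. This is only a sketch; it assumes the app: simple-go labels used when the Deployment was created earlier in this series.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-go
spec:
  selector:
    matchLabels:
      app: simple-go   # the HPA's controller finds the pods through this selector
  template:
    metadata:
      labels:
        app: simple-go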

spec.minReplicas

Defines the minimum number of Pods.

spec.maxReplicas

Defines the maximum number of Pods.

spec.metrics

In this section we define which metrics the controller manager should look at. We chose cpu as the resource, with target type: Utilization and averageUtilization: 50. This means that if the pods' average CPU utilization goes above 50%, the controller manager will spin up new pods.

The controller manager will keep spinning up new pods until the average utilization drops below the specified value or maxReplicas is reached.

If the average utilization stays below the specified value for a certain amount of time (300 seconds by default), the controller manager will remove pods until minReplicas is reached.
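Roughly, the controller manager computes the desired replica count from the ratio of current to target utilization (this is the standard HPA algorithm; the numbers below are only an illustration):

desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)

For example, with 3 pods averaging 93% CPU against the 50% target:

desiredReplicas = ceil(3 * 93 / 50) = ceil(5.58) = 6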

spec.behavior

We can also define how the scaling behaves. The default behavior can be read here: HPA: Default Behavior.

Here we set the scaleDown behavior's stabilizationWindowSeconds to 120, which means the controller manager waits 120 seconds (2 minutes) for the metrics to stabilize before deleting any pods.

    policies:
    - type: Pods
      value: 1
      periodSeconds: 15

The policy here means that the controller manager can remove at most 1 pod every 15 seconds once the metrics have stabilized.
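With this policy and the 120-second stabilization window, scaling from, say, 6 replicas back down to the minimum of 3 is paced rather than immediate:

6 -> 5 -> 4 -> 3   (at most 1 pod removed per 15-second period, after the stabilization window has passed)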

Apply and Validate

Let's apply our HPA using the apply command and validate it using the get horizontalpodautoscalers.autoscaling command.

➜ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/simple-go-hpa created
➜ kubectl get horizontalpodautoscalers.autoscaling
NAME            REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
simple-go-hpa   Deployment/simple-go   cpu: 10%/50%   3         10        3          26s

Load Testing

To test whether our HPA works as expected, we can create a load test. This load test will simulate increased traffic to our service.

We use k6 for load testing; you can download and install it by following the instructions in their official documentation. After installing, create a new file called load_test.js and copy-paste the code below. Do change the URL to your service URL from minikube.

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  // A number specifying the number of VUs to run concurrently.
  vus: 100,
  // A string specifying the total duration of the test run.
  duration: '120s',
};

export default function () {
  http.get('http://127.0.0.1:64544'); // change to your service url from minikube
  sleep(1);
}

Basically, we will load test our service with 100 virtual users for 120 seconds. To run the script, use the k6 run <test_file> command like this.

➜ k6 run load_test.js


execution: local
script: load_test.js
output: -

scenarios: (100.00%) 1 scenario, 100 max VUs, 2m30s max duration (incl. graceful stop):
* default: 100 looping VUs for 2m0s (gracefulStop: 30s)


running (0m23.3s), 100/100 VUs, 2207 complete and 0 interrupted iterations
default [======>-------------------------------] 100 VUs 0m23.3s/2m0s

New Pods Created

After a few seconds, let's check our HPA; we should see that REPLICAS is increasing. In my case I see 6 replicas, which means 3 new pods are running. On your machine this could be different. This means the scale-up works as expected.

➜ kubectl get horizontalpodautoscalers.autoscaling
NAME            REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
simple-go-hpa   Deployment/simple-go   cpu: 93%/50%   3         10        6          11m

If we check the pod list, we will also see the new pods running; you can tell them apart by their ages.

➜ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
simple-go-764bc77644-2dtbj   1/1     Running   0          33s
simple-go-764bc77644-8pfcs   1/1     Running   0          17m31s
simple-go-764bc77644-g9zph   1/1     Running   0          33s
simple-go-764bc77644-grfhg   1/1     Running   0          17m29s
simple-go-764bc77644-jf897   1/1     Running   0          33s
simple-go-764bc77644-snb8n   1/1     Running   0          17m26s

A few minutes after the load test is done, we can check the pods again, and we should see that we only have 3 pods now, equal to the specified minReplicas value.

➜ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
simple-go-764bc77644-8pfcs   1/1     Running   0          25m
simple-go-764bc77644-grfhg   1/1     Running   0          25m
simple-go-764bc77644-snb8n   1/1     Running   0          25m

This means that the service traffic has stabilized and the controller manager has removed the unneeded replicas until it reached the minimum number we specified. The scale-down of our HPA works as expected.
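If we want to see the individual scaling decisions, we can also look at the HPA's events (your output will differ):

➜ kubectl describe hpa simple-go-hpa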

References