Scaling: Horizontal Pod Autoscaler
In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.
If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.
Update our App
Let's update our app first to return a hashed message using SHA256, to generate some additional CPU load. Add a new field called Hash to our response definition.
type Response struct {
    Message string `json:"message,omitempty"`
    Hash    string `json:"hash,omitempty"`
}
Add the hashed message to the Hash response field and then rebuild the app.
hasher := sha256.New()
hasher.Write([]byte(message))
hash := fmt.Sprintf("%x", hasher.Sum(nil))

res := Response{
    Message: message,
    Hash:    hash,
}
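For reference, the complete handler ends up looking roughly like the sketch below. This is only a sketch: the handler name, the port, and the message value are assumptions about the app, not its exact code.

package main

import (
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "net/http"
)

type Response struct {
    Message string `json:"message,omitempty"`
    Hash    string `json:"hash,omitempty"`
}

func handler(w http.ResponseWriter, r *http.Request) {
    message := "hello from simple-go" // assumed value; keep whatever message the app already returns

    // Hash the message with SHA-256 to add a bit of CPU work per request.
    hasher := sha256.New()
    hasher.Write([]byte(message))
    hash := fmt.Sprintf("%x", hasher.Sum(nil))

    res := Response{
        Message: message,
        Hash:    hash,
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(res)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil) // assumed port
}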
Let's deploy the newly built app into our Kubernetes cluster. We can do this using the rollout restart command. Kubernetes will spin up new pods and remove the old ones as the new ones become ready to receive traffic. This ensures there is no downtime to the service when we deploy a new version of the app.
➜ kubectl rollout restart deployment/simple-go
deployment.apps/simple-go restarted
We can check using the get pods command, and we should see a new set of pods running.
Define HPA
Next, let's define the horizontal pod autoscaler. We can create a new file hpa.yaml and put the definition below in it.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-go-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-go
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
        - type: Pods
          value: 1
          periodSeconds: 15
spec.scaleTargetRef
In this section we define which object the autoscaler should target. We choose a Deployment object called simple-go as the target. The controller manager then selects the pods based on the target resource's .spec.selector labels (which in our case is app: simple-go), and obtains the metrics from the resource metrics API. Note that the resource metrics API is served by metrics-server, so make sure it is running in your cluster (on minikube you can enable it with minikube addons enable metrics-server).
spec.minReplicas
Defines the minimum number of Pods.
spec.maxReplicas
Defines the maximum number of Pods.
spec.metrics
In this section we define what metrics should the controller manager look. We chose cpu
as the resource with target type: Utilization
and averageUtilization: 50
. It means that if the pods average CPU utilization above 50%
, the controller manager will spin up new pods.
The controller manager will keep spinning up new pods until the average utilization falls below the specified value or maxReplicas is reached.
If the average utilization stays below the specified value for a certain amount of time (300s by default), the controller manager will remove pods until minReplicas is reached.
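Under the hood, the scaling decision follows the formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped between minReplicas and maxReplicas. Here is a minimal sketch of that arithmetic (an illustration only, not the controller's actual code):

package main

import (
    "fmt"
    "math"
)

// desiredReplicas mirrors the HPA scaling formula:
// ceil(currentReplicas * currentUtilization / targetUtilization),
// clamped to the configured minReplicas/maxReplicas.
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
    desired := int(math.Ceil(float64(current) * currentUtil / targetUtil))
    if desired < minReplicas {
        desired = minReplicas
    }
    if desired > maxReplicas {
        desired = maxReplicas
    }
    return desired
}

func main() {
    // 3 replicas averaging ~93% CPU against a 50% target:
    // ceil(3 * 93 / 50) = 6 replicas, matching what we observe later.
    fmt.Println(desiredReplicas(3, 93, 50, 3, 10))
}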
spec.behavior
We can also define the scaling behavior. The default behavior can be read here: HPA: Default Behavior.
Here we define the scaleDown behavior with stabilizationWindowSeconds: 120, which means the controller manager needs to wait for 120 seconds (2 minutes) for the metrics to stabilize before deleting any pods.
policies:
  - type: Pods
    value: 1
    periodSeconds: 15
The policy here means that the controller manager can remove at most 1 pod every 15s once the metrics have stabilized.
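To get a feel for the timing, here is a rough back-of-the-envelope estimate (illustrative only, not how the controller actually schedules removals): with the behavior above, scaling from 6 replicas back down to the minimum of 3 needs the 120-second stabilization window to pass first, then roughly one 15-second period per removed pod.

package main

import "fmt"

// approxScaleDownSeconds gives a rough estimate of how long a scale down takes
// with a "Pods" policy: the stabilization window has to pass first, then at
// most podsPerPeriod pods are removed every periodSeconds.
func approxScaleDownSeconds(from, to, podsPerPeriod, periodSeconds, stabilizationWindow int) int {
    removals := from - to
    periods := (removals + podsPerPeriod - 1) / podsPerPeriod // ceiling division
    return stabilizationWindow + periods*periodSeconds
}

func main() {
    // From 6 replicas down to 3: 120s stabilization + 3 removals at 1 pod / 15s ≈ 165 seconds.
    fmt.Println(approxScaleDownSeconds(6, 3, 1, 15, 120))
}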
Apply and Validate
Let's apply our HPA using the apply command and validate it using the get horizontalpodautoscalers.autoscaling command.
➜ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/simple-go-hpa created
➜ kubectl get horizontalpodautoscalers.autoscaling
NAME            REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
simple-go-hpa   Deployment/simple-go   cpu: 10%/50%   3         10        3          26s
Load Testing
To test whether our HPA works as expected, we can create a load test. This load test will simulate increased traffic to our service.
We use k6 to do the load testing; you can download and install it by following the instructions in their official documentation. After installing, create a new file called load_test.js and copy-paste the code below. Do change the URL to your service URL from minikube.
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  // A number specifying the number of VUs to run concurrently.
  vus: 100,
  // A string specifying the total duration of the test run.
  duration: '120s',
};

export default function() {
  http.get('http://127.0.0.1:64544'); // change to your service url from minikube
  sleep(1);
}
Basically, we will load test our service with 100 virtual users for 120s. To run the script, use the k6 run <test_file> command like this.
➜ k6 run load_test.js

  execution: local
     script: load_test.js
     output: -

  scenarios: (100.00%) 1 scenario, 100 max VUs, 2m30s max duration (incl. graceful stop):
           * default: 100 looping VUs for 2m0s (gracefulStop: 30s)

running (0m23.3s), 100/100 VUs, 2207 complete and 0 interrupted iterations
default   [======>-------------------------------] 100 VUs  0m23.3s/2m0s
New Pods Created
After a few seconds, let's check our HPA and we should see that REPLICAS is increasing. In my case I see 6 replicas, which means I have 3 new pods running. On your machine this could be different. This means that the scale up works as expected.
➜ kubectl get horizontalpodautoscalers.autoscaling
NAME            REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
simple-go-hpa   Deployment/simple-go   cpu: 93%/50%   3         10        6          11m
If we check the pods list, we will also see the new pods running; you can differentiate them by their age.
➜ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
simple-go-764bc77644-2dtbj   1/1     Running   0          33s
simple-go-764bc77644-8pfcs   1/1     Running   0          17m31s
simple-go-764bc77644-g9zph   1/1     Running   0          33s
simple-go-764bc77644-grfhg   1/1     Running   0          17m29s
simple-go-764bc77644-jf897   1/1     Running   0          33s
simple-go-764bc77644-snb8n   1/1     Running   0          17m26s
A few minutes after the load test is done, we can check the pods again and we should see that we only have 3 pods now, equal to the specified minReplicas value.
➜ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
simple-go-764bc77644-8pfcs   1/1     Running   0          25m
simple-go-764bc77644-grfhg   1/1     Running   0          25m
simple-go-764bc77644-snb8n   1/1     Running   0          25m
This means that the service traffic has stabilized and the controller manager has already removed the unneeded replicas until it reached the minimum number we specified. The scale down of our HPA works as expected.