+++ title = "exploring and configuring scaling" chapter = false weight = 1 +++ :imagesdir: /images == Scaling an OpenShift 4 Cluster With OpenShift 4.0+, we now have the ability to dynamically scale the cluster size through OpenShift itself. In this lab you will learn how to scale OpenShift 4 cluster both manually and automatically. Applying autoscaling to an OpenShift Container Platform cluster involves deploying a ClusterAutoscaler and then deploying MachineAutoscalers for each Machine type in your cluster. *Machine sets* MachineSet resources are groups of machines. Machine sets are to machines as replica sets are to pods. If you need more machines or must scale them down, you change the replicas field on the machine set to meet your compute need. The following custom resources add more capabilities to your cluster: *Machine autoscaler* The MachineAutoscaler resource automatically scales machines in a cloud. You can set the minimum and maximum scaling boundaries for nodes in a specified machine set, and the machine autoscaler maintains that range of nodes. The MachineAutoscaler object takes effect after a ClusterAutoscaler object exists. Both ClusterAutoscaler and MachineAutoscaler resources are made available by the ClusterAutoscalerOperator object. *Cluster autoscaler* This resource is based on the upstream cluster autoscaler project. In the OpenShift Container Platform implementation, it is integrated with the Machine API by extending the machine set API. You can set cluster-wide scaling limits for resources such as cores, nodes, memory, GPU, and so on. You can set the priority so that the cluster prioritizes pods so that new nodes are not brought online for less important pods. You can also set the scaling policy so that you can scale up nodes but not scale them down. *Machine health check* The MachineHealthCheck resource detects when a machine is unhealthy, deletes it, and, on supported platforms, makes a new machine. === Manual Cluster Scale Up/Down (Option 1) Step 1:: Viewing components: ---- Login to the OpenShift web console Select the admin console Expand Compute Select the *openshift-machine-api* project Select nodes Select machine sets Select Machine autoscalers ---- You should see the list any existing scale compnents available for the platform: image::ocp4-machinesets.png[image] Step 2:: Create a Machine Set Machine sets are single AZ constructs, for multi AZ deployments a machine set can be created in each AWS AZ. Alternatively machine pools can be used. ---- Select Machine sets Click on create machine set Replace the YAML with the following apiVersion: machine.openshift.io/v1beta1 kind: MachineSet metadata: creationTimestamp: '2021-07-08T16:51:09Z' generation: 2 name: worker namespace: openshift-machine-api resourceVersion: '550802' selfLink: >- /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/worker uid: b1b84b30-e00c-11eb-97d5-0248f54f61c7 spec: replicas: 1 selector: matchLabels: foo: bar template: metadata: labels: foo: bar spec: providerSpec: value: ami: id: ami-046fe691f52a953f9 apiVersion: awsproviderconfig.openshift.io/v1beta1 blockDevices: - ebs: iops: 0 volumeSize: 120 volumeType: gp2 deviceIndex: 0 instanceType: m4.large kind: AWSMachineProviderConfig placement: availabilityZone: us-wast-2a region: us-east-1 Click Save ---- This will create a machine set for a worker node in us-west-2 Step 3:: Create a machine autoscaler ---- Select machine auto scaler Create machine autoscaler Change the name to us-west-2-worker Change the scale target ref name to worker click save ---- This creates a definition of minimum and maximun scaling of the machin set created in Step 2. Step 4:: manual scaling ---- Select machine sets Select a machine set Click on actions Select edit count Increase the count Click save ---- image::ocp4-ms-count.png[image] ---- Click the Machines tab and see the allocated machines. Click Overview tab, it will let you know when when the machines become ready. Click Machine Sets under Machines on the left-hand side again, you will also see the status of the machines in the set: ---- image::ocp4-ms-count3.png[image] *You may need to wait a few minutes before the machines become ready* In the background additional EC2 instances are being provisioned and then registered and configured to participate in the cluster, so yours may still show 1/3. You can also view this in the CLI: === Managing nodes via the CLI Step 1:: list machine sets ---- oc get machinesets -n openshift-machine-api ---- This should return something like: ---- NAME DESIRED CURRENT READY AVAILABLE AGE cluster-3e5f-kqr2b-worker-us-east-2a 3 3 3 3 24h cluster-3e5f-kqr2b-worker-us-east-2b 1 1 1 1 24h cluster-3e5f-kqr2b-worker-us-east-2c 1 1 1 1 24h ---- Step 2:: list nodes ---- oc get nodes ---- Step 3:: scaling via CLI You can alter the Machine Set count in several ways in the web UI console, but you can also perform the same operation via the CLI by using the oc edit command on the machineset in the openshift-machine-api project. ---- oc get machineset -n openshift-machine-api ---- ---- NAME DESIRED CURRENT READY AVAILABLE AGE cluster-4c7b-lkw4d-worker-us-east-2a 1 1 1 1 6h17m cluster-4c7b-lkw4d-worker-us-east-2b 1 1 1 1 6h17m cluster-4c7b-lkw4d-worker-us-east-2c 1 1 1 1 6h17m ---- you can use oc edit machineset to open the YAML in vi and update it, you will need to collect the machine set name from oc get above. ---- oc edit machineset cluster-4c7b-lkw4d-worker-us-east-2a -n openshift-machine-api ---- - You will see the following text: ``` # Please edit the object below. Lines beginning with a '#' will be ignored, # and an empty file will abort the edit. If an error occurs while saving this file will be # reopened with the relevant failures. # apiVersion: machine.openshift.io/v1beta1 kind: MachineSet metadata: annotations: autoscaling.openshift.io/machineautoscaler: openshift-machine-api/autoscale-us-east-2a-ts7rr machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "4" machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "1" creationTimestamp: "2019-05-13T20:34:26Z" generation: 9 labels: machine.openshift.io/cluster-api-cluster: cluster-3e5f-kqr2b name: cluster-3e5f-kqr2b-worker-us-east-2a namespace: openshift-machine-api resourceVersion: "446823" selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/cluster-3e5f-kqr2b-worker-us-east-2a uid: 80644a16-75be-11e9-bb7c-02f7ee4a116e spec: replicas: 1 selector: matchLabels: machine.openshift.io/cluster-api-cluster: cluster-3e5f-kqr2b machine.openshift.io/cluster-api-machine-role: worker machine.openshift.io/cluster-api-machine-type: worker machine.openshift.io/cluster-api-machineset: cluster-3e5f-kqr2b-worker-us-east-2a template: (...) ``` - The above is just an example of the entire machine set configuration. Update `replicas` from 1 to 2 - Save your changes simply save and quit from your editor. OpenShift will now patch the configuration. You should see that your modified machine set (depending on which one you edited) will be confirmed: - Once that has been changed, you can view the outcome here: ---- oc get machinesets -n openshift-machine-api ---- ``` NAME DESIRED CURRENT READY AVAILABLE AGE cluster-3e5f-kqr2b-worker-us-east-2a 3 3 3 3 24h cluster-3e5f-kqr2b-worker-us-east-2b 1 1 1 1 24h cluster-3e5f-kqr2b-worker-us-east-2c 1 1 1 1 24h ``` Again, before you move forward, return the `replica` count back to 1, using the same method as above. === Automatic Cluster Scale Up via CLI You can edit exiting machine autoscalers in a similar way to how you modified machine sets above. ==== Define a MachineAutoScaler - Configure a MachineAutoScaler - you'll need to fetch the following YAML file: ``` [~] $ wget https://raw.githubusercontent.com/RedHatWorkshops/openshiftv4-workshop/master/solutions/machine-autoscale-example.yaml ``` The file has the following contents: ``` kind: List metadata: {} apiVersion: v1 items: - apiVersion: "autoscaling.openshift.io/v1beta1" kind: "MachineAutoscaler" metadata: generateName: autoscale-- namespace: "openshift-machine-api" spec: minReplicas: 1 maxReplicas: 4 scaleTargetRef: apiVersion: machine.openshift.io/v1beta1 kind: MachineSet name: -worker- - apiVersion: "autoscaling.openshift.io/v1beta1" kind: "MachineAutoscaler" metadata: generateName: autoscale-- namespace: "openshift-machine-api" spec: minReplicas: 1 maxReplicas: 4 scaleTargetRef: apiVersion: machine.openshift.io/v1beta1 kind: MachineSet name: -worker- - apiVersion: "autoscaling.openshift.io/v1beta1" kind: "MachineAutoscaler" metadata: generateName: autoscale-- namespace: "openshift-machine-api" spec: minReplicas: 1 maxReplicas: 4 scaleTargetRef: apiVersion: machine.openshift.io/v1beta1 kind: MachineSet name: -worker- ``` - Check the MachineSets with the CLI, you noticed that they all had the format of: ``` -worker- ``` - MachineAutoscaler resources must be defined for each region-AZ that you want to autoscale. - Using the example output and MachineSets above, and selecting "us-east-2a" as the region we're going to autoscale into, you would need to modify the YAML file to look like the following: - To ensure you make no mistakes, here is the command you can use to update the yaml ``` $ export CLUSTER_NAME=$(oc get machinesets -n openshift-machine-api | awk -F'-worker-' 'NR>1{print $1;exit;}') $ export REGION_NAME=us-east-2a $ sed -i s/\/$REGION_NAME/g machine-autoscale-example.yaml $ sed -i s/\/$CLUSTER_NAME/g machine-autoscale-example.yaml ``` - Here is the working sample of an MachineAutoScaler: ``` [~] $ cat machine-autoscale-example.yaml kind: List metadata: {} apiVersion: v1 items: - apiVersion: "autoscaling.openshift.io/v1beta1" kind: "MachineAutoscaler" metadata: generateName: autoscale-us-east-2a- namespace: "openshift-machine-api" spec: minReplicas: 1 maxReplicas: 4 scaleTargetRef: apiVersion: machine.openshift.io/v1beta1 kind: MachineSet name: cluster-4c7b-lkw4d-worker-us-east-2a - apiVersion: "autoscaling.openshift.io/v1beta1" kind: "MachineAutoscaler" metadata: generateName: autoscale-us-east-2a- namespace: "openshift-machine-api" spec: minReplicas: 1 maxReplicas: 4 scaleTargetRef: apiVersion: machine.openshift.io/v1beta1 kind: MachineSet name: cluster-4c7b-lkw4d-worker-us-east-2a - apiVersion: "autoscaling.openshift.io/v1beta1" kind: "MachineAutoscaler" metadata: generateName: autoscale-us-east-2a- namespace: "openshift-machine-api" spec: minReplicas: 1 maxReplicas: 4 scaleTargetRef: apiVersion: machine.openshift.io/v1beta1 kind: MachineSet name: cluster-4c7b-lkw4d-worker-us-east-2a ``` NOTE: If you aren't deployed into this region, or don't want to use us-east-2a, adapt the instructions to suit. - **Make sure** that you properly modify both generateName and name. * Note which one has the and which one does not. * Note that generateName has a trailing hyphen. * You can specify the minimum and maximum quantity of nodes that are allowed to be created by adjusting the minReplicas and maxReplicas. - You do not have to define a MachineAutoScaler for each MachineSet. But remember that each MachineSet corresponds to an AWS region/AZ. So, without having multiple MachineAutoScalers, you could end up with a cluster fully scaled out in a single AZ. If that's what you're after, it's fine. However if AWS has a problem in that AZ, you run the risk of losing a large portion of your cluster. NOTE: You should probably choose a small-ish number for maxReplicas. The next lab will autoscale the cluster up to that maximum. You're paying for the EC2 instances. - Once the file has been modified appropriately, you can now create the autoscaler: ``` $ oc create -f machine-autoscale-example.yaml -n openshift-machine-api ``` ==== Define a ClusterAutoscaler - Define a ClusterAutoscaler, this configures some boundaries and behaviors for how the cluster will autoscale. An example definition file can be found at: ``` https://raw.githubusercontent.com/RedHatWorkshops/openshiftv4-workshop/master/solutions/cluster-autoscaler.yaml ``` - This definition is set for a maximum of 20 workers, but we need to reduce that with our labs to minimize the cost. Let's first download that file: ``` [~] $ wget https://raw.githubusercontent.com/RedHatWorkshops/openshiftv4-workshop/master/solutions/cluster-autoscaler.yaml ``` - Modify the max number of replicas: ``` $ sed -i s/20/10/g cluster-autoscaler.yaml ``` - Here is an example of ClusterAutoscaler yaml. ``` [~] $ cat machine-autoscale-example.yaml apiVersion: "autoscaling.openshift.io/v1" kind: "ClusterAutoscaler" metadata: name: "default" spec: resourceLimits: maxNodesTotal: 10 scaleDown: enabled: true delayAfterAdd: 10s delayAfterDelete: 10s delayAfterFailure: 10s ``` - Create the ClusterAutoscaler with the following command: ``` $ oc create -f cluster-autoscaler.yaml clusterautoscaler.autoscaling.openshift.io/default created ``` NOTE: The ClusterAutoscaler is not a namespaced resource -- it exists at the cluster scope. ==== Define a Job The following example YAML file defines a Job: https://raw.githubusercontent.com/openshift/training/master/assets/job-work-queue.yaml It will produce a massive load that the cluster cannot handle, and will force the autoscaler to take action (up to the maxReplicas defined in your ClusterAutoscaler YAML). NOTE: If you did not scale down your machines earlier, you may have too much capacity to trigger an autoscaling event. Make sure you have no more than 3 total workers before continuing. - Create a project to hold the resources for the Job, and switch into it: ``` $ oc adm new-project autoscale-example && oc project autoscale-example Created project autoscale-example Now using project "autoscale-example" on server "{{API_URL}}". ``` ==== Open Grafana - Go to OpenShift web console - Click `Monitoring` --> `Dashboards` - This will open a new browser tab for Grafana. You will also get a certificate error similar to the first time you logged in. - Click `Advance` when you see the SSL certificate error. - Click `Process to ...` link - Click `Log in with OpenShift` - Click `htpasswd` - Enter your provided admin username and password and click login - CLick `Allow selected permissions` - you will see the Grafana homepage. Grafana is configured to use an OpenShift user and inherits permissions of that user for accessing cluster information. - Click the dropdown on `Home` and choose `Kubernetes / Compute Resources / Cluster`. Leave this browser window open while you start the Job so that you can observe the CPU utilization of the cluster rise: image::ocp4-grafana.png[image] ==== Force an Autoscaling Event - Go to web terminal, create the Job: ``` $ oc create -n autoscale-example -f https://raw.githubusercontent.com/openshift/training/master/assets/job-work-queue.yaml job.batch/work-queue-qncs2 created ``` - Check status of the Job. It will create a lot of Pods: ``` $ oc get pod -n autoscale-example NAME READY STATUS RESTARTS AGE work-queue-qncs2-26x9c 0/1 Pending 0 33s work-queue-qncs2-28h6r 0/1 Pending 0 33s work-queue-qncs2-2tdz9 0/1 Pending 0 33s work-queue-qncs2-526hl 0/1 Pending 0 33s work-queue-qncs2-55nr7 0/1 Pending 0 33s work-queue-qncs2-5d98k 0/1 Pending 0 33s work-queue-qncs2-7pd5p 0/1 Pending 0 31s work-queue-qncs2-8k76z 0/1 Pending 0 32s (...) ``` - After a few moments, look at the list of Machines: ``` $ oc get machines -n openshift-machine-api NAME INSTANCE STATE TYPE REGION ZONE AGE beta-190305-1-79tf5-master-0 i-080dea906d9750737 running m4.xlarge us-east-2 us-east-2a 26h beta-190305-1-79tf5-master-1 i-0bf5ad242be0e2ea1 running m4.xlarge us-east-2 us-east-2b 26h beta-190305-1-79tf5-master-2 i-00f13148743c13144 running m4.xlarge us-east-2 us-east-2c 26h beta-190305-1-79tf5-worker-us-east-2a-8dvwq i-06ea8662cf76c7591 running m4.large us-east-2 us-east-2a 2m7s <-------- beta-190305-1-79tf5-worker-us-east-2a-9pzvg i-0bf01b89256e7f39f running m4.large us-east-2 us-east-2a 2m7s <-------- beta-190305-1-79tf5-worker-us-east-2a-vvddp i-0e649089d42751521 running m4.large us-east-2 us-east-2a 2m7s <-------- beta-190305-1-79tf5-worker-us-east-2a-xx282 i-07b2111dff3c7bbdb running m4.large us-east-2 us-east-2a 26h beta-190305-1-79tf5-worker-us-east-2b-hjv9c i-0562517168aadffe7 running m4.large us-east-2 us-east-2b 26h beta-190305-1-79tf5-worker-us-east-2c-cdhth i-09fbcd1c536f2a218 running m4.large us-east-2 us-east-2c 26h ``` - You should see a scaled-up cluster with three new additions as worker nodes in us-east-2a, you can see the ones that have been auto-scaled from their age. - Depending on when you run the command, your list may show all running workers, or some pending. After the Job completes, which could take anywhere from a few minutes to ten or more (depending on your ClusterAutoscaler size and your MachineAutoScaler sizes), the cluster should scale down to the original count of worker nodes. You can watch the output with the following (runs every 10s)- ``` $ watch -n10 'oc get machines -n openshift-machine-api' ``` - In Grafana, be sure to click the autoscale-example project in the graphs image::granfana-autoscale.png[image]