= Kubernetes Stateful Containers using StatefulSets and Persistent Volumes
:toc:
:icons:
:linkcss:
:imagesdir: ../../resources/images

In this section, we will review how to launch and manage applications using https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/[StatefulSets] and https://kubernetes.io/docs/concepts/storage/persistent-volumes/[Persistent Volumes]. We will deploy a MySQL database using a StatefulSet and EBS volumes. The example is a MySQL single-master topology with multiple slaves running asynchronous replication, and it consists of a ConfigMap, two MySQL services, and a StatefulSet. We will deploy the database, send some traffic to test the connection, walk through a few failure modes to review the resiliency that is built into the StatefulSet, and finally demonstrate how to scale the StatefulSet.

== Prerequisites

In order to perform the exercises in this chapter, you'll need to deploy configurations to a Kubernetes cluster. To create an EKS-based Kubernetes cluster, use the link:../../01-path-basics/102-your-first-cluster#create-a-kubernetes-cluster-with-eks[AWS CLI] (recommended). If you wish to create a Kubernetes cluster without EKS, you can instead use link:../../01-path-basics/102-your-first-cluster#alternative-create-a-kubernetes-cluster-with-kops[kops].

All configuration files for this chapter are in the `statefulsets` directory. Make sure you change to that directory before running any commands in this chapter.

== Create ConfigMap

Using a ConfigMap, you can control the MySQL configuration independently of the container image. The ConfigMap looks like this:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  labels:
    app: mysql
data:
  master.cnf: |
    # Apply this config only on the master.
    [mysqld]
    log-bin
  slave.cnf: |
    # Apply this config only on slaves.
    [mysqld]
    super-read-only
```

In this case, the master is configured to serve replication logs to the slaves (`log-bin`), and the slaves are read-only (`super-read-only`).

Create the ConfigMap using the command shown:

```
$ kubectl create -f templates/mysql-configmap.yaml
configmap "mysql-config" created
```

== Create Services

Create two services using the following configuration: a headless service that gives the StatefulSet members stable DNS entries, and a regular service for reads:

```
# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql
---
# Client service for connecting to any MySQL instance for reads.
# For writes, you must instead connect to the master: mysql-0.mysql.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql
```

The headless `mysql` service provides DNS resolution so that, as pods are placed by the StatefulSet controller, each one can be resolved as `<pod-name>.mysql`. `mysql-read` is a client-facing service that load balances reads across all MySQL instances.

```
$ kubectl create -f templates/mysql-services.yaml
service "mysql" created
service "mysql-read" created
```

Only read queries can use the load-balanced `mysql-read` service. Because there is only one MySQL master, clients should connect directly to the master Pod, identified by `mysql-0.mysql`, to execute writes.
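Once the StatefulSet pods exist (they are created in the next section), you can optionally confirm the per-pod DNS entries that the headless `mysql` service provides. The following is a minimal check, assuming the public `busybox:1.28` image can be pulled in your cluster:

```
# Resolve the stable DNS name of the first StatefulSet member from a throwaway pod.
$ kubectl run dns-test --image=busybox:1.28 -i --rm --restart=Never -- nslookup mysql-0.mysql
```

The lookup should return the pod IP of `mysql-0`. `mysql-read`, in contrast, resolves to a single ClusterIP that load balances across the ready members.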
== Create StatefulSet

Finally, create the StatefulSet from the configuration in `templates/mysql-statefulset.yaml` using the command shown:

```
$ kubectl create -f templates/mysql-statefulset.yaml
statefulset "mysql" created

$ kubectl get -w statefulset
NAME      DESIRED   CURRENT   AGE
mysql     3         1         8s
mysql     3         2         59s
mysql     3         3         2m
mysql     3         3         3m
```

In a different terminal window, you can watch the progress of pod creation using the following command:

```
$ kubectl get pods -l app=mysql --watch
NAME      READY     STATUS            RESTARTS   AGE
mysql-0   0/2       Init:0/2          0          30s
mysql-0   0/2       Init:1/2          0          35s
mysql-0   0/2       PodInitializing   0          47s
mysql-0   1/2       Running           0          48s
mysql-0   2/2       Running           0          59s
mysql-1   0/2       Pending           0          0s
mysql-1   0/2       Pending           0          0s
mysql-1   0/2       Pending           0          0s
mysql-1   0/2       Init:0/2          0          0s
mysql-1   0/2       Init:1/2          0          35s
mysql-1   0/2       Init:1/2          0          45s
mysql-1   0/2       PodInitializing   0          54s
mysql-1   1/2       Running           0          55s
mysql-1   2/2       Running           0          1m
mysql-2   0/2       Pending           0          <invalid>
mysql-2   0/2       Pending           0          <invalid>
mysql-2   0/2       Pending           0          0s
mysql-2   0/2       Init:0/2          0          0s
mysql-2   0/2       Init:1/2          0          32s
mysql-2   0/2       Init:1/2          0          43s
mysql-2   0/2       PodInitializing   0          50s
mysql-2   1/2       Running           0          52s
mysql-2   2/2       Running           0          56s
```

Press `Ctrl`+`C` to stop watching.

Notice that the pods are initialized one at a time, in order, during startup. This is because the StatefulSet controller assigns each pod a unique, stable name (`mysql-0`, `mysql-1`, `mysql-2`), with `mysql-0` being the master and the others being slaves. The configuration uses https://www.percona.com/software/mysql-database/percona-xtrabackup[Percona XtraBackup], an open-source tool, to clone the source MySQL server's data to its slaves.
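The full manifest in `templates/mysql-statefulset.yaml` is considerably longer: besides the `mysql` container it also defines init containers and an `xtrabackup` sidecar that perform the cloning and replication, which is why each pod reports `2/2` containers and an `Init:0/2` phase above. For illustration only (the field values below are representative, not copied from the workshop template), the core pieces that give the pods stable names and per-pod EBS volumes look roughly like this:

```
apiVersion: apps/v1beta1           # StatefulSet API group/version in Kubernetes 1.7
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql               # headless service providing the <pod-name>.mysql DNS entries
  replicas: 3                      # mysql-0 (master), mysql-1 and mysql-2 (slaves)
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.7
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data               # each replica mounts its own EBS-backed volume
          mountPath: /var/lib/mysql
  volumeClaimTemplates:            # one PersistentVolumeClaim per replica: data-mysql-0, data-mysql-1, ...
  - metadata:
      name: data
      labels:
        app: mysql
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi            # illustrative size
```

The `volumeClaimTemplates` section is what ties each replica to its own PersistentVolumeClaim, so a pod that is rescheduled keeps its data.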
== Test MySQL setup

You can use `mysql-client` to send some data to the master (`mysql-0.mysql`):

```
kubectl run mysql-client --image=mysql:5.7 -i --rm --restart=Never --\
  mysql -h mysql-0.mysql <<EOF
CREATE DATABASE test;
CREATE TABLE test.messages (message VARCHAR(250));
INSERT INTO test.messages VALUES ('hello, from mysql-client');
EOF
```

Run the following to test whether the slaves (behind `mysql-read`) received the data:

```
$ kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --\
  mysql -h mysql-read -e "SELECT * FROM test.messages"
```

This should display output like this:

```
+--------------------------+
| message                  |
+--------------------------+
| hello, from mysql-client |
+--------------------------+
```

To test load balancing across the MySQL instances, run the following command:

```
kubectl run mysql-client-loop --image=mysql:5.7 -i -t --rm --restart=Never --\
  bash -ic "while sleep 1; do mysql -h mysql-read -e 'SELECT @@server_id,NOW()'; done"
```

Each MySQL instance is assigned a unique identifier, which can be retrieved using `@@server_id`. This command prints, in an infinite loop, the server id that served each request along with a timestamp:

```
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:01:11 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:01:12 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         102 | 2017-10-24 03:01:13 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:01:14 |
+-------------+---------------------+
```

You can leave this loop running in a separate window while you work through the failure modes in the next section. Alternatively, press `Ctrl`+`C` to terminate the loop.

== Testing failure modes

We will see how the StatefulSet behaves in different failure modes. The following modes will be tested:

. Unhealthy container
. Failed pod
. Failed node

=== Unhealthy container

The MySQL container uses a readiness probe that runs `mysql -h 127.0.0.1 -e 'SELECT 1'` on the server to make sure the MySQL server is still active. Run this command to simulate MySQL becoming unresponsive:

```
kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql /usr/bin/mysql.off
```

This command renames the `/usr/bin/mysql` binary so that the readiness probe cannot find it. After a few seconds, during the next health check, the Pod should report that one of its containers is not healthy. This can be verified using the command:

```
$ kubectl get pod mysql-2
NAME      READY     STATUS    RESTARTS   AGE
mysql-2   1/2       Running   0          12m
```

The `mysql-read` load balancer detects failures like this and reacts by no longer sending traffic to the failed container. You can check this if you have the loop running in a separate window; it now only shows the two healthy servers:

```
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:17:09 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:17:10 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:17:11 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:17:12 |
+-------------+---------------------+
```

Revert the container back to its initial state:

```
kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql.off /usr/bin/mysql
```

Check the status again to see that both containers of the pod are running and healthy:

```
$ kubectl get pod -w mysql-2
NAME      READY     STATUS    RESTARTS   AGE
mysql-2   2/2       Running   0          5h
```

The loop now shows all three servers again.

=== Failed pod

To simulate a failed pod, delete a pod as shown:

```
kubectl delete pod mysql-2
pod "mysql-2" deleted
```

The StatefulSet controller recognizes the failed pod and creates a new one with the same name, linked to the same PersistentVolumeClaim:

```
$ kubectl get pod -w mysql-2
NAME      READY     STATUS            RESTARTS   AGE
mysql-2   0/2       Init:0/2          0          28s
mysql-2   0/2       Init:1/2          0          31s
mysql-2   0/2       PodInitializing   0          32s
mysql-2   1/2       Running           0          33s
mysql-2   2/2       Running           0          37s
```
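Because the recreated pod reattaches the same volume, the data written earlier should still be there. As an optional check (reusing the test table created above and the same client pattern), query the recreated slave directly:

```
# Query the recreated mysql-2 pod through its stable DNS name.
$ kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --\
  mysql -h mysql-2.mysql -e "SELECT * FROM test.messages"
```

This should return the `hello, from mysql-client` row once `mysql-2` reports `2/2 Running` again.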
=== Failed node

Kubernetes allows a node to be marked unschedulable using the `kubectl drain` command. This prevents any new pods from being scheduled on that node. If the API server supports eviction, the existing pods are evicted; otherwise, they are deleted. This applies to all pods except mirror pods, which cannot be deleted through the API server. Read more about drain at https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/.

You can simulate node downtime by draining a node. To determine which node to drain, run this command:

```
$ kubectl get pod mysql-2 -o wide
NAME      READY     STATUS    RESTARTS   AGE       IP            NODE
mysql-2   2/2       Running   0          11m       100.96.6.12   ip-172-20-64-152.ec2.internal
```

Drain the node using the command:

```
$ kubectl drain ip-172-20-64-152.ec2.internal --force --delete-local-data --ignore-daemonsets
node "ip-172-20-64-152.ec2.internal" cordoned
WARNING: Deleting pods with local storage: mysql-2; Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: kube-proxy-ip-172-20-64-152.ec2.internal
pod "kube-dns-479524115-76s6j" evicted
pod "mysql-2" evicted
node "ip-172-20-64-152.ec2.internal" drained
```

Look at the list of nodes:

```
$ kubectl get nodes
NAME                             STATUS                     ROLES     AGE       VERSION
ip-172-20-107-81.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-122-243.ec2.internal   Ready                      master    10h       v1.7.4
ip-172-20-125-181.ec2.internal   Ready                      node      10h       v1.7.4
ip-172-20-37-239.ec2.internal    Ready                      master    10h       v1.7.4
ip-172-20-52-200.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-57-5.ec2.internal      Ready                      node      10h       v1.7.4
ip-172-20-64-152.ec2.internal    Ready,SchedulingDisabled   node      10h       v1.7.4
ip-172-20-76-117.ec2.internal    Ready                      master    10h       v1.7.4
```

Notice how scheduling is disabled on the drained node.

Now watch the Pod reschedule:

```
kubectl get pod mysql-2 -o wide --watch
```

In this run, the output stayed at:

```
NAME      READY     STATUS    RESTARTS   AGE       IP        NODE
mysql-2   0/2       Pending   0          33s       <none>    <none>
```

At first this may look like a bug in the StatefulSet controller, but the pod is actually failing to reschedule because there were no other schedulable nodes in the Availability Zone where the original node was running, and an EBS volume can only be attached to nodes in the AZ where the volume was created. To mitigate this, the node instance group was manually scaled to 6, which made an additional node available in that AZ. Your scenario may be different and may not need this step.

If you run into the `Pending` issue, edit the number of nodes to `6`:

```
kops edit ig nodes
```

Change the specification to:

```
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: t2.medium
  maxSize: 6
  minSize: 6
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b
  - us-east-1c
```

Review and commit the changes:

```
kops update cluster --yes
```
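If you want to confirm the zone constraint for yourself, dynamically provisioned EBS volumes are normally labeled with their Availability Zone on the PersistentVolume object (this assumes the default AWS EBS provisioner). A quick, optional check:

```
# List PersistentVolumes along with their labels, including the zone label.
$ kubectl get pv --show-labels
```

Look for the `failure-domain.beta.kubernetes.io/zone` label on the volume bound to `data-mysql-2`; a schedulable node must exist in that zone before the pod can be rescheduled.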
It takes a few minutes for a new node to be provisioned. This can be verified using the command shown:

```
$ kubectl get nodes
NAME                             STATUS                     ROLES     AGE       VERSION
ip-172-20-107-81.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-122-243.ec2.internal   Ready                      master    10h       v1.7.4
ip-172-20-125-181.ec2.internal   Ready                      node      10h       v1.7.4
ip-172-20-37-239.ec2.internal    Ready                      master    10h       v1.7.4
ip-172-20-52-200.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-57-5.ec2.internal      Ready                      node      10h       v1.7.4
ip-172-20-64-152.ec2.internal    Ready,SchedulingDisabled   node      10h       v1.7.4
ip-172-20-73-181.ec2.internal    Ready                      node      1m        v1.7.4
ip-172-20-76-117.ec2.internal    Ready                      master    10h       v1.7.4
```

Now watch the status of the pod:

```
$ kubectl get pod mysql-2 -o wide
NAME      READY     STATUS    RESTARTS   AGE       IP           NODE
mysql-2   2/2       Running   0          11m       100.96.8.2   ip-172-20-73-181.ec2.internal
```

Let's put the drained node back into its normal state:

```
$ kubectl uncordon ip-172-20-64-152.ec2.internal
node "ip-172-20-64-152.ec2.internal" uncordoned
```

The list of nodes is now shown as:

```
$ kubectl get nodes
NAME                             STATUS    ROLES     AGE       VERSION
ip-172-20-107-81.ec2.internal    Ready     node      10h       v1.7.4
ip-172-20-122-243.ec2.internal   Ready     master    10h       v1.7.4
ip-172-20-125-181.ec2.internal   Ready     node      10h       v1.7.4
ip-172-20-37-239.ec2.internal    Ready     master    10h       v1.7.4
ip-172-20-52-200.ec2.internal    Ready     node      10h       v1.7.4
ip-172-20-57-5.ec2.internal      Ready     node      10h       v1.7.4
ip-172-20-64-152.ec2.internal    Ready     node      10h       v1.7.4
ip-172-20-73-181.ec2.internal    Ready     node      3m        v1.7.4
ip-172-20-76-117.ec2.internal    Ready     master    10h       v1.7.4
```

== Scaling slaves

More slaves can be added to the MySQL cluster to increase read query capacity. This can be done using the command shown:

```
$ kubectl scale statefulset mysql --replicas=5
statefulset "mysql" scaled
```

You can watch the progress of scaling:

```
$ kubectl get pods -l app=mysql -w
NAME      READY     STATUS            RESTARTS   AGE
mysql-0   2/2       Running           0          6h
mysql-1   2/2       Running           0          6h
mysql-2   2/2       Running           0          16m
mysql-3   0/2       Init:0/2          0          1s
mysql-3   0/2       Init:1/2          0          18s
mysql-3   0/2       Init:1/2          0          28s
mysql-3   0/2       PodInitializing   0          36s
mysql-3   1/2       Running           0          37s
mysql-3   2/2       Running           0          43s
mysql-4   0/2       Pending           0          <invalid>
mysql-4   0/2       Pending           0          <invalid>
mysql-4   0/2       Pending           0          0s
mysql-4   0/2       Init:0/2          0          0s
mysql-4   0/2       Init:1/2          0          31s
mysql-4   0/2       Init:1/2          0          41s
mysql-4   0/2       PodInitializing   0          52s
mysql-4   1/2       Running           0          53s
mysql-4   2/2       Running           0          58s
```

If the loop is still running, it will now also show the new server ids (`103` and `104`):

```
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:53:53 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:53:54 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         102 | 2017-10-24 03:53:55 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         103 | 2017-10-24 03:53:57 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         103 | 2017-10-24 03:53:58 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         104 | 2017-10-24 03:53:59 |
+-------------+---------------------+
```

You can also verify that the new slaves have the same data set:

```
kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --\
  mysql -h mysql-3.mysql -e "SELECT * FROM test.messages"
```

It still shows the same result:

```
+--------------------------+
| message                  |
+--------------------------+
| hello, from mysql-client |
+--------------------------+
```
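Scaling up also created a new PersistentVolumeClaim (and backing EBS volume) for each new member. Since the claims carry the `app: mysql` label, you can list them with:

```
# List the PersistentVolumeClaims created by the StatefulSet.
$ kubectl get pvc -l app=mysql
```

You should now see five claims, `data-mysql-0` through `data-mysql-4`. Keep these in mind for the scale-down step below.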
You can scale down using the command shown:

```
kubectl scale statefulset mysql --replicas=3
statefulset "mysql" scaled
```

Note that scaling in does not delete the data or the PVCs attached to the pods. You have to delete them manually:

```
kubectl delete pvc data-mysql-3
kubectl delete pvc data-mysql-4
```

It shows the output:

```
persistentvolumeclaim "data-mysql-3" deleted
persistentvolumeclaim "data-mysql-4" deleted
```

== Cleaning up

First delete the StatefulSet. This also terminates the pods:

```
$ kubectl delete statefulset mysql
statefulset "mysql" deleted
```

Verify there are no more pods running:

```
kubectl get pods -l app=mysql
```

It shows the output:

```
No resources found.
```

Delete the ConfigMap, Services, and PVCs using the command:

```
$ kubectl delete configmap,service,pvc -l app=mysql
configmap "mysql-config" deleted
service "mysql" deleted
service "mysql-read" deleted
persistentvolumeclaim "data-mysql-0" deleted
persistentvolumeclaim "data-mysql-1" deleted
persistentvolumeclaim "data-mysql-2" deleted
```

You are now ready to continue on with the workshop!

:frame: none
:grid: none
:valign: top

[align="center", cols="1", grid="none", frame="none"]
|=====
|image:button-continue-developer.png[link=../../03-path-application-development/308-cicd-workflows/]
|link:../../developer-path.adoc[Go to Developer Index]
|=====