= Kubernetes Stateful Containers using StatefulSets and Persistent Volumes
:toc:
:icons:
:linkcss:
:imagesdir: ../../resources/images

In this section, we will review how to launch and manage applications using https://kubernetes.io/docs/concepts/abstractions/controllers/statefulsets/[StatefulSets] and https://kubernetes.io/docs/concepts/storage/persistent-volumes/[Persistent Volumes].

We will review how to deploy MySQL database using StatefulSets and EBS volumes. The example is a MySQL single-master topology with multiple slaves running asynchronous replication.

The example consists of ConfigMap, two MySQL services and a StatefulSet. We will deploy MySQL database,
send some traffic to test connection status, go through few failure modes and review resiliency that
is built into the StatefulSet. Lastly, we'll demonstrate how to use scale options with StatefulSet.

== Prerequisites

In order to perform exercises in this chapter, you’ll need to deploy configurations to a Kubernetes cluster. To create an EKS-based Kubernetes cluster, use the link:../../01-path-basics/102-your-first-cluster#create-a-kubernetes-cluster-with-eks[AWS CLI] (recommended). If you wish to create a Kubernetes cluster without EKS, you can instead use link:../../01-path-basics/102-your-first-cluster#alternative-create-a-kubernetes-cluster-with-kops[kops].

All configuration files for this chapter are in the `statefulsets` directory. Make sure you change to that directory before giving any commands in this chapter.

== Create ConfigMap

Using ConfigMap, you can independently control MySQL configuration. The ConfigMap looks like as shown:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  labels:
    app: mysql
data:
  master.cnf: |
    # Apply this config only on the master.
    [mysqld]
    log-bin
  slave.cnf: |
    # Apply this config only on slaves.
    [mysqld]
    super-read-only
```

In this case, we are using master to serve replication logs to slave and slaves are read-only. Create the ConfigMap using the command shown:

  $ kubectl create -f templates/mysql-configmap.yaml
  configmap "mysql-config" created

== Create Services

Create two headless services using the following configuration:

```
# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql
---
# Client service for connecting to any MySQL instance for reads.
# For writes, you must instead connect to the master: mysql-0.mysql.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql
```

The `mysql` service is used for DNS resolution so that when pods are placed by StatefulSet controller, pods can be resolved using `<pod-name>.mysql`. `mysql-read` is a client service that does load balancing for all slaves.

  $ kubectl create -f templates/mysql-services.yaml
  service "mysql" created
  service "mysql-read" created

Only read queries can use the load-balanced `mysql-read` service. Because there is only one MySQL master, clients should connect directly to the MySQL master Pod, identified by `mysql-0.mysql`, to execute writes.

== Create StatefulSet

Finally, we create StatefulSet using the configuration in `templates/mysql-statefulset.yaml` using the command shown:

  $ kubectl create -f templates/mysql-statefulset.yaml
  statefulset "mysql" created

  $ kubectl get -w statefulset
  NAME      DESIRED   CURRENT   AGE
  mysql     3         1         8s
  mysql     3         2         59s
  mysql     3         3         2m
  mysql     3         3         3m

In a different terminal window, wou can watch the progress of pods creation using the following command:

  $ kubectl get pods -l app=mysql --watch
  NAME      READY     STATUS     RESTARTS   AGE
  mysql-0   0/2       Init:0/2   0          30s
  mysql-0   0/2       Init:1/2   0         35s
  mysql-0   0/2       PodInitializing   0         47s
  mysql-0   1/2       Running   0         48s
  mysql-0   2/2       Running   0         59s
  mysql-1   0/2       Pending   0         0s
  mysql-1   0/2       Pending   0         0s
  mysql-1   0/2       Pending   0         0s
  mysql-1   0/2       Init:0/2   0         0s
  mysql-1   0/2       Init:1/2   0         35s
  mysql-1   0/2       Init:1/2   0         45s
  mysql-1   0/2       PodInitializing   0         54s
  mysql-1   1/2       Running   0         55s
  mysql-1   2/2       Running   0         1m
  mysql-2   0/2       Pending   0         <invalid>
  mysql-2   0/2       Pending   0         <invalid>
  mysql-2   0/2       Pending   0         0s
  mysql-2   0/2       Init:0/2   0         0s
  mysql-2   0/2       Init:1/2   0         32s
  mysql-2   0/2       Init:1/2   0         43s
  mysql-2   0/2       PodInitializing   0         50s
  mysql-2   1/2       Running   0         52s
  mysql-2   2/2       Running   0         56s

Press `Ctrl`+`C` to stop watching. If you notice, the pods are initialized in an orderly fashion in their
startup process. The reason being StatefulSet controller assigns a unique, stable name (`mysql-0`,
`mysql-1`, `mysql-2`) with `mysql-0` being the master and others being slaves. The configuration uses https://www.percona.com/software/mysql-database/percona-xtrabackup[Percona
Xtrabackup] (open-source tool) to clone source MySQL server to its slaves.

== Test MySQL setup

You can use `mysql-client` to send some data to the master (`mysql-0.mysql`)

```
kubectl run mysql-client --image=mysql:5.7 -i --rm --restart=Never --\
  mysql -h mysql-0.mysql <<EOF
CREATE DATABASE test;
CREATE TABLE test.messages (message VARCHAR(250));
INSERT INTO test.messages VALUES ('hello, from mysql-client');
EOF
```

You can run the following to test if slaves (`mysql-read`) received the data

```
$ kubectl run mysql-client --image=mysql:5.7 -it --rm --restart=Never --\
  mysql -h mysql-read -e "SELECT * FROM test.messages"
```

This should display an output like this:

```
+--------------------------+
| message                  |
+--------------------------+
| hello, from mysql-client |
+--------------------------+
```

To test load balancing across slaves, you can run the following command:

  kubectl run mysql-client-loop --image=mysql:5.7 -i -t --rm --restart=Never --\
     bash -ic "while sleep 1; do mysql -h mysql-read -e 'SELECT @@server_id,NOW()'; done"

Each MySQL instance is assigned a unique identifier, and it can be retrieved using `@@server_id`. This command prints the server id serving the request and the timestamp in an infinite loop.

This command will show the output:

  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         100 | 2017-10-24 03:01:11 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         100 | 2017-10-24 03:01:12 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         102 | 2017-10-24 03:01:13 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         101 | 2017-10-24 03:01:14 |
  +-------------+---------------------+

You can leave this open in a separate window while you run failure modes in the next section.

Alternatively, you can use `Ctrl`+`C` to terminate the loop.

== Testing failure modes

We will see how StatefulSet behave in different failure modes. The following modes will be tested:

. Unhealthy container
. Failed pod
. Failed node

=== Unhealthy container

MySQL container uses readiness probe by running `mysql -h 127.0.0.1 -e 'SELECT 1'` on the server to make sure MySQL server is still active.

Run this command to simulate MySQL as being unresponsive:

  kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql /usr/bin/mysql.off

This command renames the `/usr/bin/mysql` command so that readiness probe cannot find it. After a few seconds, during the next health check, the Pod should report one of its containers is not healthy. This can be verified using the command:

  kubectl get pod mysql-2
  NAME      READY     STATUS    RESTARTS   AGE
  mysql-2   1/2       Running   0          12m

`mysql-read` load balancer detects failures like this and takes action by not sending traffic to failed containers. You can check this if you have the loop running in separate window. The loop shows the following output:

```
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:17:09 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:17:10 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:17:11 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:17:12 |
+-------------+---------------------+
```

Revert back to its initial state

  kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql.off /usr/bin/mysql

Check the status again to see that both the pods are running and healthy:

    $ kubectl get pod -w mysql-2
    NAME      READY     STATUS    RESTARTS   AGE
    mysql-2   2/2       Running   0          5h

And the loop is now also showing all three servers.

=== Failed pod

To simulate a failed pod, you can delete a pod as shown:

  kubectl delete pod mysql-2
  pod "mysql-2" deleted

StatefulSet controller recognizes failed pods and creates a new one with same name and link to the same
PersistentVolumeClaim.

  $ kubectl get pod -w mysql-2
  NAME      READY     STATUS     RESTARTS   AGE
  mysql-2   0/2       Init:0/2   0          28s
  mysql-2   0/2       Init:1/2   0         31s
  mysql-2   0/2       PodInitializing   0         32s
  mysql-2   1/2       Running   0         33s
  mysql-2   2/2       Running   0         37s

=== Failed node

Kubernetes allows a node to be marked unschedulable using the `kubectl drain` command. This prevents any new pods to be scheduled on this node. If the API server supports eviction, then it will evict the pods. Otherwise, it will delete all the pods. The evict and delete happens for all the pods except mirror pods (which cannot be deleted through API server). Read more about drain at https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/.

You can simulate node downtime by draining the node. In order to determine which node to drain, run
this command

  $ kubectl get pod mysql-2 -o wide
  NAME      READY     STATUS    RESTARTS   AGE       IP            NODE
  mysql-2   2/2       Running   0          11m       100.96.6.12   ip-172-20-64-152.ec2.internal

Drain the node using the command:

  $ kubectl drain ip-172-20-64-152.ec2.internal --force --delete-local-data --ignore-daemonsets
  node "ip-172-20-64-152.ec2.internal" cordoned
  WARNING: Deleting pods with local storage: mysql-2; Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: kube-proxy-ip-172-20-64-152.ec2.internal
  pod "kube-dns-479524115-76s6j" evicted
  pod "mysql-2" evicted
  node "ip-172-20-64-152.ec2.internal" drained

You can look at the list of nodes:

  $ kubectl get nodes
  NAME                             STATUS                     ROLES     AGE       VERSION
  ip-172-20-107-81.ec2.internal    Ready                      node      10h       v1.7.4
  ip-172-20-122-243.ec2.internal   Ready                      master    10h       v1.7.4
  ip-172-20-125-181.ec2.internal   Ready                      node      10h       v1.7.4
  ip-172-20-37-239.ec2.internal    Ready                      master    10h       v1.7.4
  ip-172-20-52-200.ec2.internal    Ready                      node      10h       v1.7.4
  ip-172-20-57-5.ec2.internal      Ready                      node      10h       v1.7.4
  ip-172-20-64-152.ec2.internal    Ready,SchedulingDisabled   node      10h       v1.7.4
  ip-172-20-76-117.ec2.internal    Ready                      master    10h       v1.7.4

Notice how scheduling is disabled on one node.

Now you can watch Pod reschedules

  kubectl get pod mysql-2 -o wide --watch

The output always stay at:

  NAME      READY     STATUS    RESTARTS   AGE       IP        NODE
  mysql-2   0/2       Pending   0          33s       <none>    <none>

This could be a bug in StatefulSet as the pod was failing to reschedule. The reason was, there was no other nodes running in the AZ where the original node failed. The EBS volume was failing to to attach to other nodes because of different AZ restriction.

To mitigate this issue, manually scale the nodes to 6 which resulted in an additional node being available in that AZ.
Your scenario could be different and may not need this step.

Edit number of nodes to `6` if you run into `Pending` issue:

  kops edit ig nodes

Change the specification to:

  spec:
    image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
    machineType: t2.medium
    maxSize: 6
    minSize: 6
    role: Node
    subnets:
    - us-east-1a
    - us-east-1b
    - us-east-1c

Review and commit changes:

  kops update cluster --yes

It takes a few minutes for a new node to be provisioned. This can be verified using the command shown:

  $ kubectl get nodes
  NAME                             STATUS                     ROLES     AGE       VERSION
  ip-172-20-107-81.ec2.internal    Ready                      node      10h       v1.7.4
  ip-172-20-122-243.ec2.internal   Ready                      master    10h       v1.7.4
  ip-172-20-125-181.ec2.internal   Ready                      node      10h       v1.7.4
  ip-172-20-37-239.ec2.internal    Ready                      master    10h       v1.7.4
  ip-172-20-52-200.ec2.internal    Ready                      node      10h       v1.7.4
  ip-172-20-57-5.ec2.internal      Ready                      node      10h       v1.7.4
  ip-172-20-64-152.ec2.internal    Ready,SchedulingDisabled   node      10h       v1.7.4
  ip-172-20-73-181.ec2.internal    Ready                      node      1m        v1.7.4
  ip-172-20-76-117.ec2.internal    Ready                      master    10h       v1.7.4

Now you can watch the status of the pod:

  $ kubectl get pod mysql-2 -o wide
  NAME      READY     STATUS    RESTARTS   AGE       IP           NODE
  mysql-2   2/2       Running   0          11m       100.96.8.2   ip-172-20-73-181.ec2.internal

Let's put the previous node back into normal state:

  $ kubectl uncordon ip-172-20-64-152.ec2.internal
  node "ip-10-10-71-96.ec2.internal" uncordoned

The list of nodes is now shown as:

  $ kubectl get nodes
  NAME                             STATUS    ROLES     AGE       VERSION
  ip-172-20-107-81.ec2.internal    Ready     node      10h       v1.7.4
  ip-172-20-122-243.ec2.internal   Ready     master    10h       v1.7.4
  ip-172-20-125-181.ec2.internal   Ready     node      10h       v1.7.4
  ip-172-20-37-239.ec2.internal    Ready     master    10h       v1.7.4
  ip-172-20-52-200.ec2.internal    Ready     node      10h       v1.7.4
  ip-172-20-57-5.ec2.internal      Ready     node      10h       v1.7.4
  ip-172-20-64-152.ec2.internal    Ready     node      10h       v1.7.4
  ip-172-20-73-181.ec2.internal    Ready     node      3m        v1.7.4
  ip-172-20-76-117.ec2.internal    Ready     master    10h       v1.7.4

== Scaling slaves

More slaves can be added to the MySQL cluster to increase the read query capacity. This can be done using the command shown:

  $ kubectl scale statefulset mysql --replicas=5
  statefulset "mysql" scaled

Of course, you can watch the progress of scaling

  kubectl get pods -l app=mysql -w

It shows the output:

  $ kubectl get pods -l app=mysql -w
  NAME      READY     STATUS     RESTARTS   AGE
  mysql-0   2/2       Running    0          6h
  mysql-1   2/2       Running    0          6h
  mysql-2   2/2       Running    0          16m
  mysql-3   0/2       Init:0/2   0          1s
  mysql-3   0/2       Init:1/2   0         18s
  mysql-3   0/2       Init:1/2   0         28s
  mysql-3   0/2       PodInitializing   0         36s
  mysql-3   1/2       Running   0         37s
  mysql-3   2/2       Running   0         43s
  mysql-4   0/2       Pending   0         <invalid>
  mysql-4   0/2       Pending   0         <invalid>
  mysql-4   0/2       Pending   0         0s
  mysql-4   0/2       Init:0/2   0         0s
  mysql-4   0/2       Init:1/2   0         31s
  mysql-4   0/2       Init:1/2   0         41s
  mysql-4   0/2       PodInitializing   0         52s
  mysql-4   1/2       Running   0         53s
  mysql-4   2/2       Running   0         58s

If the loop is still running, then it will print an output as shown:

  +-------------+---------------------+
  |         101 | 2017-10-24 03:53:53 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         100 | 2017-10-24 03:53:54 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         102 | 2017-10-24 03:53:55 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         103 | 2017-10-24 03:53:57 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         103 | 2017-10-24 03:53:58 |
  +-------------+---------------------+
  +-------------+---------------------+
  | @@server_id | NOW()               |
  +-------------+---------------------+
  |         104 | 2017-10-24 03:53:59 |
  +-------------+---------------------+

You can also verify if the slaves have the same data set:

  kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --\
   mysql -h mysql-3.mysql -e "SELECT * FROM test.messages"

It still shows the same result:

  +--------------------------+
  | message                  |
  +--------------------------+
  | hello, from mysql-client |
  +--------------------------+

You can scale down by using the command shown:

  kubectl scale statefulset mysql --replicas=3
  statefulset "mysql" scaled

Note that, scale in doesn't delete the data or PVCs attached to the pods. You have to delete
them manually

  kubectl delete pvc data-mysql-3
  kubectl delete pvc data-mysql-4

It shows the output:

  persistentvolumeclaim "data-mysql-3" deleted
  persistentvolumeclaim "data-mysql-4" deleted

== Cleaning up

First delete the StatefulSet. This also terminates the pods:

  $ kubectl delete statefulset mysql
  statefulset "mysql" deleted

Verify there are no more pods running:

  kubectl get pods -l app=mysql

It shows the output:

  No resources found.

Delete ConfigMap, Service, PVC using the command:

  $ kubectl delete configmap,service,pvc -l app=mysql
  configmap "mysql-config" deleted
  service "mysql" deleted
  service "mysql-read" deleted
  persistentvolumeclaim "data-mysql-0" deleted
  persistentvolumeclaim "data-mysql-1" deleted
  persistentvolumeclaim "data-mysql-2" deleted

You are now ready to continue on with the workshop!

:frame: none
:grid: none
:valign: top

[align="center", cols="1", grid="none", frame="none"]
|=====
|image:button-continue-developer.png[link=../../03-path-application-development/308-cicd-workflows/]
|link:../../developer-path.adoc[Go to Developer Index]
|=====