= Kubernetes Cluster Monitoring
:toc:
:icons:
:linkcss:
:imagesdir: ../../resources/images

== Introduction

This chapter will demonstrate how to monitor a Kubernetes cluster using the following:

. Kubernetes Dashboard
. Heapster, InfluxDB and Grafana
. Prometheus, Node exporter and Grafana

http://prometheus.io/[Prometheus] is an open-source systems monitoring and alerting toolkit. Prometheus collects metrics from monitored targets by scraping metrics from HTTP endpoints on these targets.

Heapster is limited to Kuberenetes container metrics, it is not general use. Heapster can be used as Prometheus scrape target.

== Prerequisites

In order to perform exercises in this chapter, you’ll need to deploy configurations to a Kubernetes cluster. To create an EKS-based Kubernetes cluster, use the link:../../01-path-basics/102-your-first-cluster#create-a-kubernetes-cluster-with-eks[AWS CLI] (recommended). If you wish to create a Kubernetes cluster without EKS, you can instead use link:../../01-path-basics/102-your-first-cluster#alternative-create-a-kubernetes-cluster-with-kops[kops].

All configuration files for this chapter are in the link:templates[201-cluster-monitoring/templates] directory.

== Kubernetes Dashboard

https://github.com/kubernetes/dashboard[Kubernetes Dashboard] is a general purpose web-based UI for Kubernetes clusters.

The Dashboard uses the https://kubernetes.io/docs/admin/authorization/rbac/[RBAC API], which has been promoted in
Kubernetes v1.8 to GA rather than Beta, so you'll use a different version of
the dashboard depending on the version of Kubernetes you are running. Check your Kubernetes version using the following command -
check the value of the `Server Version`, which is v1.7.4 in this example:

    kubectl version

    $ kubectl version
    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.4", GitCommit:"793658f2d7ca7f064d2bdf606519f9fe1229c381", GitTreeState:"clean", BuildDate:"2017-08-17T08:30:51Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

If you are using v1.7.x, deploy the Dashboard using the following command:

    kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/6dc75162dce25b5a94aa500ebba923e8223e5cfd/src/deploy/recommended/kubernetes-dashboard.yaml

If you are using v1.8 or above, deploy the Dashboard using the following command:

    kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

Dashboard can be seen using the following command:

    kubectl proxy --address 0.0.0.0 --accept-hosts '.*' --port 8080

Now, Dashboard is accessible via `Preview`, `Preview Running Application` as:

    https://ENVIRONMENT_ID.vfs.cloud9.REGION_ID.amazonaws.com/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

Where `ENVIRONMENT_ID` is your Cloud9 IDE environment id (you should see it once click the built-in browser address bar) and `REGION_ID` is AWS region id (e.g. us-east-1).

Starting with Kubernetes 1.7, Dashboard supports authentication. Read more about it at https://github.com/kubernetes/dashboard/wiki/Access-control#introduction. We'll use a bearer token for authentication.

Check existing secrets in the `kube-system` namespace:

    kubectl -n kube-system get secret

It shows the output as:

  NAME                                     TYPE                                  DATA      AGE
  attachdetach-controller-token-dhkcr      kubernetes.io/service-account-token   3         3h
  certificate-controller-token-p131b       kubernetes.io/service-account-token   3         3h
  daemon-set-controller-token-r4mmp        kubernetes.io/service-account-token   3         3h
  default-token-7vh0x                      kubernetes.io/service-account-token   3         3h
  deployment-controller-token-jlzkj        kubernetes.io/service-account-token   3         3h
  disruption-controller-token-qrx2v        kubernetes.io/service-account-token   3         3h
  dns-controller-token-v49b6               kubernetes.io/service-account-token   3         3h
  endpoint-controller-token-hgkbm          kubernetes.io/service-account-token   3         3h
  generic-garbage-collector-token-34fvc    kubernetes.io/service-account-token   3         3h
  horizontal-pod-autoscaler-token-lhbkf    kubernetes.io/service-account-token   3         3h
  job-controller-token-c2s8j               kubernetes.io/service-account-token   3         3h
  kube-dns-autoscaler-token-s3svx          kubernetes.io/service-account-token   3         3h
  kube-dns-token-92xzb                     kubernetes.io/service-account-token   3         3h
  kube-proxy-token-0ww14                   kubernetes.io/service-account-token   3         3h
  kubernetes-dashboard-certs               Opaque                                2         9m
  kubernetes-dashboard-key-holder          Opaque                                2         9m
  kubernetes-dashboard-token-vt0fd         kubernetes.io/service-account-token   3         10m
  namespace-controller-token-423gh         kubernetes.io/service-account-token   3         3h
  node-controller-token-r6lsr              kubernetes.io/service-account-token   3         3h
  persistent-volume-binder-token-xv30g     kubernetes.io/service-account-token   3         3h
  pod-garbage-collector-token-fwmv4        kubernetes.io/service-account-token   3         3h
  replicaset-controller-token-0cg8r        kubernetes.io/service-account-token   3         3h
  replication-controller-token-3fwxd       kubernetes.io/service-account-token   3         3h
  resourcequota-controller-token-6rl9f     kubernetes.io/service-account-token   3         3h
  route-controller-token-9brzb             kubernetes.io/service-account-token   3         3h
  service-account-controller-token-bqlsk   kubernetes.io/service-account-token   3         3h
  service-controller-token-1qlg6           kubernetes.io/service-account-token   3         3h
  statefulset-controller-token-kmgzg       kubernetes.io/service-account-token   3         3h
  ttl-controller-token-vbnhf               kubernetes.io/service-account-token   3         3h

We can login using the secret with type 'kubernetes.io/namespace-controller-token'. In our case, we'll use the token from secret `namespace-controller-token-423gh` to login. Use the following command to get the token for this secret:

    kubectl -n kube-system describe secret namespace-controller-token-423gh

Note you'll need to replace `namespace-controller-token-423gh` with the namespace-controller-token from your output list.

It shows the output:

```
Name:         namespace-controller-token-423gh
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=default
              kubernetes.io/service-account.uid=3a3fea86-b3a1-11e7-9d90-06b1e747c654

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1046 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkZWZhdWx0LXRva2VuLTd2aDB4Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImRlZmF1bHQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIzYTNmZWE4Ni1iM2ExLTExZTctOWQ5MC0wNmIxZTc0N2M2NTQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06ZGVmYXVsdCJ9.GHW-7rJcxmvujkClrN6heOi_RYlRivzwb4ScZZgGyaCR9tu2V0Z8PE5UR6E_3Vi9iBCjuO6L6MLP641bKoHB635T0BZymJpSeMPQ7t1F02BsnXAbyDFfal9NUSV7HoPAhlgURZWQrnWojNlVIFLqhAPO-5T493SYT56OwNPBhApWwSBBGdeF8EvAHGtDFBW1EMRWRt25dSffeyaBBes5PoJ4SPq4BprSCLXPdt-StPIB-FyMx1M-zarfqkKf7EJKetL478uWRGyGNNhSfRC-1p6qrRpbgCdf3geCLzDtbDT2SBmLv1KRjwMbW3EF4jlmkM4ZWyacKIUljEnG0oltjA
```

Copy the value of token from this output, select `Token` in the Dashboard login window, and paste the text. Click on `SIGN IN` to see the default Dashboard view:

image::kubernetes-dashboard-default.png[]

Click on `Nodes` to see a textual representation about the nodes running in the cluster:

image::monitoring-nodes-before.png[]

Install a Java application as explained in link:../../03-path-application-development/306-app-management-with-helm[Deploying applications using Kubernetes Helm charts].

Click on `Pods`, again to see a textual representation about the pods running in the cluster:

image::monitoring-pods-before.png[]

This will change after Heapster, InfluxDB and Grafana are installed.

== Heapster, InfluxDB and Grafana

https://github.com/kubernetes/heapster[Heapster] is a metrics aggregator and processor. It is installed as a cluster-wide pod. It gathers monitoring and events data for all containers on each node by talking to the Kubelet. Kubelet itself fetches this data from https://github.com/google/cadvisor[cAdvisor]. This data is persisted in a time series database https://github.com/influxdata/influxdb[InfluxDB] for storage. The data is then visualized using a http://grafana.org/[Grafana] dashboard, or it can be viewed in Kubernetes Dashboard.

Heapster collects and interprets various signals like compute resource usage, lifecycle events, etc., and exports cluster metrics via REST endpoints.

Heapster, InfluxDB and Grafana are http://kubernetes.io/docs/admin/addons/[Kubernetes addons].

=== Installation

Execute this command to install Heapster, InfluxDB and Grafana:

  $ kubectl apply -f templates/heapster/
  deployment "monitoring-grafana" created
  service "monitoring-grafana" created
  clusterrolebinding "heapster" created
  serviceaccount "heapster" created
  deployment "heapster" created
  service "heapster" created
  deployment "monitoring-influxdb" created
  service "monitoring-influxdb" created

Heapster is now aggregating metrics from the cAdvisor instances running on each node. This data is stored in an InfluxDB instance running in the cluster. Grafana dashboard, accessible at https://ENVIRONMENT_ID.vfs.cloud9.REGION_ID.amazonaws.com/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy/?orgId=1, now shows the information about the cluster.

NOTE: Grafana dashboard will not be available if Kubernetes proxy is not running. If proxy is not running, it can be started with the command `kubectl proxy --address 0.0.0.0 --accept-hosts '.*' --port 8080`.

=== Grafana dashboard

There are some built-in dashboards for monitoring the cluster and workloads. They are available by clicking on the upper left corner of the screen.

image::monitoring-grafana-dashboards.png[]

The "`Cluster`" dashboard shows all worker nodes, and their CPU and memory metrics. Type in a node name to see its collected metrics during a chosen period of time.

The cluster dashboard looks like this:

image::monitoring-grafana-dashboards-cluster.png[]

The "`Pods`"" dashboard allows you to see the resource utilization of every pod in the cluster. As with nodes, you can select the pod by typing its name in the top filter box.

image::monitoring-grafana-dashboards-pods.png[]

After the deployment of Heapster, Kubernetes Dashboard now shows additional graphs such as CPU and Memory utilization for pods and nodes, and other workloads.

The updated view of the cluster in Kubernetes Dashboard looks like this:

image::monitoring-nodes-after.png[]

The updated view of pods looks like this:

image::monitoring-pods-after.png[]

=== Cleanup

Remove all the installed components:

    kubectl delete -f templates/heapster/

== Prometheus, Node exporter and Grafana

http://prometheus.io/[Prometheus] is an open-source systems monitoring and alerting toolkit. Prometheus collects metrics from monitored targets by scraping metrics from HTTP endpoints on these targets.

Prometheus will be managed by the https://github.com/coreos/prometheus-operator/[Kubernetes Operator] - This operator uses https://kubernetes.io/docs/concepts/api-extension/custom-resources/[Custom Resources] to extend the Kubernetes API and add custom resources such as `Prometheus`, `ServiceMonitor` and `Alertmanager`.

Prometheus is able to dynamically scrape new targets by adding a https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/running-exporters.md[ServiceMonitor] - we have included a couple of them to scrape `kube-controller-manager`, `kube-scheduler`, `kube-state-metrics`, `kubelet` and `node-exporter`.

https://github.com/prometheus/node_exporter[Node exporter] is a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels.
https://github.com/kubernetes/kube-state-metrics[kube-state-metrics] is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.

=== Installation

First we need to deploy the Prometheus Operator which will listen for the new Custom Resources:

  $ kubectl apply -f templates/prometheus/prometheus-bundle.yaml
  namespace "monitoring" created
  clusterrolebinding "prometheus-operator" created
  clusterrole "prometheus-operator" created
  serviceaccount "prometheus-operator" created
  deployment "prometheus-operator" created

Next we need to wait until the Prometheus Operator has started:


  $ kubectl rollout status deployment/prometheus-operator -n monitoring
  ...
  deployment "prometheus-operator" successfully rolled out

As a final step we need to deploy the Prometheus Custom Resource, Service Monitors, Cluster Roles and Bindings (RBAC):

  $ kubectl apply -f templates/prometheus/prometheus.yaml
  serviceaccount "kube-state-metrics" created
  clusterrole "kube-state-metrics" created
  clusterrolebinding "kube-state-metrics" created
  service "kube-scheduler-prometheus-discovery" created
  service "kube-controller-manager-prometheus-discovery" created
  daemonset "node-exporter" created
  service "node-exporter" created
  deployment "kube-state-metrics" created
  service "kube-state-metrics" created
  prometheus "prometheus" created
  servicemonitor "prometheus-operator" created
  servicemonitor "kube-apiserver" created
  servicemonitor "kubelet" created
  servicemonitor "kube-controller-manager" created
  servicemonitor "kube-scheduler" created
  servicemonitor "kube-state-metrics" created
  servicemonitor "node-exporter" created
  alertmanager "main" created
  secret "alertmanager-main" created

Lets wait for prometheus to come up:

  $ kubectl get po -l prometheus=prometheus -n monitoring
  NAME                      READY     STATUS    RESTARTS   AGE
  prometheus-prometheus-0   2/2       Running   0          1m
  prometheus-prometheus-1   2/2       Running   0          1m

=== Prometheus Dashboard

Prometheus is now scraping metrics from the different scraping targets and we forward the dashboard via:

  $ kubectl port-forward $(kubectl get po -l prometheus=prometheus -n monitoring -o jsonpath={.items[0].metadata.name}) 9090 -n monitoring
  Forwarding from 127.0.0.1:9090 -> 9090

Now open the browser at http://localhost:9090/targets and all targets should be shown as `UP` (it might take a couple of minutes until data collectors are up and running for the first time). The browser displays the output as shown:

image::monitoring-grafana-prometheus-dashboard-1.png[]
image::monitoring-grafana-prometheus-dashboard-2.png[]
image::monitoring-grafana-prometheus-dashboard-3.png[]

=== Grafana Installation

To install grafana we need to run:

  $ kubectl apply -f templates/prometheus/grafana-bundle.yaml
  secret "grafana-credentials" created
  service "grafana" created
  configmap "grafana-dashboards-0" created
  deployment "grafana" created

Lets wait for grafana to come up:

  $ kubectl rollout status deployment/grafana -n monitoring
  ...
  deployment "grafana" successfully rolled out

=== Grafana Dashboard

Lets forward the grafana dashboard to a local port:

  $ kubectl port-forward $(kubectl get pod -l app=grafana -o jsonpath={.items[0].metadata.name} -n monitoring) 3000 -n monitoring
  Forwarding from 127.0.0.1:3000 -> 3000

Grafana dashboard is now accessible at http://localhost:3000/. The complete list of dashboards is available using the search button at the top:

image::monitoring-grafana-prometheus-dashboard-dashboard-home.png[]

You can access various metrics using these dashboards:

. http://localhost:3000/dashboard/db/kubernetes-control-plane-status?orgId=1[Kubernetes Cluster Control Plane]
+
image::monitoring-grafana-prometheus-dashboard-control-plane-status.png[]
+
. http://localhost:3000/dashboard/db/kubernetes-cluster-status?orgId=1[Kubernetes Cluster Status]
+
image::monitoring-grafana-prometheus-dashboard-cluster-status.png[]
+
. http://localhost:3000/dashboard/db/kubernetes-capacity-planning?orgId=1[Kubernetes Cluster Capacity Planning]
+
image::monitoring-grafana-prometheus-dashboard-capacity-planning.png[]
+
. http://localhost:3000/dashboard/db/nodes?orgId=1[Nodes in the Kubernetes cluster]
+
image::monitoring-grafana-prometheus-dashboard-nodes.png[]

Convenient link for other dashboards are listed below:

* http://localhost:3000/dashboard/db/deployment&orgId=1
* http://localhost:3000/dashboard/db/kubernetes-cluster-health?refresh=10s&orgId=1
* http://localhost:3000/dashboard/db/kubernetes-resource-requests?orgId=1
* http://localhost:3000/dashboard/db/pods?orgId=1

=== Cleanup

Remove all the installed components:

    kubectl delete -f templates/prometheus/prometheus-bundle.yaml


You are now ready to continue on with the workshop!

:frame: none
:grid: none
:valign: top

[align="center", cols="2", grid="none", frame="none"]
|=====
|image:button-continue-standard.png[link=../../02-path-working-with-clusters/202-service-mesh]
|image:button-continue-operations.png[link=../../02-path-working-with-clusters/202-service-mesh]
|link:../../standard-path.adoc[Go to Standard Index]
|link:../../operations-path.adoc[Go to Operations Index]
|=====