# Simple Web Application to demonstrate deployment of multi-arch images on Graviton3 (c7g) and Intel/AMD (c6) instances

## Introduction

Graviton instances can reduce compute costs by up to 30% for highly concurrent workloads. You can also increase your application's resilience by allowing it to run on both Intel/AMD and Graviton EC2 instances. We show you how to transform your CPU-bound Python application into a processor-agnostic application that performs 30% faster on c7g EC2 instances than on c6 EC2 instances. First, we build a Docker image that supports both Intel and Graviton instances and use Karpenter to launch the right EC2 instance based on application fit and availability in the AWS Region. Next, we increase CPU usage by loading the application with synthetic requests. Finally, we compare application throughput on Graviton versus Intel/AMD instances throughout the load cycle.

## Methodology

* Populate the following environment variables

```shell
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account)
export AWS_REGION=us-east-1
export AWS_DEFAULT_REGION=us-east-1
export BUILDX_VER=v0.10.3
export APP_IMAGE_NAME=simplemultiarchimage
export APP_IMAGE_TAG=multiarch-py3
export APP_IMAGE_ARM_TAG=multiarch-py3-arm
export APP_IMAGE_AMD_TAG=multiarch-py3-amd
export GITHUB_USER=yahavb
export GITHUB_BRANCH=main
export GITHUB_REPO=app-health-with-aws-load-balancer-controller
export CLUSTER_NAME=grv-usw2
export GITHUB_OAUTH_TOKEN=<classic token - Settings -> Developer settings -> Tokens (classic)>
```

* Enable multi-arch builds (linux/arm64 and linux/amd64)

```bash
docker buildx create --name craftbuilder
```

* Create and deploy the ECR docker registry and images for the app

```bash
./create-ecr-sqs.sh
./buildx.sh
```

We used `docker buildx` for simplicity, but there are cases where the buildx emulator throws exceptions or is too slow. In such cases we recommend using [native `docker build` followed by `docker manifest`](https://github.com/aws-samples/containerized-game-servers/blob/master/supertuxkart/multi-arch-ci.md), as sketched below.
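The native flow builds and pushes one image per architecture on a host of that architecture, then stitches the two images into a single multi-arch tag. A minimal sketch, assuming one Graviton and one x86 build host, the environment variables above, and a hypothetical `IMAGE_URI` pointing at the ECR repository created by `create-ecr-sqs.sh`:

```bash
# Hypothetical ECR repository URI derived from the variables above.
export IMAGE_URI=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${APP_IMAGE_NAME}

# Build and push one image per architecture, each on a matching host.
docker build -t ${IMAGE_URI}:${APP_IMAGE_ARM_TAG} .   # run on the Graviton host
docker push ${IMAGE_URI}:${APP_IMAGE_ARM_TAG}
docker build -t ${IMAGE_URI}:${APP_IMAGE_AMD_TAG} .   # run on the x86 host
docker push ${IMAGE_URI}:${APP_IMAGE_AMD_TAG}

# Stitch the per-arch images into one multi-arch tag and push the manifest list.
docker manifest create ${IMAGE_URI}:${APP_IMAGE_TAG} \
  ${IMAGE_URI}:${APP_IMAGE_ARM_TAG} \
  ${IMAGE_URI}:${APP_IMAGE_AMD_TAG}
docker manifest annotate --arch arm64 ${IMAGE_URI}:${APP_IMAGE_TAG} ${IMAGE_URI}:${APP_IMAGE_ARM_TAG}
docker manifest annotate --arch amd64 ${IMAGE_URI}:${APP_IMAGE_TAG} ${IMAGE_URI}:${APP_IMAGE_AMD_TAG}
docker manifest push ${IMAGE_URI}:${APP_IMAGE_TAG}
```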
* Deploy Karpenter

Follow https://karpenter.sh for the cluster and Karpenter installation.

* Deploy Container Insights

Follow the [container insights deploy steps](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-quickstart.html)

```bash
ClusterName=${CLUSTER_NAME}
RegionName=${AWS_REGION}
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off' || FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f -
```

* Deploy the AWS Load Balancer Controller

Follow the [aws-load-balancer-controller guide](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html)

```bash
eksctl create iamserviceaccount \
  --cluster=${CLUSTER_NAME} \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --role-name "AmazonEKSLoadBalancerControllerRole" \
  --attach-policy-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:policy/AWSLoadBalancerControllerIAMPolicy \
  --approve

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=${CLUSTER_NAME} \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```

* Deploy two Karpenter provisioners

```bash
cat app-arm-provisioner.yaml | envsubst | kubectl apply -f -
cat app-amd-provisioner.yaml | envsubst | kubectl apply -f -
```

Note that we use the `instance-generation`, `instance-cpu`, and `arch` requirements to make Karpenter launch `c7g.large` (arm64) and `c6i.large` or `c6a.large` (amd64) instances.

```yaml
- key: karpenter.k8s.aws/instance-generation
  operator: In
  values:
  - "7"
- key: karpenter.k8s.aws/instance-cpu
  operator: In
  values:
  - "2"
- key: kubernetes.io/arch
  operator: In
  values:
  - arm64
```

```yaml
- key: karpenter.k8s.aws/instance-generation
  operator: In
  values:
  - "6"
- key: karpenter.k8s.aws/instance-cpu
  operator: In
  values:
  - "2"
- key: kubernetes.io/arch
  operator: In
  values:
  - amd64
```

* Create the K8s service and ingress for the sample webapp with three paths: `/arm`, `/amd`, and `/app`. `/app` splits the traffic between `/arm` and `/amd`.

```bash
cat app-ingress-weighted-routing.yaml | envsubst | kubectl apply -f -
```

* Deploy the sample app (a quick way to verify where the pods landed is sketched below)

```shell
cat app-arm-deploy.yaml | envsubst | kubectl apply -f -
cat app-amd-deploy.yaml | envsubst | kubectl apply -f -
```
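Once both deployments are scheduled, you can confirm that Karpenter launched nodes of both architectures. A minimal check using only standard `kubectl` flags and well-known node labels (the `app` label values are the ones used by the deployments above):

```bash
# List the nodes with their CPU architecture and instance type.
kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type

# Show which node each sample-app pod landed on.
kubectl get pods -o wide -l 'app in (armsimplemultiarchapp, amdsimplemultiarchapp)'
```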
We simulate CPU usage in the application with a numpy matrix multiplication (`numpy.matmul`) of two randomly generated pandas DataFrames. The `MATRIX_DIM` environment variable (read into `_matrix_dim`) controls the CPU usage; a larger value causes more CPU usage. The baseline, `MATRIX_DIM=500`, caps c7g.large and c6a.large at roughly 80% CPU usage.

```python
import os

import numpy as np
import pandas as pd

# MATRIX_DIM controls how much CPU each request burns.
_matrix_dim = int(os.environ['MATRIX_DIM'])
df1 = pd.DataFrame(data=np.random.randint(_matrix_dim, size=(_matrix_dim, _matrix_dim)))
df2 = pd.DataFrame(data=np.random.randint(_matrix_dim, size=(_matrix_dim, _matrix_dim)))
df12 = np.matmul(df1, df2)
```

* Discover the ingress ALB endpoint

```shell
kubectl get ingress
```

Note that we use two k8s services to route traffic to the Graviton-powered and Intel-powered pods. The `service.selector.app` in each service must match the `deployment.template.metadata.labels.app` of the corresponding deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: armsimplemultiarchapp-svc
spec:
  selector:
    app: armsimplemultiarchapp
  ports:
    - port: 80
      targetPort: 8000
  type: NodePort
---
apiVersion: v1
kind: Service
metadata:
  name: amdsimplemultiarchapp-svc
spec:
  selector:
    app: amdsimplemultiarchapp
  ports:
    - port: 80
      targetPort: 8000
  type: NodePort
```

We use an Application Load Balancer with the AWS Load Balancer Controller to spread the load between the Graviton and x86 compute pools. We control the traffic routing with the `alb.ingress.kubernetes.io/actions.weighted-routing` annotation:

```yaml
alb.ingress.kubernetes.io/actions.weighted-routing: |
  {
    "type":"forward",
    "forwardConfig":{
      "targetGroups":[
        {"serviceName":"armsimplemultiarchapp-svc","servicePort":"80","weight":50},
        {"serviceName":"amdsimplemultiarchapp-svc","servicePort":"80","weight":50}
      ]
    }
  }
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: weighted-routing
                port:
                  name: use-annotation
```

Copy the ADDRESS value and browse to `http://$ADDRESS/app/runtime/`; notice the `instance-type` alternating between pods that run on `arm64` and `amd64` CPUs.

* Single-node test - run a load simulator to test the performance within a single Graviton node and a single Intel/AMD node. Configure `ARM_APP_URL` and `AMD_APP_URL` with the ALB address (previous step) and deploy the app loader

```shell
kubectl apply -f app-loader.yaml
```

* Let it run for an hour and observe the CloudWatch target-group metric `HTTPCode_Target_2XX_Count` under `AWS/ApplicationELB` to assess the application throughput of the two target groups.

* Multi-node test - add an [HPA](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) that scales the number of pods to see how Graviton throughput scales across many nodes

```shell
kubectl autoscale deploy armsimplemultiarchapp --cpu-percent=90 --min=1 --max=100
kubectl autoscale deploy amdsimplemultiarchapp --cpu-percent=90 --min=1 --max=100
```

## Results

### Single-node load test

![Single-node load test - CPU usage and app throughput - c7g.large and c6a.large](./single-node-load-baseline.png)

### Multi-node load test

![Multi-node load test - app throughput - c7g.large and c6a.large](./multi-node-load-throughput.png)

![Multi-node load test - number of pods/nodes - c7g.large and c6a.large](./multi-node-load-nodes.png)

## Analysis

The single-node test is designed to measure the application's throughput under minor and significant load. There is a small difference in throughput under minor load (<70% CPU), but under heavy load (>70% CPU) the throughput of the app running on Graviton is 30%-50% higher than that of the app running on Intel. We attribute this to the minimal context-switch overhead on Graviton compared with Intel's simultaneous multithreading. We consider the load valid until the application fails and returns HTTP 5XX for more than 1% of the requests. We noticed that the x86-based app fails when CPU utilization crosses 80%, while the Graviton-based app fails when it crosses 90%.
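One way to track that error budget is to compare the 5XX count against the 2XX count per target group in CloudWatch. A hedged sketch with the AWS CLI (GNU `date`); `LB_DIM` and `TG_DIM` are placeholders for the dimension values reported by `aws elbv2 describe-load-balancers` and `aws elbv2 describe-target-groups`:

```bash
# Hourly sums of 2XX and 5XX target responses for one target group.
# LB_DIM looks like app/k8s-.../..., TG_DIM like targetgroup/k8s-.../...
for METRIC in HTTPCode_Target_2XX_Count HTTPCode_Target_5XX_Count; do
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name ${METRIC} \
    --dimensions Name=LoadBalancer,Value=${LB_DIM} Name=TargetGroup,Value=${TG_DIM} \
    --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
    --period 3600 --statistics Sum
done
```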
In the multi-node test, we drive similar app throughput across roughly 50 nodes, which translates directly into cost. When the HPA threshold is crossed under >90% load, we see about 30% better Graviton utilization: it takes 56 Intel nodes, compared to 43 Graviton nodes (56/43 ≈ 1.3), to achieve the same throughput.

## Conclusion

Graviton3 shows its performance benefits under heavy load compared to Intel processors because of its 1 vCPU to 1 physical core mapping and its large L1, L2, and L3 caches. In addition, Docker images that run on both Graviton and Intel processors help procure capacity from multiple EC2 instance pools (Spot and On-Demand).