# Cluster Traffic Capture

This package contains code and configuration to facilitate the capture of traffic to an Elasticsearch/OpenSearch cluster while having as little impact on throughput/latency as possible. The goal is to provide a way to validate the performance of a real workload against a new, prospective cluster that may be configured differently.

The current solution uses HAProxy [1] to synchronously capture the stream of requests/responses to the user's existing ("Primary") cluster while passing traffic through. It also replicates the traffic to the prospective ("Shadow") cluster using a mirroring Stream Processing Offload Agent [2]. You can learn more about this scenario from this blog post [3]. See below for a network diagram and further explanation of the intended usage.

At a high level, the contents are:

* `./build_docker_images.py`: A command-line script that builds Docker images for the Primary and Shadow HAProxy instances
* `./demo_haproxy.py`: A script that executes `./build_docker_images.py` to generate HAProxy Docker images and stands up Primary/Shadow ES 7.10.2 clusters in Docker containers on the local host with traffic mirroring enabled.
* `./docker_config_traffic_gen/Dockerfile`: A Dockerfile which, when built/run while the `./demo_haproxy.py` setup is running, will drive test traffic to the Primary test cluster using the OpenSearch Benchmarking tool [4].

[1] https://github.com/haproxy/haproxy
[2] https://github.com/haproxytech/spoa-mirror
[3] https://www.haproxy.com/blog/haproxy-traffic-mirroring-for-real-world-testing/
[4] https://github.com/opensearch-project/opensearch-benchmark

## Network Setup

Currently, it is expected that a user wishing to capture traffic for validation would spin up HAProxy instances between the Client and Primary cluster, and between the Primary HAProxy instance and the Shadow cluster, like so:

```
          CLIENT
           /|\
            |  normal traffic
            |
            |
           \|/                mirrors traffic
    PRIMARY HAPROXY ------------------> SHADOW HAPROXY
           /|\            one-way             /|\
            |                                  |
            |                                  |
            |                                  |
           \|/                                \|/
    PRIMARY CLUSTER                     SHADOW CLUSTER
```

A description of each actor is as follows:

* CLIENT: The user's existing source of traffic to their PRIMARY CLUSTER
* PRIMARY CLUSTER: The user's existing cluster, which they are considering migrating to a new location or configuration, and whose traffic they want to capture for validation
* SHADOW CLUSTER: The prospective cluster "under evaluation", which will have the traffic to the PRIMARY CLUSTER replayed against it
* PRIMARY HAPROXY: One or more hosts running the Primary HAProxy Docker image. Each synchronously passes traffic from the CLIENT to the PRIMARY CLUSTER while recording the requests/responses. Additionally, it mirrors the traffic one-way to the SHADOW HAPROXY host(s) (a configuration sketch follows this list).
* SHADOW HAPROXY: One or more hosts running the Shadow HAProxy Docker image. Each synchronously passes traffic it receives from the PRIMARY HAPROXY to the SHADOW CLUSTER while recording requests/responses. Traffic received from the SHADOW CLUSTER is not passed back to the PRIMARY HAPROXY or the CLIENT.
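For orientation, here is a minimal sketch of the kind of HAProxy configuration that implements the mirroring leg above. This is illustrative only: the real configs are generated by `./build_docker_images.py` (see `haproxy_w_mirror.cfg` in the demo output below), and every name, address, and timeout here is an assumption.

```
# --- Sketch only: the actual files are generated by ./build_docker_images.py ---

# haproxy.cfg (Primary HAProxy): pass traffic through to the Primary Cluster while
# handing each request to an SPOE filter for mirroring. Defaults omitted for brevity.
frontend es_frontend
    bind *:9200
    mode http
    filter spoe engine mirror config /usr/local/etc/haproxy/mirror.cfg
    default_backend primary_nodes

backend primary_nodes
    mode http
    server node1 host.docker.internal:9200
    server node2 host.docker.internal:9201

# TCP backend pointing at the spoa-mirror agent, which replays each request,
# one-way, against the Shadow HAProxy.
backend mirror_agents
    mode tcp
    server agent1 127.0.0.1:12345

# mirror.cfg (SPOE): declares which request data the agent receives, and when.
[mirror]
spoe-agent mirror
    messages           mirror-request
    option var-prefix  mirror
    timeout hello      500ms
    timeout idle       5s
    timeout processing 5s
    use-backend        mirror_agents

spoe-message mirror-request
    args arg_method=method arg_path=path arg_ver=req.ver arg_hdrs=req.hdrs_bin arg_body=req.body
    event on-frontend-http-request
```

The spoa-mirror agent itself is pointed at the Shadow HAProxy (e.g. via its `--mirror-url` option), which is what keeps the mirrored leg one-way.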
## How To Use This Package

### Pre-Requisites

* This GitHub repo cloned locally
* Python3 and venv
* Docker installed on your host
* A terminal session in the same directory as this README

### Building The Docker Images

#### Step 1 - Activate your Python virtual environment

To isolate the project's Python environment from your local machine, create a virtual environment like so:

```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

You can exit the Python virtual environment and remove its resources like so:

```
deactivate
rm -rf .venv
```

Learn more about venv [here](https://docs.python.org/3/library/venv.html).

#### Step 2 - Build the images

You should now be able to invoke the build script, like so:

```
./build_docker_images.py --primary-image haproxy-primary --primary-nodes host.docker.internal:9200 host.docker.internal:9201 --shadow-haproxy host.docker.internal:81 --shadow-image haproxy-shadow --shadow-nodes host.docker.internal:9202 host.docker.internal:9203 --internal-port 9200
```

This will produce two Docker images (`haproxy-primary`, `haproxy-shadow`) configured to direct/capture traffic to the primary and shadow cluster nodes you specified.

### Running The Demo

The demo provides an example of the setup's intended usage that is easy to test/interrogate on a laptop. It stands up 6 Docker containers: 1 HAProxy Primary, 2 ES 7.10.2 Primary Nodes, 1 HAProxy Shadow, and 2 ES 7.10.2 Shadow Nodes. The user can send traffic to the Primary HAProxy and see it mirrored to the Shadow Cluster while being logged.

#### Step 1 - Activate your Python virtual environment

To isolate the project's Python environment from your local machine, create a virtual environment like so:

```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

You can exit the Python virtual environment and remove its resources like so:

```
deactivate
rm -rf .venv
```

Learn more about venv [here](https://docs.python.org/3/library/venv.html).

#### Step 2 - Set up AWS credentials

This demo offloads the captured traffic from the HAProxy Primary container to AWS CloudWatch using the CloudWatch Agent. The CloudWatch Agent is just one possible offload mechanism, not the only one. In order to use the Agent, we need access to AWS credentials. This can be done a number of ways, including [setting up a dedicated IAM User (not recommended)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html) or [using the AWS CLI STS commands to assume a Role](https://aws.amazon.com/premiumsupport/knowledge-center/iam-assume-role-cli/). Whichever way you choose to create AWS credentials, they should have read/write permissions to CloudWatch Logs.

Export the credentials in your terminal session like so:

```
export AWS_ACCESS_KEY_ID= \
&& export AWS_SECRET_ACCESS_KEY=
```

If you assumed a Role, you'll have a session token as well, which you need to export:

```
export AWS_SESSION_TOKEN=
```

Finally, export the AWS Region ID you want the captured logs to be exported to:

```
export AWS_REGION=us-east-2
```

The demo script uses these ENV variables to safely construct an AWS credential file inside the HAProxy Primary container. The file's lifespan is tied to the container's; it is removed when the container is terminated.
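For reference, the effect is roughly what the following sketch produces: a standard AWS credentials file rendered from the exported variables. This is not the script's actual code; the container path and the `docker exec` mechanism are assumptions (see `demo_haproxy.py` for the real implementation).

```
# Sketch: render a standard AWS credentials file inside the Primary HAProxy container
# from the variables exported above. The host shell expands ${...} before the command
# runs, so the values land in the container-local file, not in the image.
docker exec haproxy-primary bash -c "mkdir -p /root/.aws && cat > /root/.aws/credentials <<EOF
[default]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
aws_session_token = ${AWS_SESSION_TOKEN}
EOF"
```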
#### Step 3 - Start the demo

You should now be able to invoke the demo script, like so:

```
(.venv) chelma@3c22fba4e266 cluster_traffic_capture % ./demo_haproxy.py
Creating primary cluster...
Waiting up to 30 sec for cluster to be active...
Cluster primary-cluster is active
Creating shadow cluster...
Waiting up to 30 sec for cluster to be active...
Cluster shadow-cluster is active
Building HAProxy Docker images for Primary and Shadow HAProxy containers...
Executing command to build Docker images: ./build_docker_images.py --primary-image haproxy-primary --primary-nodes host.docker.internal:9200 host.docker.internal:9201 --shadow-haproxy host.docker.internal:81 --shadow-image haproxy-shadow --shadow-nodes host.docker.internal:9202 host.docker.internal:9203 --internal-port 9200
Subshell> Copying Docker-related files to: /tmp/cluster_traffic_capture
Subshell> Writing HAProxy Config to: /tmp/cluster_traffic_capture/haproxy_no_mirror.cfg
Subshell> Writing HAProxy Config to: /tmp/cluster_traffic_capture/haproxy_w_mirror.cfg
Subshell> Building HAProxy Docker image for Primary Cluster...
Subshell> Primary HAProxy image available locally w/ tag: haproxy-primary
Subshell> Building HAProxy Docker image for Shadow Cluster...
Subshell> Shadow HAProxy image available locally w/ tag: haproxy-shadow
Starting HAProxy container for Shadow Cluster...
Starting HAProxy container for Primary Cluster...

HAProxy is currently running in a Docker container, available at 127.0.0.1:80, and configured to pass traffic to an ES 7.10.2 cluster of two nodes (each running in their own Docker containers).

The requests/responses passed to the cluster via the HAProxy container will be logged to the container at /var/log/haproxy-traffic.log.

The requests are mirrored to an identical shadow cluster.

Some example commands you can run to demonstrate the behavior are:
curl -X GET 'localhost:80'
curl -X GET 'localhost:80/_cat/nodes?v=true&pretty'
curl -X PUT 'localhost:80/noldor/_doc/1' -H 'Content-Type: application/json' -d'{"name": "Finwe"}'
curl -X GET 'localhost:80/noldor/_doc/1'

When you are done playing with the setup, hit the RETURN key in this terminal window to shut down and clean up the demo containers.
```

The Docker setup will look something like:

```
chelma@3c22fba4e266 docker_config_traffic_gen % docker ps
CONTAINER ID   IMAGE                                                      COMMAND                  CREATED          STATUS          PORTS                              NAMES
ecd701f2b001   haproxy-primary                                            "/docker-entrypoint.…"   21 seconds ago   Up 20 seconds   0.0.0.0:80->9200/tcp               haproxy-primary
37cba786ff27   haproxy-shadow                                             "/docker-entrypoint.…"   22 seconds ago   Up 21 seconds   0.0.0.0:81->9200/tcp               haproxy-shadow
24764db5e518   docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2   "/tini -- /usr/local…"   36 seconds ago   Up 35 seconds   9300/tcp, 0.0.0.0:9203->9200/tcp   shadow-cluster-node-2
56d4b36aa26d   docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2   "/tini -- /usr/local…"   37 seconds ago   Up 36 seconds   9300/tcp, 0.0.0.0:9202->9200/tcp   shadow-cluster-node-1
c198216efc27   docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2   "/tini -- /usr/local…"   50 seconds ago   Up 50 seconds   9300/tcp, 0.0.0.0:9201->9200/tcp   primary-cluster-node-2
c688ef30cd52   docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2   "/tini -- /usr/local…"   51 seconds ago   Up 50 seconds   0.0.0.0:9200->9200/tcp, 9300/tcp   primary-cluster-node-1
```

#### Step 4 - Test the setup with traffic

Now, you can send traffic to the HAProxy Primary on localhost:80 and see it replicated to the Shadow Cluster:

```
chelma@3c22fba4e266 cluster_traffic_capture % curl -X PUT 'localhost:80/noldor/_doc/1' -H 'Content-Type: application/json' -d'{"name": "Finwe"}'
{"_index":"noldor","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

chelma@3c22fba4e266 cluster_traffic_capture % curl -X GET 'localhost:9203/noldor/_doc/1'
{"_index":"noldor","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"name": "Finwe"}}
```
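You can also spot-check the mirroring end-to-end by comparing document counts between the two clusters. A quick sketch: the `noldor` index comes from the example above, and port 9202 is the first Shadow node per the `docker ps` output.

```
# Count the document via the Primary HAProxy (port 80) and directly on a Shadow node
# (port 9202). Counts should converge once the Shadow Cluster refreshes (~1s by
# default in ES 7.10).
curl -s 'localhost:80/noldor/_count?pretty'
curl -s 'localhost:9202/noldor/_count?pretty'
```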
You can also drive traffic with the OpenSearch Benchmarking tool, using the supplied Dockerfile at `./docker_config_traffic_gen/Dockerfile`:

```
cd docker_config_traffic_gen
docker build --tag traffic-gen .
```

```
chelma@3c22fba4e266 docker_config_traffic_gen % docker run --name traffic-gen --add-host host.docker.internal:host-gateway traffic-gen:latest
Running opensearch-benchmark w/ 'geonames' workload...

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/   /_/|_|
    /_/

--------------------------------
[INFO] SUCCESS (took 27 seconds)
--------------------------------
Running opensearch-benchmark w/ 'http_logs' workload...

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/   /_/|_|
    /_/

--------------------------------
[INFO] SUCCESS (took 26 seconds)
--------------------------------
Running opensearch-benchmark w/ 'nested' workload...

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/   /_/|_|
    /_/

--------------------------------
[INFO] SUCCESS (took 14 seconds)
--------------------------------
Running opensearch-benchmark w/ 'nyc_taxis' workload...

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/   /_/|_|
    /_/

--------------------------------
[INFO] SUCCESS (took 13 seconds)
--------------------------------
```

The image is pre-configured to send traffic to localhost:80.

You can see the logged traffic on the Primary and Shadow HAProxy containers in their `/var/log` directories:

```
chelma@3c22fba4e266 ~ % docker exec -it haproxy-shadow bash
root@37cba786ff27:/# tail -n 5 /var/log/haproxy-traffic.log
Feb 10 19:04:15 localhost haproxy[23]: Request-URI: /_stats/_all?level=shards#012Request-Method: GET#012Request-Body: -#012Response-Body: {"_shards":{"total":49,"successful":49,"failed":0},"_all":{"primaries":{"docs":{"count":1,"deleted":0},"store":{"size_in_bytes":13683,"reserved_in_bytes":0},"indexing":{"index_total":1,"index_time_in_millis":13,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":0,"time_in_millis":0,"exists_total":0,"exists_time_in_millis":0,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":0,"query_time_in_millis":0,"query_current":0,"fetch_total":0,"fetch_time_in_millis":0,"fetch_current":0,"scroll_total":0,"scroll_time_in_millis":0,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_do
Feb 10 19:04:15 localhost haproxy[23]: Request-URI: /.ml-anomalies-*/_search#012Request-Method: POST#012Request-Body: {"size":0,"query":{"bool":{"must":[{"term":{"result_type":"bucket"}}]}},"aggs":{"jobs":{"terms":{"field":"job_id"},"aggs":{"min_pt":{"min":{"field":"processing_time_ms"}},"max_pt":{"max":{"field":"processing_time_ms"}},"mean_pt":{"avg":{"field":"processing_time_ms"}},"median_pt":{"percentiles":{"field":"processing_time_ms","percents":[50]}}}}}}#012Response-Body: {"took":0,"timed_out":false,"_shards":{"total":0,"successful":0,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":0.0,"hits":[]}}
Feb 10 19:04:20 localhost haproxy[23]: Request-URI: /_bulk#012Request-Method: POST#012Request-Body: -#012Response-Body: -
Feb 10 19:04:20 localhost haproxy[23]: message repeated 4 times: [ Request-URI: /_bulk#012Request-Method: POST#012Request-Body: -#012Response-Body: -]
Feb 10 19:04:22 localhost haproxy[23]: Request-URI: /nyc_taxis/_search#012Request-Method: GET#012Request-Body: -#012Response-Body: -
```
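The `#012` sequences in these entries are syslog-escaped newlines (octal 012). A small convenience sketch for reading them (assumes the demo containers are still running; `awk` is used to sidestep BSD/GNU `sed` differences on the host):

```
# Expand the syslog "#012" newline escapes so each captured field prints on its own line.
docker exec haproxy-shadow sh -c 'tail -n 5 /var/log/haproxy-traffic.log' \
  | awk '{ gsub(/#012/, "\n    "); print }'
```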
#### Step 5 - Clean up the demo setup

As the terminal output suggests, you can spin down the demo setup and clean up all created resources by hitting RETURN in the original terminal:

```
Stopping cluster primary-cluster...
Cleaning up underlying resources for cluster primary-cluster...
Stopping cluster shadow-cluster...
Cleaning up underlying resources for cluster shadow-cluster...
Cleaning up underlying resources for the Primary HAProxy container...
Cleaning up underlying resources for the Shadow HAProxy container...
(.venv) chelma@3c22fba4e266 cluster_traffic_capture %
```
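Note that this only removes local resources; any captured traffic the CloudWatch Agent already shipped remains in CloudWatch Logs. To find it, you can list the log groups in the region you exported earlier (the exact group name is set by the demo's CloudWatch Agent configuration, so this is just a starting point):

```
# List CloudWatch Logs log groups in the configured region; the demo's captured-traffic
# log group (name determined by the agent configuration) will be among them.
aws logs describe-log-groups --region "$AWS_REGION"
```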