# ho11y

Hello and welcome to `ho11y` (pronounced: howl-y), a synthetic signal generator that allows you to test observability solutions for microservices. It emits logs, metrics, and traces in a configurable manner.

Contents:

* [Overview](#overview)
* [Signals](#signals)
* [Configuration](#configuration)

----

## Overview

`ho11y` emits and exposes signals as follows:

* Logs in JSON format, using [Logrus](https://github.com/sirupsen/logrus).
* Metrics using the [Prometheus Go client library](https://github.com/prometheus/client_golang).
* Traces using OpenTelemetry, with the [ADOT Go SDK](https://aws-otel.github.io/docs/getting-started/go-sdk).

In a nutshell, this is what you get with `ho11y`: a main service you invoke at `/`, which in turn can invoke up to five optional downstreams:

```
             main service path at '/'
                      :8765
                        |
                        |
                +-------+-------+
                |               |
       :55680----+             +----:8765
                |               |
  OTLP interface|   h o 1 1 y   |OM exposition at '/metrics'
                |               |
                +-+--+--+--+--+-+
                  |  |  |  |  |
                  |  |  |  |  |
                  v  v  v  v  v
                  D0 D1 D2 D3 D4     downstreams
```

To use `ho11y`, build the binary with `go build .` and then launch it as follows:

```
$ ./ho11y
{"level":"info","msg":"Using OTLP endpoint: 0.0.0.0:55680","time":"2021-04-26T18:07:31+01:00"}
{"level":"info","msg":"Init instrumentation done","time":"2021-04-26T18:07:31+01:00"}
{"level":"info","msg":"Launching ho11y, listening on :8765","time":"2021-04-26T18:07:31+01:00"}
{"event":"invoke","level":"info","msg":"ho11y was invoked by [::1]:62421","time":"2021-04-26T18:07:43+01:00"}
```

With `ho11y` running, you can now invoke it as follows:

```
$ curl http://localhost:8765/
{"traceId":"1-6086f35f-fdb81d0f3235709c09a86978"}
```

Every time you hit the root path (`/`), you trigger the signals described in greater detail below. Note that logs and traces are pushed, whereas metrics are pulled (or: scraped).

You can also run `ho11y` as a container:

```
docker build . -t ho11y:stable
docker run --name ho11y --rm -p 8765:8765 ho11y:stable
```

## Signals

Currently, we support the three major signal types: logs, metrics, and traces.

### Logs

Whenever `ho11y` is invoked, a log message of the following form is written to `stdout`:

```json
{
  "event": "invoke",
  "level": "info",
  "msg": "ho11y was invoked by [::1]:56993",
  "time": "2021-04-26T12:12:01+01:00"
}
```

### Metrics

We expose a number of metrics in `ho11y` via the `/metrics` endpoint, making it convenient to scrape them with Prometheus. There are two types of metrics:

1. Request-based: `ho11y_total`, `ho11y_downstream_payload_bytes`, and `ho11y_downstream_duration_seconds`, which you can directly influence by invoking the service.
1. Random: `ho11y_randval` and `ho11y_randhist`, which are filled automatically.

Let's have a closer look at all of them now and show how you can consume them with [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/). If you're not familiar with Prometheus, consider checking out a quick [introduction to it](https://github.com/yolossn/Prometheus-Basics).

#### Request-based metrics

To see the per-second rate of invocations, based on the counter metric `ho11y_total`, use the following PromQL statement:

```
rate(ho11y_total[1m])
```

In addition, we have instrumented `ho11y` with two further metrics that provide information about downstream invocations: `ho11y_downstream_payload_bytes` is a summary that captures the HTTP body size received from a downstream, and `ho11y_downstream_duration_seconds` is a histogram that represents the duration taken for invoking a downstream.
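If you are curious how metrics like these are typically defined with the [Prometheus Go client library](https://github.com/prometheus/client_golang), here is a minimal, self-contained sketch. The metric names mirror the ones above, but the options (for example, the default histogram buckets) and the overall structure are illustrative assumptions, not `ho11y`'s actual source:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative metric definitions; names mirror the ones ho11y exposes,
// but the exact options here are assumptions, not ho11y's actual code.
var (
	invocations = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "ho11y_total",
		Help: "Total number of invocations of the root path.",
	})
	downstreamPayload = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "ho11y_downstream_payload_bytes",
		Help: "HTTP body size received from a downstream, in bytes.",
	})
	downstreamDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "ho11y_downstream_duration_seconds",
		Help:    "Duration of a downstream invocation, in seconds.",
		Buckets: prometheus.DefBuckets,
	})
)

func main() {
	prometheus.MustRegister(invocations, downstreamPayload, downstreamDuration)
	// Expose the registered metrics for scraping, as ho11y does on /metrics.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8765", nil)
}
```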
You can use the following PromQL statements for these downstream metrics:

```
# average of the downstream payloads over the past 10 minutes:
sum(rate(ho11y_downstream_payload_bytes_sum[10m])) / sum(rate(ho11y_downstream_payload_bytes_count[10m]))

# histogram of downstream invoke durations:
ho11y_downstream_duration_seconds_bucket
```

#### Random metrics

To see a random value, that is, the gauge metric `ho11y_randval`, use the following PromQL statement:

```
ho11y_randval
```

To see a random histogram, that is, the histogram metric `ho11y_randhist`, use the following PromQL statements:

```
rate(ho11y_randhist_sum[1m]) / rate(ho11y_randhist_count[1m])
```

As well as:

```
histogram_quantile(0.8, sum(rate(ho11y_randhist_bucket[1m])) by (le))
```

### Traces

In `ho11y` we're using the [ADOT collector](https://aws-otel.github.io/docs/getting-started/collector) with the X-Ray exporter defaults. In order to use tracing in a local setup, you need to run the ADOT collector somewhere, for example, using Docker:

```
docker run -d -p 55680:55680 \
    --rm --name adot-collector \
    -e AWS_REGION="eu-west-1" \
    -e AWS_ACCESS_KEY_ID="XXXXXXXXXXXXXXXX" \
    -e AWS_SECRET_ACCESS_KEY="XXXXXXXXXXXXXXXX" \
    public.ecr.aws/aws-observability/aws-otel-collector:latest
```

Note that you need to replace the credentials `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` with your own.

On invocation, you will find traces akin to the following emitted:

```json
{
    "Id": "1-6086f9bc-7552c97e8b3ef57f25f94346",
    "Duration": 0.001,
    "LimitExceeded": false,
    "Segments": [
        {
            "Id": "9fa75c93b73f980b",
            "Document": {
                "id": "9fa75c93b73f980b",
                "name": "ho11y-svc",
                "start_time": 1619458492.3810492,
                "trace_id": "1-6086f9bc-7552c97e8b3ef57f25f94346",
                "end_time": 1619458492.382217,
                "fault": false,
                "error": false,
                "http": {
                    "request": {
                        "url": "http://192.168.178.23:8765/metrics",
                        "method": "GET",
                        "user_agent": "Prometheus/2.21.0",
                        "client_ip": "192.168.178.23"
                    },
                    "response": {
                        "status": 200,
                        "content_length": 0
                    }
                },
                "aws": {
                    "xray": {
                        "auto_instrumentation": false,
                        "sdk_version": "0.18.0",
                        "sdk": "opentelemetry for go"
                    }
                },
                "metadata": {
                    "default": {
                        "otel.resource.telemetry.sdk.name": "opentelemetry",
                        "net.transport": "IP.TCP",
                        "http.flavor": "1.1",
                        "http.route": "/metrics",
                        "net.host.port": "",
                        "otel.resource.host.name": "xxx.amazon.com",
                        "otel.resource.service.name": "ho11y-svc",
                        "otel.resource.telemetry.sdk.language": "go",
                        "net.host.ip": "192.168.178.23",
                        "otel.resource.telemetry.sdk.version": "0.18.0"
                    }
                }
            }
        }
    ]
}
```

OK, enough of the signals that `ho11y` emits, let's move on to how you can adapt it to your needs.

## Configuration

In the following, we review the `ho11y` configuration options, all of which have to be provided via environment variables. This is mainly to make the configuration straightforward in container orchestrators such as Kubernetes, using YAML or the like.

### Failure injection

To simulate dropped requests, you can enable failure injection. Set `HO11Y_INJECT_FAILURE` to any value (it does not matter which value you provide; it's a boolean flag for now, and anything counts as long as it's not the empty string) and `ho11y` will drop, on average, half of the requests. This means half of the time it returns a `200` HTTP status code, and the other half it returns one of the 3xx, 4xx, or 5xx HTTP status codes.
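For illustration, dropping roughly half of the requests could be implemented along the following lines. This is a sketch assuming a plain `net/http` handler; the helper names and the specific error codes returned are illustrative, not necessarily what `ho11y` actually picks:

```go
package main

import (
	"math/rand"
	"net/http"
	"os"
)

// injectFailure is true if HO11Y_INJECT_FAILURE is set to any non-empty value.
var injectFailure = os.Getenv("HO11Y_INJECT_FAILURE") != ""

// errorCodes is an illustrative pick of 3xx/4xx/5xx codes returned on a "drop".
var errorCodes = []int{
	http.StatusTemporaryRedirect,
	http.StatusBadRequest,
	http.StatusInternalServerError,
}

func rootHandler(w http.ResponseWriter, r *http.Request) {
	// With failure injection enabled, drop roughly half of the requests.
	if injectFailure && rand.Intn(2) == 0 {
		w.WriteHeader(errorCodes[rand.Intn(len(errorCodes))])
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/", rootHandler)
	http.ListenAndServe(":8765", nil)
}
```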
For example, to enable failure injection at launch:

```
$ export HO11Y_INJECT_FAILURE=enabled
$ ./ho11y
{"level":"info","msg":"Failure injection enabled, dropping half of the requests","time":"2021-04-28T11:31:38+01:00"}
{"level":"info","msg":"Using OTLP endpoint: 0.0.0.0:55680","time":"2021-04-28T11:31:38+01:00"}
{"level":"info","msg":"Init instrumentation done","time":"2021-04-28T11:31:38+01:00"}
{"level":"info","msg":"Launching ho11y, listening on :8765","time":"2021-04-28T11:31:38+01:00"}
...
```

### Downstreams

To simulate the invocation of other services, use the `DOWNSTREAMn` environment variables, with n between `0` and `4`. The value must be a reachable HTTP URL. For example, to make `ho11y` call two other `ho11y` instances you would use:

```
$ export DOWNSTREAM0=http://localhost:9990
$ export DOWNSTREAM1=http://localhost:9991
$ ./ho11y
{"level":"info","msg":"Using downstream 0: http://localhost:9990","time":"2021-04-26T18:07:31+01:00"}
{"level":"info","msg":"Using downstream 1: http://localhost:9991","time":"2021-04-26T18:07:31+01:00"}
{"level":"info","msg":"Using OTLP endpoint: 0.0.0.0:55680","time":"2021-04-26T18:07:31+01:00"}
{"level":"info","msg":"Init instrumentation done","time":"2021-04-26T18:07:31+01:00"}
{"level":"info","msg":"Launching ho11y, listening on :8765","time":"2021-04-26T18:07:31+01:00"}
{"event":"invoke","level":"info","msg":"ho11y was invoked by [::1]:62421","time":"2021-04-26T18:07:43+01:00"}
```

Further, in order to simulate a downstream with a deterministic response time and size (rather than actually calling it), you can use the `DUMMY` type. The format for dummy downstreams is `DUMMY:$BODY_SIZE:$INVOKE_DURATION`. For example, if you wanted to simulate a downstream that returns a 187 kB payload in 42 ms, you would use:

```
DOWNSTREAM2=DUMMY:187kB:42ms
```

Note that the dummy in fact takes as long to return as you specify: not only are the metric values recorded, the execution is also delayed by the duration you specify.

### Service port

To define the port `ho11y` should listen on, use the `HO11Y_PORT` environment variable. It defaults to `8765`.

### OpenTelemetry settings

* `OTEL_EXPORTER_OTLP_ENDPOINT` ... IP and port of the ADOT collector, defaults to `0.0.0.0:55680`
* `OTEL_RESOURCE_ATTRIB` ... service name, defaults to `ho11y-svc`

Note that the above configures the ADOT collector connection and defines what you will see in the X-Ray service map.

### Throttling

Using the `HO11Y_CUTOFF_TPS` environment variable, you can set a cutoff point beyond which `ho11y` returns a 429 HTTP status code. If this environment variable is not set, no throttling takes place. For example:

```
$ export HO11Y_CUTOFF_TPS=1
$ ./ho11y
{"level":"info","msg":"Throttling enabled with a 1 TPS cutoff point","time":"2021-06-10T09:24:44+01:00"}
{"level":"info","msg":"Using Otel collector at 0.0.0.0:55680","time":"2021-06-10T09:24:44+01:00"}
{"level":"info","msg":"Init instrumentation done","time":"2021-06-10T09:24:44+01:00"}
{"level":"info","msg":"Launching ho11y: I am [ho11y-svc] listening on port :8765 on all local IPs.","time":"2021-06-10T09:24:44+01:00"}
...
{"level":"info","msg":"Throttle engaged, got 2 TPS with a 1 TPS cutoff point","time":"2021-06-10T09:25:37+01:00"}
```

Now, if you `curl` more often than once per second, you should see the following (also note the corresponding log line above):

```
$ curl -s -o /dev/null -w "%{http_code}" http://localhost:8765
429
```

Many front ends, such as AWS X-Ray, treat the 429 HTTP status code [in a special way][x-ray-throttle], for example by highlighting it in a different color.
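If you are wondering how such a TPS cutoff could work under the hood, the following is a minimal sketch of one possible implementation (a per-second request counter guarded by a mutex). The variable names and reset logic are assumptions for illustration, not `ho11y`'s actual code:

```go
package main

import (
	"net/http"
	"os"
	"strconv"
	"sync"
	"time"
)

// cutoffTPS is read from HO11Y_CUTOFF_TPS; zero means throttling is disabled.
var cutoffTPS, _ = strconv.Atoi(os.Getenv("HO11Y_CUTOFF_TPS"))

var (
	mu      sync.Mutex
	window  time.Time // start of the current one-second window
	counter int       // requests seen in the current window
)

// throttled reports whether the current request exceeds the per-second cutoff.
func throttled() bool {
	if cutoffTPS <= 0 {
		return false
	}
	mu.Lock()
	defer mu.Unlock()
	now := time.Now().Truncate(time.Second)
	if now.After(window) {
		// A new second has started: reset the counter.
		window, counter = now, 0
	}
	counter++
	return counter > cutoffTPS
}

func rootHandler(w http.ResponseWriter, r *http.Request) {
	if throttled() {
		w.WriteHeader(http.StatusTooManyRequests) // 429
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/", rootHandler)
	http.ListenAndServe(":8765", nil)
}
```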
[x-ray-throttle]: https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-errors