# SDK Metrics System

## Concepts

### Metric

* A measure of some aspect of the SDK. Examples include request latency, number of pooled connections and retries executed.
* A metric is associated with a category. Some of the metric categories are `Default`, `HttpClient` and `Streaming`. This lets customers enable metrics only for the categories they are interested in.

Refer to the [Metrics List](./MetricsList.md) document for a complete list of standard metrics collected by the SDK.

### Metric Collector

* `MetricCollector` is a typesafe aggregator of metrics. This is the primary interface through which other SDK components report metrics they emit, using the `reportMetric(SdkMetric,Object)` method.
* `MetricCollector` objects allow for nesting. This enables metrics to be collected in the context of other metric events. For example, for a single API call, there may be multiple request attempts if there are retries. Each attempt's associated metric events can be stored in their own `MetricCollector`, all of which are children of another collector that represents metrics for the entire API call. A child of a collector is created by calling its `childCollector(String)` method.
* The `collect()` method returns a `MetricCollection`. This class is essentially an immutable version of the tree formed by the collector and its children, which are also represented by `MetricCollection` objects. Note that calling `collect()` implies that child collectors are also collected.
* Each collector has a name. Often this will be used to describe the class of metrics that it collects; e.g. `"ApiCall"` and `"ApiCallAttempt"`.
* [Interface prototype](prototype/MetricCollector.java)

### MetricPublisher

* A `MetricPublisher` publishes collected metrics to one or more systems outside of the SDK. It takes a `MetricCollection` object, potentially transforms the data into richer metrics, and converts it into the format the receiver expects.
* By default, the SDK will provide implementations to publish metrics to [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) and [Client Side Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html) (also known as AWS SDK Metrics for Enterprise Support).
* Metric publishers are pluggable within the SDK, allowing customers to provide their own custom implementations.
* Metric publishers can differ in the list of metrics they publish, their publishing frequency, the configuration they need in order to publish, etc.
* [Interface prototype](prototype/MetricPublisher.java)

## Enabling Metrics

The metrics feature is disabled by default. Metrics can be enabled and configured in the following ways:

### Option 1: Configuring MetricPublishers on a request

A publisher can be configured directly on the `RequestOverrideConfiguration`:

```java
MetricPublisher metricPublisher = CloudWatchMetricPublisher.create();
DynamoDbClient dynamoDb = DynamoDbClient.create();
// The publisher applies only to this one request.
dynamoDb.listTables(ListTablesRequest.builder()
                                     .overrideConfiguration(c -> c.addMetricPublisher(metricPublisher))
                                     .build());
```

The methods exposed for setting metric publishers follow the pattern established by `ExecutionInterceptor`s:

```java
class RequestOverrideConfiguration {
    // ...
    class Builder {
        // ...
        Builder metricPublishers(List<MetricPublisher> metricsPublishers);
        Builder addMetricPublisher(MetricPublisher metricsPublisher);
    }
}
```

### Option 2: Configuring MetricPublishers on a client

A publisher can be configured directly on the `ClientOverrideConfiguration`. A publisher specified in this way is used with lower priority than **Option 1** above.
```java
MetricPublisher metricPublisher = CloudWatchMetricPublisher.create();
// The publisher applies to every request made by this client.
DynamoDbClient dynamoDb = DynamoDbClient.builder()
                                        .overrideConfiguration(c -> c.addMetricPublisher(metricPublisher))
                                        .build();
```

The methods exposed for setting metric publishers follow the pattern established by `ExecutionInterceptor`s:

```java
class ClientOverrideConfiguration {
    // ...
    class Builder {
        // ...
        Builder metricPublishers(List<MetricPublisher> metricsPublishers);
        Builder addMetricPublisher(MetricPublisher metricsPublisher);
    }
}
```

**Note:** As with the `httpClient` setting, calling `close()` on the `DynamoDbClient` *will not* close the configured `metricPublishers`. You must close the `metricPublishers` yourself when you're done using them.

### Option 3: Configuring MetricPublishers using System Properties or Environment Variables

This option allows the customer to enable metric publishing by default, without needing to enable it via **Option 1** or **Option 2** above. This means that a customer can enable metrics without needing to make a change to their runtime code.

This option is enabled using an environment variable or system property. If both are specified, the system property will be used. If metrics are enabled at the client level using **Option 2** above, this option is ignored. Overriding the metric publisher at request time using **Option 1** overrides any publishers that have been enabled globally.

**System Property:** `aws.metricPublishingEnabled=true`

**Environment Variable:** `AWS_METRIC_PUBLISHING_ENABLED=true`

The value specified must be either `"true"` or `"false"`. Specifying any other string value results in `"false"` being used, and in a warning being logged each time an SDK client is created. When the value is `"false"`, no metrics will be published by a client. When the value is `"true"`, metrics will be published by every client to a set of "global metric publishers".
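The precedence and validation rules above can be sketched as follows. This is a minimal illustration and not the SDK's implementation: the class `MetricPublishingSetting` and its methods are hypothetical names, the warning goes to `System.err` instead of the SDK's logger, and exact (case-sensitive) matching of `"true"` and `"false"` is an assumption. Only the property and variable names come from the text above.

```java
/** Hypothetical helper, not part of the SDK. */
final class MetricPublishingSetting {
    static final String SYSTEM_PROPERTY = "aws.metricPublishingEnabled";
    static final String ENVIRONMENT_VARIABLE = "AWS_METRIC_PUBLISHING_ENABLED";

    private MetricPublishingSetting() {
    }

    /** Reads the live JVM system property and process environment. */
    static boolean isEnabled() {
        return resolve(System.getProperty(SYSTEM_PROPERTY), System.getenv(ENVIRONMENT_VARIABLE));
    }

    /**
     * Pure resolution logic, kept separate so it can be exercised without touching global state.
     * A null argument means "not set".
     */
    static boolean resolve(String systemProperty, String environmentVariable) {
        // The system property takes precedence when both are specified.
        String raw = systemProperty != null ? systemProperty : environmentVariable;
        if (raw == null) {
            return false; // the metrics feature is disabled by default
        }
        if (raw.equals("true")) {
            return true;
        }
        if (!raw.equals("false")) {
            // Assumption: a real implementation would use the SDK's logger here.
            System.err.println("WARN: invalid metric publishing value '" + raw + "', treating it as false");
        }
        return false;
    }
}
```

Separating `resolve` from `isEnabled` keeps the rule itself free of global state, which makes the "system property wins" and "invalid means false" cases easy to check in isolation.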
The set of global metric publishers is loaded automatically using the same mechanism currently used to discover HTTP clients. This means that including the `cloudwatch-metric-publisher` module and enabling the system property or environment variable above is sufficient to enable metric publishing to CloudWatch on all AWS clients.

The set of "Global Metric Publishers" is static and is used for *all* AWS SDK clients instantiated by the application (while **Option 3** remains enabled). A JVM shutdown hook will be registered to invoke `MetricPublisher.close()` on every publisher (in case the publishers use non-daemon threads that would otherwise block JVM shutdown).

#### Updating a MetricPublisher to work as a global metric publisher

**Option 3** above references the concept of "Global Metric Publishers", which are a set of publishers that are discovered automatically by the SDK. This section outlines how global metric publishers are discovered and created.

Each `MetricPublisher` that supports loading when **Option 3** is enabled must:

1. Provide an `SdkMetricPublisherService` implementation. An `SdkMetricPublisherService` implementation is a class with a zero-arg constructor, used to instantiate a specific type of `MetricPublisher` (e.g. a `CloudWatchSdkMetricPublisherService` that is a factory for `CloudWatchMetricPublisher`s).
2. Provide a resource file: `META-INF/services/software.amazon.awssdk.metrics.SdkMetricPublisherService`. This file contains the list of fully-qualified `SdkMetricPublisherService` implementation class names.
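To see why both pieces are required, here is a rough sketch of how a loader could consume them using plain `java.util.ServiceLoader`. It is an illustration under stated assumptions, not the SDK's code: `SketchPublisher`, `SketchPublisherService` and `GlobalPublisherLoader` are hypothetical stand-ins, and classloader selection is left at the `ServiceLoader` default.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical stand-in for MetricPublisher; only close() matters for this sketch. */
interface SketchPublisher extends AutoCloseable {
    @Override
    void close();
}

/** Hypothetical stand-in for SdkMetricPublisherService. */
interface SketchPublisherService {
    SketchPublisher createMetricPublisher();
}

final class GlobalPublisherLoader {
    private GlobalPublisherLoader() {
    }

    /** Discovers the services named in the META-INF/services resource file for the interface. */
    static List<SketchPublisher> loadFromClasspath() {
        // ServiceLoader instantiates each listed class through its zero-arg constructor.
        List<SketchPublisher> publishers = createAll(ServiceLoader.load(SketchPublisherService.class));
        // Close every publisher at JVM exit so publisher threads cannot block shutdown.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> publishers.forEach(SketchPublisher::close)));
        return publishers;
    }

    /** Asks each service for one publisher; split from discovery so it can be called directly. */
    static List<SketchPublisher> createAll(Iterable<SketchPublisherService> services) {
        List<SketchPublisher> publishers = new ArrayList<>();
        for (SketchPublisherService service : services) {
            publishers.add(service.createMetricPublisher());
        }
        return Collections.unmodifiableList(publishers);
    }
}
```

With no matching resource file on the classpath, `loadFromClasspath()` simply returns an empty list, which is why the resource file is the piece that actually opts a publisher in.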
The `software.amazon.awssdk.metrics.SdkMetricPublisherService` interface that must be implemented by all global metric publisher candidates is defined as:

```java
public interface SdkMetricPublisherService {
    MetricPublisher createMetricPublisher();
}
```

**`SdkMetricPublisherService` Example**

Enabling the `CloudWatchMetricPublisher` as a global metric publisher can be done by implementing the `SdkMetricPublisherService` interface:

```java
package software.amazon.awssdk.metrics.publishers.cloudwatch;

import software.amazon.awssdk.metrics.MetricPublisher;
import software.amazon.awssdk.metrics.SdkMetricPublisherService;

public final class CloudWatchSdkMetricPublisherService implements SdkMetricPublisherService {
    @Override
    public MetricPublisher createMetricPublisher() {
        return CloudWatchMetricPublisher.create();
    }
}
```

And creating a `META-INF/services/software.amazon.awssdk.metrics.SdkMetricPublisherService` resource file in the `cloudwatch-metric-publisher` module with the following contents:

```
software.amazon.awssdk.metrics.publishers.cloudwatch.CloudWatchSdkMetricPublisherService
```

#### Option 3 Implementation Details and Edge Cases

**How the SDK loads `MetricPublisher`s when Option 3 is enabled**

When a client is created with **Option 3** enabled (and **Option 2** "not specified"), the client retrieves the list of global metric publishers to use via a static "global metric publisher list" singleton. This singleton is initialized exactly once using the following process:

1. The singleton uses `java.util.ServiceLoader` to locate all `SdkMetricPublisherService` implementations configured as described above. The classloader used with the service loader is chosen in the same manner as the one chosen for the HTTP client service loader (`software.amazon.awssdk.core.internal.http.loader.SdkServiceLoader`). That is, the first classloader present in the following list: (1) the classloader that loaded the SDK, (2) the current thread's classloader, then (3) the system classloader.
2. The singleton creates an instance of every `SdkMetricPublisherService` located in this manner.
3. The singleton uses each `SdkMetricPublisherService` instance to create its `MetricPublisher`.

**How Option 3 and Option 1 behave when Option 2 is "not specified"**

The SDK treats **Option 3** as the default set of client-level metric publishers to be used when **Option 2** is "not specified". This means that if a customer: (1) enables global metric publishing using **Option 3**, (2) does not specify client-level publishers using **Option 2**, and (3) specifies metric publishers at the request level with **Option 1**, then the global metric publishers are still *instantiated* but will not be used. This nuance prevents the SDK from needing to consult the global metric configuration with every request.

**How Option 2 is considered "not specified" for the purposes of considering Option 3**

Global metric publishers (**Option 3**) are only considered for use when **Option 2** is "not specified".

"Not specified" is defined to be when the customer either: (1) does not invoke `ClientOverrideConfiguration.Builder.addMetricPublisher()` / `ClientOverrideConfiguration.Builder.metricPublishers()`, or (2) invokes `ClientOverrideConfiguration.Builder.metricPublishers(null)` as the last `metricPublisher`-mutating action on the client override configuration builder.

This definition purposefully excludes `ClientOverrideConfiguration.Builder.metricPublishers(emptyList())`. Setting the `metricPublishers` to an empty list is equivalent to setting the `metricPublishers` to the `NoOpMetricPublisher`.

**Implementing an SdkMetricPublisherService that depends on AWS clients**

Any `MetricPublisher` that supports creation via an `SdkMetricPublisherService` and depends on an AWS service client **must** disable metric publishing on those AWS service clients using **Option 2** when it is created via the `SdkMetricPublisherService`.
This is to prevent a scenario where the global metric publisher singleton's initialization process depends on the global metric publishers singleton already being initialized.

## Modules

New modules are created to support the metrics feature.

### metrics-spi

* Contains the metrics interfaces and default implementations that don't require other dependencies
* This is a sub-module under `core`
* `sdk-core` has a dependency on `metrics-spi`, so customers will automatically get a dependency on this module.

### metrics-publishers

* This is a new module that contains implementations of all SDK-supported publishers
* Under this module, a new sub-module is created for each publisher (`cloudwatch-publisher`, `csm-publisher`)
* Customers have to **explicitly add a dependency** on these modules to use the SDK-provided publishers

## Performance

One of the main tenets for metrics is "Enabling default metrics should have minimal impact on the application performance". The following design choices are made to ensure that enabling metrics does not affect performance significantly.

* When collecting metrics, a no-op metric collector is used if metrics are disabled. All methods in this collector are no-ops and return immediately.
* Metric publisher implementations can involve network calls and impact latency if done in a blocking way. Therefore, all SDK publisher implementations will process the metrics asynchronously so as not to block the request thread.
* Performance tests will be written and run with each release to ensure that the SDK performs well even when metrics are enabled and being collected and published.
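The first two design choices above can be illustrated with a small sketch. It is not the SDK's implementation: `SketchCollector`, `NoOpSketchCollector` and `AsyncSketchPublisher` are hypothetical, simplified stand-ins, metrics are modelled as a plain `Map`, and using a single daemon worker thread is an assumption made only for this example.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for the MetricCollector interface. */
interface SketchCollector {
    void reportMetric(String name, Object value);

    SketchCollector childCollector(String name);
}

/** Used when metrics are disabled: every method returns immediately and allocates nothing. */
final class NoOpSketchCollector implements SketchCollector {
    static final NoOpSketchCollector INSTANCE = new NoOpSketchCollector();

    private NoOpSketchCollector() {
    }

    @Override
    public void reportMetric(String name, Object value) {
        // intentionally empty
    }

    @Override
    public SketchCollector childCollector(String name) {
        return this; // a child of the no-op collector is the same shared instance
    }
}

/** Hypothetical publisher that hands the expensive work to a background thread. */
final class AsyncSketchPublisher implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "sketch-metric-publisher");
        thread.setDaemon(true); // assumption: a daemon thread, so it cannot block JVM shutdown
        return thread;
    });
    private final Consumer<Map<String, Object>> sink;

    AsyncSketchPublisher(Consumer<Map<String, Object>> sink) {
        this.sink = sink;
    }

    /** Returns as soon as the task is queued; the sink runs on the background thread. */
    void publish(Map<String, Object> metrics) {
        executor.execute(() -> sink.accept(metrics));
    }

    /** Stops accepting work and waits briefly for queued metrics to be delivered. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

Here the request thread only pays for enqueueing a task, while any slow call inside `sink` (for example a network request) happens on the worker thread.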