# Infrastructure monitoring This module provides EKS cluster monitoring with the following resources: - AWS Distro For OpenTelemetry Operator and Collector for Metrics and Traces - Logs with [AWS for FluentBit](https://github.com/aws/aws-for-fluent-bit) - Installs Grafana Operator to add AWS data sources and create Grafana Dashboards to Amazon Managed Grafana. - Installs FluxCD to perform GitOps sync of a Git Repo to EKS Cluster. We will use this later for creating Grafana Dashboards and AWS datasources to Amazon Managed Grafana. - Installs External Secrets Operator to retrieve and Sync the Grafana API keys from AWS SSM Parameter Store. - Amazon Managed Grafana Dashboard and data source - Alerts and recording rules with AWS Managed Service for Prometheus This module makes use of the open source [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) See examples using this Terraform modules in the **Amazon EKS** section of [this documentation](https://aws-observability.github.io/terraform-aws-observability-accelerator/) ## Requirements | Name | Version | |------|---------| | [terraform](#requirement\_terraform) | >= 1.1.0 | | [aws](#requirement\_aws) | >= 5.0.0 | | [helm](#requirement\_helm) | >= 2.4.1 | | [kubectl](#requirement\_kubectl) | >= 1.14 | | [kubernetes](#requirement\_kubernetes) | >= 2.10 | ## Providers | Name | Version | |------|---------| | [aws](#provider\_aws) | >= 5.0.0 | | [helm](#provider\_helm) | >= 2.4.1 | | [kubectl](#provider\_kubectl) | >= 1.14 | ## Modules | Name | Source | Version | |------|--------|---------| | [external\_secrets](#module\_external\_secrets) | ./add-ons/external-secrets | n/a | | [fluentbit\_logs](#module\_fluentbit\_logs) | ./add-ons/aws-for-fluentbit | n/a | | [helm\_addon](#module\_helm\_addon) | github.com/aws-ia/terraform-aws-eks-blueprints//modules/kubernetes-addons/helm-addon | v4.32.0 | | [istio\_monitoring](#module\_istio\_monitoring) | ./patterns/istio | n/a | | [java\_monitoring](#module\_java\_monitoring) | ./patterns/java | n/a | | [nginx\_monitoring](#module\_nginx\_monitoring) | ./patterns/nginx | n/a | | [operator](#module\_operator) | ./add-ons/adot-operator | n/a | ## Resources | Name | Type | |------|------| | [aws_prometheus_rule_group_namespace.alerting_rules](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_rule_group_namespace) | resource | | [aws_prometheus_rule_group_namespace.recording_rules](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_rule_group_namespace) | resource | | [helm_release.fluxcd](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource | | [helm_release.grafana_operator](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource | | [helm_release.kube_state_metrics](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource | | [helm_release.prometheus_node_exporter](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource | | [kubectl_manifest.flux_gitrepository](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource | | [kubectl_manifest.flux_kustomization](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource | | [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source | | [aws_eks_cluster.eks_cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source | | [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source | | [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source | ## Inputs | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| | [adot\_loglevel](#input\_adot\_loglevel) | Verbosity level for ADOT collector logs | `string` | `"warn"` | no | | [custom\_metrics\_config](#input\_custom\_metrics\_config) | Configuration object to enable custom metrics collection |
object({|
ports = list(number)
# paths = optional(list(string), ["/metrics"])
# list of samples to be dropped by label prefix, ex: go_ -> discards go_.*
dropped_series_prefixes = list(string)
})
{| no | | [eks\_cluster\_id](#input\_eks\_cluster\_id) | EKS Cluster Id | `string` | n/a | yes | | [enable\_alerting\_rules](#input\_enable\_alerting\_rules) | Enables or disables Managed Prometheus alerting rules | `bool` | `true` | no | | [enable\_amazon\_eks\_adot](#input\_enable\_amazon\_eks\_adot) | Enables the ADOT Operator on the EKS Cluster | `bool` | `true` | no | | [enable\_cert\_manager](#input\_enable\_cert\_manager) | Allow reusing an existing installation of cert-manager | `bool` | `true` | no | | [enable\_custom\_metrics](#input\_enable\_custom\_metrics) | Allows additional metrics collection for config elements in the `custom_metrics_config` config object. Automatic dashboards are not included | `bool` | `false` | no | | [enable\_dashboards](#input\_enable\_dashboards) | Enables or disables curated dashboards | `bool` | `true` | no | | [enable\_external\_secrets](#input\_enable\_external\_secrets) | Installs External Secrets to EKS Cluster | `bool` | `true` | no | | [enable\_fluxcd](#input\_enable\_fluxcd) | Enables or disables FluxCD. Disabling this might affect some data in the dashboards | `bool` | `true` | no | | [enable\_grafana\_operator](#input\_enable\_grafana\_operator) | Deploys Grafana Operator to EKS Cluster | `bool` | `true` | no | | [enable\_istio](#input\_enable\_istio) | Enable ISTIO workloads monitoring, alerting and default dashboards | `bool` | `false` | no | | [enable\_java](#input\_enable\_java) | Enable Java workloads monitoring, alerting and default dashboards | `bool` | `false` | no | | [enable\_kube\_state\_metrics](#input\_enable\_kube\_state\_metrics) | Enables or disables Kube State metrics exporter. Disabling this might affect some data in the dashboards | `bool` | `true` | no | | [enable\_logs](#input\_enable\_logs) | Using AWS For FluentBit to collect cluster and application logs to Amazon CloudWatch | `bool` | `true` | no | | [enable\_nginx](#input\_enable\_nginx) | Enable NGINX workloads monitoring, alerting and default dashboards | `bool` | `false` | no | | [enable\_node\_exporter](#input\_enable\_node\_exporter) | Enables or disables Node exporter. Disabling this might affect some data in the dashboards | `bool` | `true` | no | | [enable\_recording\_rules](#input\_enable\_recording\_rules) | Enables or disables Managed Prometheus recording rules | `bool` | `true` | no | | [enable\_tracing](#input\_enable\_tracing) | (Experimental) Enables tracing with AWS X-Ray. This changes the deploy mode of the collector to daemon set. Requirement: adot add-on <= 0.58-build.0 | `bool` | `false` | no | | [flux\_config](#input\_flux\_config) | FluxCD configuration |
"dropped_series_prefixes": [
"unspecified"
],
"ports": []
}
object({|
create_namespace = bool
k8s_namespace = string
helm_chart_name = string
helm_chart_version = string
helm_release_name = string
helm_repo_url = string
helm_settings = map(string)
helm_values = map(any)
})
{| no | | [flux\_gitrepository\_branch](#input\_flux\_gitrepository\_branch) | Flux GitRepository Branch | `string` | `"main"` | no | | [flux\_gitrepository\_name](#input\_flux\_gitrepository\_name) | Flux GitRepository name | `string` | `"aws-observability-accelerator"` | no | | [flux\_gitrepository\_url](#input\_flux\_gitrepository\_url) | Flux GitRepository URL | `string` | `"https://github.com/aws-observability/aws-observability-accelerator"` | no | | [flux\_kustomization\_name](#input\_flux\_kustomization\_name) | Flux Kustomization name | `string` | `"grafana-dashboards-infrastructure"` | no | | [flux\_kustomization\_path](#input\_flux\_kustomization\_path) | Flux Kustomization Path | `string` | `"./artifacts/grafana-operator-manifests/eks/infrastructure"` | no | | [go\_config](#input\_go\_config) | Grafana Operator configuration |
"create_namespace": true,
"helm_chart_name": "flux2",
"helm_chart_version": "2.7.0",
"helm_release_name": "observability-fluxcd-addon",
"helm_repo_url": "https://fluxcd-community.github.io/helm-charts",
"helm_settings": {},
"helm_values": {},
"k8s_namespace": "flux-system"
}
object({|
create_namespace = bool
helm_chart = string
helm_name = string
k8s_namespace = string
helm_release_name = string
helm_chart_version = string
})
{| no | | [grafana\_api\_key](#input\_grafana\_api\_key) | Grafana API key for the Amazon Managed Grafana workspace. Required if `enable_external_secrets = true` | `string` | `""` | no | | [grafana\_cluster\_dashboard\_url](#input\_grafana\_cluster\_dashboard\_url) | Dashboard URL for Cluster Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json"` | no | | [grafana\_kubelet\_dashboard\_url](#input\_grafana\_kubelet\_dashboard\_url) | Dashboard URL for Kubelet Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json"` | no | | [grafana\_namespace\_workloads\_dashboard\_url](#input\_grafana\_namespace\_workloads\_dashboard\_url) | Dashboard URL for Namespace Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json"` | no | | [grafana\_node\_exporter\_dashboard\_url](#input\_grafana\_node\_exporter\_dashboard\_url) | Dashboard URL for Node Exporter Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json"` | no | | [grafana\_nodes\_dashboard\_url](#input\_grafana\_nodes\_dashboard\_url) | Dashboard URL for Nodes Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json"` | no | | [grafana\_url](#input\_grafana\_url) | Endpoint URL of Amazon Managed Grafana workspace. Required if `enable_grafana_operator = true` | `string` | `""` | no | | [grafana\_workloads\_dashboard\_url](#input\_grafana\_workloads\_dashboard\_url) | Dashboard URL for Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json"` | no | | [helm\_config](#input\_helm\_config) | Helm Config for Prometheus | `any` | `{}` | no | | [irsa\_iam\_permissions\_boundary](#input\_irsa\_iam\_permissions\_boundary) | IAM permissions boundary for IRSA roles | `string` | `null` | no | | [irsa\_iam\_role\_path](#input\_irsa\_iam\_role\_path) | IAM role path for IRSA roles | `string` | `"/"` | no | | [istio\_config](#input\_istio\_config) | Configuration object for ISTIO monitoring |
"create_namespace": true,
"helm_chart": "oci://ghcr.io/grafana-operator/helm-charts/grafana-operator",
"helm_chart_version": "v5.0.0-rc3",
"helm_name": "grafana-operator",
"helm_release_name": "grafana-operator",
"k8s_namespace": "grafana-operator"
}
object({| `null` | no | | [java\_config](#input\_java\_config) | Configuration object for Java/JMX monitoring |
enable_alerting_rules = bool
enable_recording_rules = bool
enable_dashboards = bool
scrape_sample_limit = number
flux_gitrepository_name = string
flux_gitrepository_url = string
flux_gitrepository_branch = string
flux_kustomization_name = string
flux_kustomization_path = string
grafana_url = string
grafana_istio_cp_dashboard_url = string
grafana_istio_mesh_dashboard_url = string
grafana_istio_performance_dashboard_url = string
grafana_istio_service_dashboard_url = string
prometheus_metrics_endpoint = string
})
object({| `null` | no | | [ksm\_config](#input\_ksm\_config) | Kube State metrics configuration |
enable_alerting_rules = bool
enable_recording_rules = bool
enable_dashboards = bool
scrape_sample_limit = number
flux_gitrepository_name = string
flux_gitrepository_url = string
flux_gitrepository_branch = string
flux_kustomization_name = string
flux_kustomization_path = string
grafana_dashboard_url = string
prometheus_metrics_endpoint = string
})
object({|
create_namespace = bool
k8s_namespace = string
helm_chart_name = string
helm_chart_version = string
helm_release_name = string
helm_repo_url = string
helm_settings = map(string)
helm_values = map(any)
scrape_interval = string
scrape_timeout = string
})
{| no | | [logs\_config](#input\_logs\_config) | Configuration object for logs collection |
"create_namespace": true,
"helm_chart_name": "kube-state-metrics",
"helm_chart_version": "4.24.0",
"helm_release_name": "kube-state-metrics",
"helm_repo_url": "https://prometheus-community.github.io/helm-charts",
"helm_settings": {},
"helm_values": {},
"k8s_namespace": "kube-system",
"scrape_interval": "60s",
"scrape_timeout": "15s"
}
object({|
cw_log_retention_days = number
})
{| no | | [managed\_prometheus\_workspace\_endpoint](#input\_managed\_prometheus\_workspace\_endpoint) | Amazon Managed Prometheus Workspace Endpoint | `string` | `""` | no | | [managed\_prometheus\_workspace\_id](#input\_managed\_prometheus\_workspace\_id) | Amazon Managed Prometheus Workspace ID | `string` | `null` | no | | [managed\_prometheus\_workspace\_region](#input\_managed\_prometheus\_workspace\_region) | Amazon Managed Prometheus Workspace's Region | `string` | `null` | no | | [ne\_config](#input\_ne\_config) | Node exporter configuration |
"cw_log_retention_days": 90
}
object({|
create_namespace = bool
k8s_namespace = string
helm_chart_name = string
helm_chart_version = string
helm_release_name = string
helm_repo_url = string
helm_settings = map(string)
helm_values = map(any)
scrape_interval = string
scrape_timeout = string
})
{| no | | [nginx\_config](#input\_nginx\_config) | Configuration object for NGINX monitoring |
"create_namespace": true,
"helm_chart_name": "prometheus-node-exporter",
"helm_chart_version": "4.14.0",
"helm_release_name": "prometheus-node-exporter",
"helm_repo_url": "https://prometheus-community.github.io/helm-charts",
"helm_settings": {},
"helm_values": {},
"k8s_namespace": "prometheus-node-exporter",
"scrape_interval": "60s",
"scrape_timeout": "60s"
}
object({| `null` | no | | [prometheus\_config](#input\_prometheus\_config) | Controls default values such as scrape interval, timeouts and ports globally |
enable_alerting_rules = bool
enable_recording_rules = bool
enable_dashboards = bool
scrape_sample_limit = number
flux_gitrepository_name = string
flux_gitrepository_url = string
flux_gitrepository_branch = string
flux_kustomization_name = string
flux_kustomization_path = string
grafana_dashboard_url = string
prometheus_metrics_endpoint = string
})
object({|
global_scrape_interval = string
global_scrape_timeout = string
})
{| no | | [tags](#input\_tags) | Additional tags (e.g. `map('BusinessUnit`,`XYZ`) | `map(string)` | `{}` | no | | [target\_secret\_name](#input\_target\_secret\_name) | Target secret in Kubernetes to store the Grafana API Key Secret | `string` | `"grafana-admin-credentials"` | no | | [target\_secret\_namespace](#input\_target\_secret\_namespace) | Target namespace of secret in Kubernetes to store the Grafana API Key Secret | `string` | `"grafana-operator"` | no | | [tracing\_config](#input\_tracing\_config) | Configuration object for traces collection to AWS X-Ray |
"global_scrape_interval": "120s",
"global_scrape_timeout": "15s"
}
object({|
otlp_grpc_endpoint = string
otlp_http_endpoint = string
send_batch_size = number
timeout = string
})
{| no | ## Outputs | Name | Description | |------|-------------| | [eks\_cluster\_id](#output\_eks\_cluster\_id) | EKS Cluster Id | | [eks\_cluster\_version](#output\_eks\_cluster\_version) | EKS Cluster version |
"otlp_grpc_endpoint": "0.0.0.0:4317",
"otlp_http_endpoint": "0.0.0.0:4318",
"send_batch_size": 50,
"timeout": "30s"
}