# Naming Convention This document will describe the index naming standard for ingestion of Observability signals - Traces, Metrics, Logs. Currently, there is no single coherent pattern to use for all Observability signals and potential data sources. For example - `data-prepper` use their own index naming and structure to ingest Observability signals. `data-prepper Indices:` - Traces data: `otel-v1-apm-span-**` *(Observability Trace mapping)* - Supplement: `otel-v1-apm-service-map` *(Proprietary Index Mapping)* The same goes for jaeger trace data type: - Traces data: `jaeger-span*` *(Observability Trace mapping)* This convention is also harder to manage regarding the index revolving for lifecycle management - this would be optimized using the `data_stream` layer supported by OpenSearch API. Today due to different index structure and non-standard naming patterns we cant create crosscutting queries that will correlate or aggregate information on top of different Observability data providers. ## Proposal We would use the next structure and naming patterns based on the following conventions : 1) Add `data_stream` support for all Observability based standard indices 2) Use a standard Observability signals naming index conventions 3) Create customer namespace naming degree of freedom to allow arbitrary names for specific customer use-cases 4) Move the Observability Indices Template & default index creation into Observability Plugin bootstrap --- 1) Using the `data_stream` will encourage simple physical index management and query - each Observability index would actually be a data_stream: ``` A typical workflow to manage time-series data involves multiple steps, such as creating a rollover index alias, defining a write index, and defining common mappings and settings for the backing indices. Data streams simplify this process and enforce a setup that best suits time-series data, such as being designed primarily for append-only data and ensuring that each document has a timestamp field. A data stream is internally composed of multiple backing indices. Search requests are routed to all the backing indices, while indexing requests are routed to the latest write index ``` 2) Consolidating data using the `data_stream` concepts patterns and catalog. The next Observability index pattern will be followed: Index pattern will follow the next naming structure `{type}`-`{dataset}`-`{namespace}` - **type** - indicated the observability high level types "logs", "metrics", "traces" (prefixed by the `sso_` schema convention ) - **dataset** - The field can contain anything that classify the source of the data - such as `nginx.access` (If none specified "**default** " will be used). - **namespace** - A user defined namespace. Mainly useful to allow grouping of data such as production grade, geography classification 3) The ***sso_{type}-{dataset}-{namespace}*** Pattern address the capability of differentiation of similar information structure to different indices accordingly to customer strategy. This strategy will be defined by the two degrees of naming freedom: `dataset` and `namespace` For example a customer may want to route the nginx logs from two geographical areas into two different indices: - `sso_logs-nginx-us` - `sso_logs-nginx-eu` This type of distinction also allows for creation of crosscutting queries by setting the next **index query pattern** `sso_logs-nginx-*` or by using a geographic based crosscutting query `sso_logs-*-eu`. ## Data index routing The [ingestion component](https://github.com/opensearch-project/data-prepper) which is responsible for ingesting the Observability signals should route the data into the relevant indices. The `sso_{type}-{dataset}-{namespace}` combination dictates the target index, `{type}` is prefixed with the `sso_` prefix into one of the supported type: - Traces - `sso_traces` - Metrics - `sso_metrics` - Logs - `sso_logs` For example if within the ingested log contains the following section: ```json { ... "attributes": { "data_stream": { "type": "span", "dataset": "mysql", "namespace": "prod" } } } ``` This indicates that the target index for this observability signal should be `sso_traces`-`mysql`-`prod` index that follows uses the traces schema mapping. If the `data_stream` information if not present inside the signal, the default index should be used. --- ## Observability Index templates With the expectation of multiple Observability data providers and the need to consolidate all to a single common schema - the Observability plugin will take the following responsibilities : - Define and create all the signals index templates upon loading - Create default data_stream for each signal type upon explicit request - **_this is not done eagerly since the customer may want to change some template index settings_** before generating the default indices - Publish a versioned schema file (Json Schema) for each signal type for general validation usage by any 3rd party ### Note It is important to mention here that these new capabilities would not change or prevent existing customer usage of the system and continue to allow proprietary usage. ### In details *Logs Schema* Default Generated index pattern name: [*logs-default-namespace*](https://github.com/opensearch-project/observability/pull/1403) *Traces Schema* Default Generated index pattern name: [*traces-default-namespace*](https://github.com/opensearch-project/observability/pull/1395) *Metrics Schema* Default Generated index pattern name: [*metrics-default-namespace*](https://github.com/opensearch-project/observability/pull/1397) --- **What alternatives have you considered?** A clear and concise description of any alternative solutions or features you've considered. ## Note Important to mention here that this new suggestion would not change or prevent existing customer usage of the system and continue to allow proprietary usage. **Do you have any additional context?** - [data-streams](https://opensearch.org/docs/latest/opensearch/data-streams/) - [data-prepper](https://github.com/opensearch-project/data-prepper)