---
title: "1. Find the Root Cause"
chapter: true
weight: 8
---

## Find the Root Cause

The TravelLogic app that you support is experiencing unusual latency when loading its web page, sometimes resulting in outages in the us-east-1 region. You need to find the root cause of the latency and outages as soon as possible.

1. In the Sumo Logic UI, near the top of the screen, click **+New** > **Explore**.

    ![Explore](/images/lab-monitoring/lab1-explore.png)

    Alternatively, if you are comfortable applying the filters to find your events of interest, you can click **+New** and select **Root Cause** to open the Root Cause Explorer directly, then continue with **Step 7**.

1. In the top-left corner of the screen that displays, click the **Explore by** dropdown and select **AWS Observability**.

1. In the left pane, under the **Prod** account, click the **us-east-1** region.

    ![Prod Account](/images/lab-monitoring/lab1-prod.png)

1. In the right pane, in the Dashboards dropdown, select **AWS Region Events of Interest**. For more information on the Root Cause Explorer and the linked dashboards, refer to the [Root Cause Explorer](https://help.sumologic.com/Observability_Solution/Root_Cause_Explorer) documentation.

    ![Region Events of Interest](/images/lab-monitoring/lab1-regionevent-interest.png)

1. To open the Events of Interest (EOI) dashboard in the Root Cause Explorer (RCE), click the three vertical dots on the right of the EOI screen.

    ![Root Cause Explorer](/images/lab-monitoring/lab1-open-rce.png)

    The EOI dashboard opens in the RCE.

    ![EOI in RCE](/images/lab-monitoring/lab1-eoi-rce.png)

1. You can see the most recent outage, an EC2 ALB 5XX error, plotted on the screen. When you hover over the circles, you can see the EOI stats. One of the readings shows y% for x min, meaning the metric drifted **y** percent from its expected value for **x** minutes. The extent of drift from the expected value is classified as High, Medium, or Low; high-intensity EOIs require more attention than others.

    ![EOI Error](/images/lab-monitoring/lab1-eoi-error.png)

1. If you hover your mouse slightly above the 5XX error, you can see an EC2 CPU Load bottleneck event plotted slightly earlier, and when you move further up, you'll notice a DynamoDB throttle event plotted earlier than the CPU Load bottleneck.

    ![EOI Error](/images/lab-monitoring/lab1-eoi-error.png)

    ![EOI Bottleneck](/images/lab-monitoring/lab1-eoi-bottleneck.png)

1. You can select a time range to magnify the scatter plot. Use the magnifying glass to zoom back out.

1. When you put these relevant events in a timeline perspective, you realize that DynamoDB was throttled first and the remaining events occurred as cascading effects of the database throttling.

    ![Find DynamoDB Throttle](/images/lab-monitoring/lab1-find-throttle.png)

1. Click the **DynamoDB** event plot. On the right side of the screen, in the Entity Inspector panel, check the details on the **Summary** tab. You can see a spike in the metrics chart plotted at the bottom of the Summary tab.

    ![Event Summary](/images/lab-monitoring/lab1-rce-summary.png)

1. Click the **Entities** tab, which lists the relevant entities and environment parameters.

    ![Entities tab](/images/lab-monitoring/lab1-rce-entities.png)

1. Click the **Open In** button in the Entity section and select **Entity Dashboard** to open the relevant entity dashboard.

    ![Entity Dashboard](/images/lab-monitoring/lab1-entity-dashboard.png)
1. Open the DynamoDB events dashboard by selecting **AWS DynamoDB - Events** from the dropdown. The data in this dashboard is populated from the relevant metrics.

    ![DynamoDB Events](/images/lab-monitoring/lab1-dynamodb-events.png)

1. The dashboard displays a number of panels, including No. of Errors, Top 5 Events, and Events over Time. You can see the number of errors and the events that occurred, including Delete Table, Describe Table, and Update Table.

    ![DynamoDB Events](/images/lab-monitoring/lab1-dynamodb-events2.png)

1. After reviewing the relevant metrics in the events dashboard, you need to drill down further into the associated logs. Scroll down to the Top Errors panel and click **Open in Search** to open the relevant logs.

    ![Top Errors](/images/lab-monitoring/lab1-dynamodb-toperrors.png)

1. When you open the events in Search mode, you see the **aggregate** events, which show that someone has updated the table. Click **Messages** to drill down into the logs.

    ![Aggregate Events](/images/lab-monitoring/lab1-aggregate-events.png)

1. All the relevant log entries are displayed. If required, you can apply filters in the left pane to focus on the relevant fields and narrow the search to the event you are looking for. In this case, deselect the **AttributeDefinition** field.

    ![Filter Events](/images/lab-monitoring/lab1-filter-events.png)

1. When you go through the Update Table log events, you realize that a developer named **Mike Man** changed the read/write capacity (IOPS) settings.

    ![Parameter Change Event](/images/lab-monitoring/lab1-event-readwrite.png)

The developer reconfigured DynamoDB to use lower provisioned IOPS (Input/Output Operations Per Second), which caused the throttling and the subsequent outage. Cost optimization was most likely the motivation for the change, since AWS charges for DynamoDB based on provisioned Read/Write Capacity Units.

So the real root cause of the DynamoDB throttling is a change to the table's Provisioned IOPS setting. Lowering this setting reduces AWS costs, but it can also lead to throttling, and such a configuration change is evident in the AWS CloudTrail logs associated with DynamoDB. Raising the provisioned IOPS back to a level that can handle the workload resolves the issue; a sketch of that remediation follows the note below.

{{% notice note %}}
The developer '**Mike Man**' who reconfigured DynamoDB is just an example. The developer name and the IOPS changes may vary when you perform the lab exercise.
{{% /notice %}}
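If you want to see what the remediation might look like outside the Sumo Logic UI, the sketch below uses the AWS SDK for Python (boto3) to inspect and raise a table's provisioned throughput. The table name `TravelLogicTable` and the capacity values are illustrative assumptions and are not part of the lab environment; in practice you would use the table and capacity units appropriate for your workload.

```python
# Minimal sketch (not part of the lab): restore a DynamoDB table's provisioned
# read/write capacity with boto3. The table name and capacity values are
# hypothetical; substitute values appropriate for your environment.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

TABLE_NAME = "TravelLogicTable"  # hypothetical table name

# Inspect the current provisioned throughput to confirm the lowered setting.
table = dynamodb.describe_table(TableName=TABLE_NAME)["Table"]
current = table["ProvisionedThroughput"]
print(
    f"Current capacity: {current['ReadCapacityUnits']} RCU / "
    f"{current['WriteCapacityUnits']} WCU"
)

# Raise the provisioned capacity back to a level that can absorb the app's load.
dynamodb.update_table(
    TableName=TABLE_NAME,
    ProvisionedThroughput={
        "ReadCapacityUnits": 100,   # example values only; size to your workload
        "WriteCapacityUnits": 100,
    },
)
```

Once the new capacity takes effect, the DynamoDB throttle events should stop appearing in the Root Cause Explorer.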