# Amazon Personalize Monitor - Core Monitor Function

The [personalize_monitor.py](./personalize_monitor.py) Lambda is called every 5 minutes by a CloudWatch scheduled event rule to generate the CloudWatch metrics needed to populate the Personalize Monitor dashboard line graph widgets and to trigger the CloudWatch alarms for low recommender/campaign utilization and idle recommender/campaign detection (if configured).

Also, if the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes` AND a monitored campaign has been idle more than `IdleThresholdHours` hours, this function will publish a `DeletePersonalizeCampaign` event to EventBridge that is handled by the [personalize_delete_campaign](../personalize_delete_campaign_function/) function. An idle campaign is one that has not had any `GetRecommendations` or `GetPersonalizedRanking` calls in the last `IdleThresholdHours` hours.

Finally, this function will adjust a campaign's `minProvisionedTPS` (down only) if the `AutoAdjustMinTPS` deployment parameter is `Yes`.

## How it works

The function first determines what Personalize campaigns should be monitored based on the CloudFormation template parameters you specify when you [install](../README.md#installing-the-application) the application.

## CloudWatch Metrics

The following custom CloudWatch metrics are generated by this function on 5 minute intervals. You can find these metrics in the AWS console under CloudWatch and then Metrics or you can query them using the CloudWatch API.

| Namespace | MetricName | Dimensions | Unit | Description |
| --- | --- | --- | --- | --- |
| PersonalizeMonitor | monitoredResourceCount | | Count | Number of recommenders and campaigns currently being monitored at interval |
| PersonalizeMonitor | minRecommendationRequestsPerSecond | RecommenderArn | Count/Second | `minRecommendationRequestsPerSecond` value for the recommender at interval |
| PersonalizeMonitor | averageRPS | RecommenderArn | Count/Second | Average RPS for the recommender at interval |
| PersonalizeMonitor | recommenderUtilization | RecommenderArn | Percent | Utilization percentage of `averageRPS` vs `minRecommendationRequestsPerSecond` at interval |
| PersonalizeMonitor | minProvisionedTPS | CampaignArn | Count/Second | `minProvisionedTPS` value for the campaign at interval |
| PersonalizeMonitor | averageTPS | CampaignArn | Count/Second | Average TPS for the campaign at interval |
| PersonalizeMonitor | campaignUtilization | CampaignArn | Percent | Utilization percentage of `averageTPS` vs `minProvisionedTPS` at interval |

### How is averageRPS/averageTPS calculated?

The `averageRPS` and `averageTPS` metric value for each monitored recommender and campaign is calculated by first determining the number of requests made to the recommender or campaign during the 5 minute interval and dividing by 300 (the number of seconds in 5 minutes). The number of requests is pulled from the `GetRecommendations` or `GetPersonalizedRanking` metric (depending on the underlying recipe) for the recommender/campaign from the `AWS/Personalize` namespace. The request count metric is automatically updated by Personalize itself.

## CloudWatch Alarms (optional)

You can optionally have CloudWatch alarms dynamically created for monitored recommenders/campaigns for low utilization and idle recommenders/campaigns.
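
The utilization alarm is driven by the utilization metrics described above, so it may help to see how a single 5-minute data point could be derived. The sketch below is illustrative only: the function names are hypothetical, and the utilization formula (average divided by the configured minimum, times 100) is an assumption based on the metric descriptions rather than code taken from `personalize_monitor.py`.

```python
INTERVAL_SECONDS = 300  # one 5-minute monitoring interval


def average_tps(request_count: float, interval_seconds: int = INTERVAL_SECONDS) -> float:
    """Average requests per second over one interval (hypothetical helper)."""
    return request_count / interval_seconds


def utilization_percent(avg_tps: float, min_provisioned_tps: float) -> float:
    """Utilization of the configured minimum, as a percentage.

    Assumption: utilization = averageTPS / minProvisionedTPS * 100.
    """
    return avg_tps / min_provisioned_tps * 100.0


# 1,500 requests in a 5-minute interval against a campaign with minProvisionedTPS of 10
tps = average_tps(1500)
print(tps)                           # 5.0
print(utilization_percent(tps, 10))  # 50.0
```

The same arithmetic would apply to recommenders, with `minRecommendationRequestsPerSecond` in place of `minProvisionedTPS`.
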
### Low Recommender/Campaign Utilization Alarm

If you set the `AutoCreateUtilizationAlarms` CloudFormation template parameter to `Yes` when you installed this application, this function will automatically create a CloudWatch alarm for every recommender and campaign that it monitors. The alarm will trigger when the `recommenderUtilization` or `campaignUtilization` custom metric described above drops below the `UtilizationThresholdAlarmLowerBound` installation parameter for 9 out of 12 evaluation periods. Since the intervals are 5 minutes, 9 of the 12 five-minute evaluations over a 60-minute span must be below the threshold for the alarm to enter the alarm state. The same rule applies to the transition from alarm back to OK.

The alarm will be created in the region where the recommender/campaign was created. An [SNS](https://aws.amazon.com/sns/) topic created by this application will be used for the alarm and OK actions, and the `NotificationEndpoint` (email address) deployment parameter will be set up as a subscriber to the topic.

**Be sure to confirm the subscription sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.**

The alarm will have its actions disabled when `minRecommendationRequestsPerSecond` or `minProvisionedTPS` is 1 and enabled when `minRecommendationRequestsPerSecond` or `minProvisionedTPS` is greater than 1, so that notifications are only sent when utilization can be impacted by adjusting `minRecommendationRequestsPerSecond`/`minProvisionedTPS`.

### Idle Recommender/Campaign Alarm

If you set the `AutoCreateIdleAlarms` CloudFormation template parameter to `Yes` when you installed this application, this function will automatically create a CloudWatch alarm for every monitored recommender/campaign that is idle for at least `IdleThresholdHours` hours.
The actions for the alarm will be enabled only after the recommender/campaign has existed for `IdleThresholdHours` hours as well. The `GetRecommendations` or `GetPersonalizedRanking` metric (depending on the resource's recipe) will be used to assess the resource's idle state.

The alarm will be created in the region where the recommender/campaign was created. An [SNS](https://aws.amazon.com/sns/) topic created by this application will be used for the alarm and OK actions, and the `NotificationEndpoint` (email address) deployment parameter will be set up as a subscriber to the topic.

**Be sure to confirm the subscription sent when this application creates SNS topics and subscribes the email address you provided. You will receive a confirmation email for a topic created in each region where resources are monitored.**

## Automatically adjusting minRecommendationRequestsPerSecond (recommenders) and minProvisionedTPS (campaigns) (optional)

If the `AutoAdjustMinTPS` deployment parameter is `Yes`, this function will check the actual hourly RPS/TPS over the last 14 days against the currently configured `minRecommendationRequestsPerSecond`/`minProvisionedTPS` and look for opportunities to reduce the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to optimize utilization and reduce costs. It does this by checking the recommender's or campaign's request volume for the previous 14 days on hourly intervals and finding the hour with the lowest average RPS/TPS (low watermark). If the low watermark average is less than `minRecommendationRequestsPerSecond`/`minProvisionedTPS` AND the recommender/campaign is more than 1 day old, it will drop the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` by 25%. This process will be repeated each hour until either the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` meets the low watermark RPS/TPS or the `minRecommendationRequestsPerSecond`/`minProvisionedTPS` reaches 1 (the lowest allowed value).
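
The step-down rule above can be sketched roughly as follows. This is a minimal illustration, not the function's actual code: `next_min_tps` and `simulate` are hypothetical names, and the rounding (floor after the 25% reduction) and the clamp that keeps the value from dropping below the low watermark are assumptions made to get whole-number results.

```python
import math
from typing import List


def next_min_tps(current_min_tps: int, low_watermark_tps: float, age_days: float) -> int:
    """Proposed minimum after one hourly check (hypothetical helper)."""
    # Resources 1 day old or younger, or already at the floor of 1, are left alone.
    if age_days <= 1 or current_min_tps <= 1:
        return current_min_tps
    # Only reduce when the low watermark is below the configured minimum.
    if low_watermark_tps >= current_min_tps:
        return current_min_tps
    reduced = math.floor(current_min_tps * 0.75)      # drop by 25% (rounding is an assumption)
    lower_bound = max(1, math.ceil(low_watermark_tps))  # never below the watermark or 1
    return max(reduced, lower_bound)


def simulate(current_min_tps: int, low_watermark_tps: float, age_days: float) -> List[int]:
    """Repeat the hourly check until the value stops changing."""
    steps = [current_min_tps]
    while True:
        nxt = next_min_tps(steps[-1], low_watermark_tps, age_days)
        if nxt == steps[-1]:
            return steps
        steps.append(nxt)


print(simulate(20, 3.2, age_days=10))  # [20, 15, 11, 8, 6, 4]
print(simulate(4, 0.0, age_days=10))   # [4, 3, 2, 1]
print(simulate(20, 3.2, age_days=0.5)) # [20]
```

Under these assumptions, a campaign at 20 TPS with a low watermark of 3.2 would settle at 4 after five hourly reductions.
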
**This function will NOT increase the `minRecommendationRequestsPerSecond`/`minProvisionedTPS`.** Instead it will rely on Personalize to auto-scale recommenders/campaigns up and back down to `minRecommendationRequestsPerSecond`/`minProvisionedTPS` to meet demand.

> Since it can take several minutes for a recommender/campaign to redeploy after updating its `minRecommendationRequestsPerSecond`/`minProvisionedTPS`, you will receive the notification when the redeploy starts. The recommender/campaign will continue to respond to `GetRecommendations`/`GetPersonalizedRanking` API requests while it is redeploying. There will be no interruption of service.

See the [personalize_update_tps](../personalize_update_tps_function/) function for details on the update function.

## Automatically stopping recommenders and deleting idle campaigns (optional)

If the `AutoDeleteOrStopIdleResources` deployment parameter is `Yes`, this function will perform additional checks once per hour for each monitored recommender/campaign to see if it has been idle for more than `IdleThresholdHours` hours. The purpose of this feature is to prevent abandoned recommenders/campaigns from continuing to incur inference costs when they are no longer being used.

Recommender/campaign checks are distributed across each hour in 10 minute blocks in an attempt to spread out the API calls needed to check and update recommenders/campaigns.

To avoid too aggressively stopping recommenders or deleting campaigns, new recommenders/campaigns that are not more than `IdleThresholdHours` hours old are exempt from being stopped/deleted. Similarly, if a recommender/campaign has been updated within `IdleThresholdHours`, it will also be exempt from being automatically stopped/deleted. The idea is that new or actively updated recommenders/campaigns are likely not safe to delete.
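
The exemption rules above could be expressed along these lines. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, the timestamps are assumed to be timezone-aware, and `last_request_at` stands in for however the most recent `GetRecommendations`/`GetPersonalizedRanking` activity is determined.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_eligible_for_idle_cleanup(
    created_at: datetime,
    last_updated_at: datetime,
    last_request_at: Optional[datetime],
    idle_threshold_hours: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if a resource may be stopped/deleted as idle (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    threshold = timedelta(hours=idle_threshold_hours)

    # New resources (not more than the threshold old) are exempt.
    if now - created_at <= threshold:
        return False
    # Recently updated resources are exempt as well.
    if now - last_updated_at <= threshold:
        return False
    # Any request within the threshold window means the resource is not idle.
    if last_request_at is not None and now - last_request_at <= threshold:
        return False
    return True


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
old = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_eligible_for_idle_cleanup(old, old, None, 24, now=now))  # True
print(is_eligible_for_idle_cleanup(old, now - timedelta(hours=6), None, 24, now=now))  # False
```

Passing `now` explicitly keeps the check deterministic, which makes the exemption rules easy to unit test.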