{ "cells": [ { "cell_type": "markdown", "id": "7cae5f20", "metadata": {}, "source": [ "# Retail Demo Store Experimentation Workshop - Amazon CloudWatch Evidently\n", "\n", "In this workshop we will create an A/B experiment using [Amazon CloudWatch Evidently](https://aws.amazon.com/cloudwatch/features/). Evidently lets application developers conduct experiments and identify unintended consequences of new features before rolling them out for general use, thereby reducing risk related to new feature roll-out. Evidently allows you to validate new features across the full application stack before release, which makes for a safer release. When launching new features, you can expose them to a small user base, monitor key metrics such as page load times or conversions, and then dial up traffic. Evidently also allows you to try different designs, collect user data, and release the most effective design in production. \n", "\n", "Recommended Time: 30 minutes" ] }, { "cell_type": "markdown", "id": "6e0417e2", "metadata": {}, "source": [ "## CloudWatch Evidently concepts\n", "\n", "To get started with CloudWatch Evidently, for either a feature launch or an A/B experiment, you first create a project. A project is a logical grouping of resources. Within the project, you create features that have variations that you want to test or launch.\n", "\n", "When the Retail Demo Store project was deployed in your account, an Evidently project and several Evidently features were automatically created. These resources are defined in the [evidently.yaml](https://github.com/aws-samples/retail-demo-store/blob/master/aws/cloudformation-templates/base/evidently.yaml) CloudFormation stack. Open a new browser window/tab and browse to [CloudWatch](https://console.aws.amazon.com/cloudwatch/home); under \"Application monitoring\" in the left navigation, you will find Evidently.\n", "\n", "![Evidently Project page](./images/evidently/evidently_project.png)\n", "\n", "### Retail Demo Store features\n", "\n", "When a user is interacting with web application views like the home page, product detail page, the search auto-complete drop-down, and the shop livestreams page, the web application will make requests to the Recommendations microservice to retrieve or order products for the given user experience. One of the parameters passed with each of these requests is the feature name. The feature name is mapped to Evidently features. The following application features are already instrumented in the Retail Demo Store web application as well as mapped to Evidently features in the Evidently project.\n", "\n", "- **Home page (top)**: the top of the home page view displays recommended products in two states: when the current visitor/user is new/cold and when the current visitor is warm. The new/cold visitor/user is automatically transitioned to the warm experience after 2 product view events have been sent to the Personalize event tracker. Selecting an existing shopper also puts the user into the warm user experience.\n", " - \"**Popular products**\": for new/cold users, displays popular products in a grid widget. The feature name is `home_product_recs_cold`.\n", " - \"**Inspired by your shopping trends**\": for existing/warm users, displays products from one of the supported product recommenders configured in the Recommendations microservice. 
The feature name is `home_product_recs`.\n", "- **Home page (bottom)**: the bottom of the home page view displays featured products in a carousel widget.\n", "  - \"**Featured products**\": this is the carousel widget where featured products are displayed. The feature name is `home_featured_rerank`.\n", "- **Product detail page**: when you click on a product, you are taken to the product detail view.\n", "  - \"**Compare similar items**\": this is the carousel widget on the product detail view that displays products similar to the product being displayed. The feature name is `product_detail_related`.\n", "- **Navigation**: the header navigation.\n", "  - \"**Search results**\": this is the search drop-down in the web application's navigational header. For this feature, we can test personalized ranking of search results against search results in the default order returned by Amazon OpenSearch. The feature name is `search_results`.\n", "- **Shop live streams**: on the \"Shop\" drop-down is a \"Shop Livestreams\" option. This page provides a live streaming interface for demonstrating products and making recommendations.\n", "  - \"**Shop livestreams - discounted products**\": this is the sidebar vertical widget on the Shop Livestreams page. It displays discounted products highlighted in the live stream. The feature name is `live_stream_prod_discounts`.\n", "  - \"**Shop livestreams - Compare similar items**\": this is the carousel widget at the bottom of the Shop Livestreams page. It displays products similar to the product currently being featured in the live stream. The feature name is `live_stream_prod_recommendation`.\n", "\n", "To examine the Evidently features, click on the Retail Demo Store's project in Evidently.\n", "\n", "![Evidently Features page](./images/evidently/evidently_features.png)" ] }, { "cell_type": "markdown", "id": "3186c98d", "metadata": {}, "source": [ "## Retail Demo Store / CloudWatch Evidently integration\n", "\n", "Let's examine the Evidently integration in more detail.\n", "\n", "- You start by creating features in Evidently mapped to user experiences in your application that you want to control with launch/feature flags or experiments. As mentioned above, this was already done for you in the Retail Demo Store during deployment. User experiences can be as simple as a view title or button color or as complex as a panel or widget.\n", "- Next you instrument the user interface logic in your application for each feature to call Evidently to determine the variation to use. Each feature can have one or more variations and each variation has a value that can be a boolean/flag, number, or string. In our case, we're storing small JSON snippets as string variation values. The JSON contains the details needed to map a variation value to a recommender implementation.\n", "- Experiments can be created for features by specifying the experiment duration, how to split traffic across variations, metrics that will be collected to measure the outcome of the experiment, and more.\n", "- Your application calls the Evidently [EvaluateFeature](https://docs.aws.amazon.com/cloudwatchevidently/latest/APIReference/API_EvaluateFeature.html) or [BatchEvaluateFeature](https://docs.aws.amazon.com/cloudwatchevidently/latest/APIReference/API_BatchEvaluateFeature.html) API to retrieve the variation details for the given feature(s) for the current user. This can be done in the client using the AWS SDK for JavaScript or server-side using any of the supported language SDKs. For the Retail Demo Store, these calls are made in the Recommendations microservice.\n", "  - If an experiment is active for a feature, the evaluate feature response will include a `reason` of `EXPERIMENT_RULE_MATCH`. In this case, you also need to send an experiment exposure event to Evidently to indicate that the current user is receiving the assigned variation.\n", "- When/if the user \"converts\" against a variation that is part of an experiment, a \"conversion\" event needs to be sent to Evidently. In our case, a conversion is when the user clicks on a product included in a variation's recommended products. The Retail Demo Store web application calls the `/experiment/outcome` endpoint on the Recommendations microservice, which then calls the Evidently [PutProjectEvents](https://docs.aws.amazon.com/cloudwatchevidently/latest/APIReference/API_PutProjectEvents.html) API. A condensed sketch of this flow appears after the diagram below.\n", "\n", "The following architecture diagram summarizes the integration.\n", "\n", "![Retail Demo Store Evidently architecture](./images/evidently/rds_evidently_architecture.png)\n"
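, "\n", "To make the flow concrete, here is a minimal sketch of that server-side logic using the boto3 Evidently client. It is illustrative only: `project_name` is a placeholder, the exposure-event handling is elided, and error handling is omitted.\n", "\n", "```python\n", "import json\n", "from datetime import datetime, timezone\n", "\n", "import boto3\n", "\n", "evidently = boto3.client('evidently')\n", "project_name = 'retaildemostore'  # placeholder for your Evidently project name\n", "\n", "def resolve_home_recs(user_id: str) -> dict:\n", "    # Ask Evidently which variation this user should see for the feature\n", "    response = evidently.evaluate_feature(\n", "        project=project_name,\n", "        feature='home_product_recs',\n", "        entityId=user_id\n", "    )\n", "    if response['reason'] == 'EXPERIMENT_RULE_MATCH':\n", "        pass  # an experiment is active; an exposure event must also be sent\n", "    # The variation value is a small JSON document describing the recommender\n", "    return json.loads(response['value']['stringValue'])\n", "\n", "def send_conversion_event(user_id: str) -> None:\n", "    # Report a click/conversion for the metric this experiment tracks\n", "    evidently.put_project_events(\n", "        project=project_name,\n", "        events=[{\n", "            'type': 'aws.evidently.custom',\n", "            'timestamp': datetime.now(timezone.utc),\n", "            'data': json.dumps({\n", "                'details': {'homeProductRecsClicked': 1.0},\n", "                'userDetails': {'userId': user_id}\n", "            })\n", "        }]\n", "    )\n", "```"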
] }, { "cell_type": "markdown", "id": "75ece3ad", "metadata": {}, "source": [ "## Running an Evidently experiment\n", "\n", "Let's walk through the process of creating an experiment in Evidently, examine how the Retail Demo Store web application displays variations, and then simulate an experiment so we can inspect the results. We're going to illustrate how to run an experiment in code using the Python programming language, but you could also use the Evidently console.\n", "\n", "### Our experiment hypothesis\n", "\n", "**Sample scenario:**\n", "\n", "Website analytics have shown that user sessions frequently end on the home page for our e-commerce site, the Retail Demo Store. Furthermore, when users do make a purchase, most purchases are for a single product. Currently on our home page we are using a basic approach of recommending featured products (the control experience). We hypothesize that replacing the current featured products approach on the home page with personalized recommendations from Amazon Personalize (the variation) will increase the click-through rate (CTR) on recommended products by 25%. The current click-through rate is 15%, so a 25% relative lift corresponds to an expected CTR of 0.15 × 1.25 = 0.1875 for the variation.\n", "\n", "### Setup - import dependencies\n", "\n", "First, let's import the dependencies needed to interact with the Evidently API via Python." ] }, { "cell_type": "code", "execution_count": null, "id": "c452e074", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "import requests\n", "import time\n", "import pandas as pd\n", "import numpy as np\n", "import random\n", "import scipy.stats as scs\n", "from collections import defaultdict\n", "from datetime import datetime, timedelta" ] }, { "cell_type": "markdown", "id": "b263a79c", "metadata": {}, "source": [ "Next let's create the clients we'll need to make API calls. We'll also declare the feature name that we'll be using for this experiment (`home_product_recs`)."
] }, { "cell_type": "code", "execution_count": null, "id": "97211fa5", "metadata": {}, "outputs": [], "source": [ "# Evidently client\n", "evidently = boto3.client('evidently')\n", "# Service discovery will allow us to dynamically discover Retail Demo Store microservices\n", "servicediscovery = boto3.client('servicediscovery')\n", "\n", "feature = 'home_product_recs'\n", "\n", "# The Uid is a unique ID and we need it to find the role made by CloudFormation\n", "with open('/opt/ml/metadata/resource-metadata.json') as f:\n", " data = json.load(f)\n", "sagemaker = boto3.client('sagemaker')\n", "sagemakerResponce = sagemaker.list_tags(ResourceArn=data[\"ResourceArn\"])\n", "for tag in sagemakerResponce[\"Tags\"]:\n", " if tag['Key'] == 'Uid':\n", " project_name = tag['Value']\n", " break\n", "\n", "print('project_name:', project_name)" ] }, { "cell_type": "markdown", "id": "4729d199", "metadata": {}, "source": [ "### Inspect feature\n", "\n", "Before creating and starting our experiment, let's take a look at the feature. As a reminder, this feature was setup in Evidently during the deployment of the Retail Demo Store." ] }, { "cell_type": "code", "execution_count": null, "id": "57636b2b", "metadata": {}, "outputs": [], "source": [ "response = evidently.get_feature(project = project_name, feature = feature)\n", "print(json.dumps(response['feature'], indent = 2, default = str))" ] }, { "cell_type": "markdown", "id": "59e4a8ee", "metadata": {}, "source": [ "In the response above, take special note of the `variations` array and the `value.stringValue` value for each variation. As mentioned above, we're actually storing a short JSON document as the `stringValue`. The Recommendations microservice will parse this string as JSON and extract the information it needs to \"wire up\" each variation to a product recommendation implementation. In this case we have two variations for this feature:\n", "\n", "- Variation 0: **Featured Products** - displays featured products from the product catalog. For this variation, we just need the `type` of `product` to tell the Recommendations service how to resolve featured products from the Products microservice.\n", "- Variation 1: **Personalize-UserPersonalization** - displays personalized product recommendations from an Amazon Personalize campaign or recommender. To resolve this variation we need a `type` of `personalize-recommendations` and the Personalize campaign or recommender ARN to call to get recommendations for each user. The `arn` field in the JSON snippet provides the ARN needed by the Recommendations service." ] }, { "cell_type": "markdown", "id": "ba715b46", "metadata": {}, "source": [ "### Evaluate feature before experiment\n", "\n", "Before creating an experiment, let's take a look at what the [EvaluateFeature](https://docs.aws.amazon.com/cloudwatchevidently/latest/APIReference/API_EvaluateFeature.html) API returns for the `home_product_recs` feature." ] }, { "cell_type": "code", "execution_count": null, "id": "74cf78a1", "metadata": {}, "outputs": [], "source": [ "user_id = '123'\n", "\n", "response = evidently.evaluate_feature(\n", " entityId = user_id,\n", " feature = feature,\n", " project = project_name\n", ")\n", "\n", "print(json.dumps(response, indent = 2, default = str))" ] }, { "cell_type": "markdown", "id": "4708297a", "metadata": {}, "source": [ "Take note that the `reason` is `DEFAULT` and the only variation and value is the default variation from the `GetFeature` call above. 
] }, { "cell_type": "markdown", "id": "c4c8625d", "metadata": {}, "source": [ "### Create experiment\n", "\n", "Now we will create an experiment for the `home_product_recs` feature by calling the [CreateExperiment](https://docs.aws.amazon.com/cloudwatchevidently/latest/APIReference/API_CreateExperiment.html) API. You could also create the experiment in the Evidently console.\n", "\n", "First we will define the metric event pattern. This pattern is used to match exposure and outcome/conversion events back to this particular experiment. It is very similar to Amazon EventBridge [patterns](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-event-patterns.html). The logic in the Recommendations service is to use `userDetails.userId` to identify the user and to build the `valueKey` by taking the feature name, converting it from snake case to camel case, and appending `Clicked` to the result. So `home_product_recs` becomes `homeProductRecsClicked`. This is put within a `details` dictionary to arrive at a final value key of `details.homeProductRecsClicked`. Experiments for every feature in the Retail Demo Store are handled the same way.\n", "\n", "To pull this together, here is an example of a conversion event that the Recommendations service will send to Evidently for user `abc123`. Since we're tracking click-through rate, the value is `1.0` to indicate a conversion/click for this user.\n", "\n", "```javascript\n", "{\n", "    'details': {\n", "        'homeProductRecsClicked': 1.0\n", "    },\n", "    'userDetails': {\n", "        'userId': 'abc123'\n", "    }\n", "}\n", "```\n", "\n", "The pattern that Evidently needs in order to match events formatted like this is defined in the following cell." ] }, { "cell_type": "code", "execution_count": null, "id": "8a0c3848", "metadata": {}, "outputs": [], "source": [ "metric_event_pattern = {\n", "    \"userDetails.userId\": [\n", "        {\n", "            \"exists\": True\n", "        }\n", "    ],\n", "    \"details.homeProductRecsClicked\": [\n", "        {\n", "            \"exists\": True\n", "        }\n", "    ]\n", "}" ] }, { "cell_type": "markdown", "id": "0a468751", "metadata": {}, "source": [ "We will also define the variations to use for the experiment." ] }, { "cell_type": "code", "execution_count": null, "id": "ead65b37", "metadata": {}, "outputs": [], "source": [ "experiment_variations = [\n", "    {\n", "        'feature': feature,\n", "        'variation': 'FeaturedProducts',\n", "        'name': 'FeaturedProducts'\n", "    },\n", "    {\n", "        'feature': feature,\n", "        'variation': 'Personalize-UserPersonalization',\n", "        'name': 'Personalize-UserPersonalization'\n", "    }\n", "]" ] }, { "cell_type": "markdown", "id": "082128ef", "metadata": {}, "source": [ "Now we can create the experiment, passing the metric event pattern as the `metricGoals` argument and the experiment variations as the `treatments` argument. The control treatment will be `FeaturedProducts` (the \"A\" in our A/B test), the default non-personalized product recommendations on the homepage. The `Personalize-UserPersonalization` variation will be the \"B\" in our A/B test."
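, "\n", "As an aside, the metric `valueKey` naming convention described above can be sketched in a few lines of Python. The helper name below is illustrative; the actual Recommendations service implements the same idea in its `_snake_to_camel_case` method (shown later in this workshop).\n", "\n", "```python\n", "def metric_value_key(feature_name: str) -> str:\n", "    # 'home_product_recs' -> 'homeProductRecs' -> 'details.homeProductRecsClicked'\n", "    head, *rest = feature_name.split('_')\n", "    camel = head + ''.join(part.title() for part in rest)\n", "    return f'details.{camel}Clicked'\n", "\n", "assert metric_value_key('home_product_recs') == 'details.homeProductRecsClicked'\n", "```"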
] }, { "cell_type": "code", "execution_count": null, "id": "1d4b249f", "metadata": {}, "outputs": [], "source": [ "experiment_name = 'home_product_recs_personalization'\n", "\n", "response = evidently.create_experiment(\n", " project = project_name,\n", " name = experiment_name,\n", " description = 'Testing default product recommendations of featured products against Amazon Personalize generated recommendations',\n", " metricGoals = [\n", " {\n", " 'desiredChange': 'INCREASE',\n", " 'metricDefinition': {\n", " 'entityIdKey': 'userDetails.userId',\n", " 'valueKey': 'details.homeProductRecsClicked',\n", " 'eventPattern': json.dumps(metric_event_pattern),\n", " 'name': 'homeProductRecsClicked'\n", " }\n", " },\n", " ],\n", " onlineAbConfig={\n", " 'controlTreatmentName': 'FeaturedProducts',\n", " 'treatmentWeights': {\n", " 'FeaturedProducts': 50000,\n", " 'Personalize-UserPersonalization': 50000\n", " }\n", " },\n", " samplingRate = 100000,\n", " treatments = experiment_variations\n", ")\n", "\n", "print(json.dumps(response['experiment'], indent = 2, default = str))" ] }, { "cell_type": "markdown", "id": "371f477f", "metadata": {}, "source": [ "### Start experiment\n", "\n", "With the experiment created, now it's time to start the experiment to make it active. We'll set the experiment end date to be one week from now." ] }, { "cell_type": "code", "execution_count": null, "id": "b74187de", "metadata": {}, "outputs": [], "source": [ "experiment_end_date = datetime.now() + timedelta(days = 7)\n", "\n", "response = evidently.start_experiment(\n", " project = project_name,\n", " experiment = experiment_name,\n", " analysisCompleteTime = experiment_end_date\n", ")\n", "\n", "print(json.dumps(response, indent = 2, default = str))" ] }, { "cell_type": "markdown", "id": "883f6d5f", "metadata": {}, "source": [ "As mentioned earlier, you could have completed the previous steps in the CloudWatch Evidently console as well." ] }, { "cell_type": "markdown", "id": "09fb02b0", "metadata": {}, "source": [ "### Evaluate feature after experiment has started\n", "\n", "Now let's take a look at the EvaluateFeature API response now that the experiment has been created and started." ] }, { "cell_type": "code", "execution_count": null, "id": "6f7a4857", "metadata": { "scrolled": false }, "outputs": [], "source": [ "response = evidently.evaluate_feature(\n", " entityId = user_id,\n", " feature = feature,\n", " project = project_name\n", ")\n", "\n", "print(json.dumps(response, indent = 2, default = str))" ] }, { "cell_type": "markdown", "id": "2724b2e5", "metadata": {}, "source": [ "Notice this time that the `reason` is now `EXPERIMENT_RULE_MATCH`, we have details on the active experiment in the `details` dictionary, and we have details on the assigned variation in the `value` dictionary. The Recommendations service uses this response to know which resolver to use to provide the response." ] }, { "cell_type": "markdown", "id": "f26b8895", "metadata": {}, "source": [ "### Inspect storefront\n", "\n", "To see the experiment in action on the Retail Demo Store storefront, browse to the storefront deployed in your account, and then either sign in or create an account. 
On the home view you should see an indicator that an experiment is active, and each recommended product is labeled with the variation assigned to the currently selected shopper.\n", "\n", "_Note: the Recommendations service caches `BatchEvaluateFeature` responses so you may not see feature and experiment changes reflected in the storefront for up to 30 seconds._\n", "\n", "![Retail Demo Store active Evidently experiment](./images/evidently/rds_active_evidently_experiment.png)\n", "\n", "To see how the assigned variation changes depending on the current shopper, try switching shoppers from the \"Shopper\" dropdown in the top navigation.\n", "\n", "![Retail Demo Store switch shopper](./images/evidently/rds_switch_shoppers.png)\n", "\n", "You may have to cycle through a few shoppers to land on one that hashes to a different variation.\n", "\n", "#### Recommendations service implementation details\n", "\n", "Diving deeper into the `BatchEvaluateFeature` implementation in the Recommendations service, the [EvidentlyFeatureResolver](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/evidently_feature_resolver.py) class has a function named `_call_evidently_evaluate_features`. This function is called to evaluate all of the known features in the Retail Demo Store for the current user. This is more efficient than calling `EvaluateFeature` for each feature and allows us to cache the response in the service. Below is the snippet of the relevant source code.\n", "\n", "```python\n", "class EvidentlyFeatureResolver:\n", "    def _call_evidently_evaluate_features(self, user_id: str) -> List[Dict]:\n", "        # Build one evaluation request per known feature for the current user\n", "        requests = []\n", "        for feature in FEATURE_NAMES:\n", "            requests.append({\n", "                'entityId': user_id,\n", "                'feature': feature\n", "            })\n", "\n", "        # Evaluate all features in a single call\n", "        response = evidently.batch_evaluate_feature(\n", "            project=project_name,\n", "            requests=requests\n", "        )\n", "\n", "        return response['results']\n", "```" ] }, { "cell_type": "markdown", "id": "5c597fdd", "metadata": {}, "source": [ "### Create conversion/click events\n", "\n", "To trigger a conversion event to be sent to Evidently's [PutProjectEvents](https://docs.aws.amazon.com/cloudwatchevidently/latest/APIReference/API_PutProjectEvents.html) API, click on a product displayed by the variation. As you switch shoppers, trigger conversion events for some and not others to see how they're tracked in the Evidently results dashboard for the experiment. You can also inspect the requests sent from the Retail Demo Store web application to the Recommendations service using the Developer Tools in your browser. The two endpoints are `/recommendations` to retrieve recommendations and `/experiment/outcome` to send outcome notifications to the backend.\n", "\n", "#### Recommendations service implementation details\n", "\n", "Diving deeper into the `PutProjectEvents` implementation in the Recommendations service, the [EvidentlyExperiment](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/experiment_evidently.py) class has a function named `_send_evidently_event`. This function is called to send exposure and conversion events for a user and metric when the `/experiment/outcome` endpoint is called from the web application. 
Below is the snippet of the relevant source code.\n", "\n", "```python\n", "class EvidentlyExperiment(experiment.Experiment):\n", "    def _send_evidently_event(self, user_id: str, metric_value: float, timestamp: datetime = None):\n", "        # In case None is passed for timestamp\n", "        timestamp = datetime.now() if not timestamp else timestamp\n", "\n", "        # We convert the feature name from snake case to camel case for the metric value key.\n", "        metric_name = f'{self._snake_to_camel_case(self.feature)}Clicked'\n", "\n", "        response = evidently.put_project_events(\n", "            project = self.project,\n", "            events = [\n", "                {\n", "                    'type': 'aws.evidently.custom',\n", "                    'timestamp': timestamp,\n", "                    'data': json.dumps({\n", "                        'details': {\n", "                            metric_name: metric_value\n", "                        },\n", "                        'userDetails': {\n", "                            'userId': str(user_id)\n", "                        }\n", "                    })\n", "                }\n", "            ]\n", "        )\n", "```" ] }, { "cell_type": "markdown", "id": "0e6fcced", "metadata": {}, "source": [ "## Simulate experiment\n", "\n", "Now let's simulate a running experiment by calling the `/recommendations` and `/experiment/outcome` endpoints on the Recommendations microservice for samples of users from the Users microservice. The following code cells will prepare the data and functions needed to run the simulation as well as show the results." ] }, { "cell_type": "markdown", "id": "11c72f09", "metadata": {}, "source": [ "### Load Users\n", "\n", "For our experiment simulation, we will load all Retail Demo Store users and run the experiment until the sample size for both variations has been met.\n", "\n", "First, let's discover the IP address for the Retail Demo Store's [Users](https://github.com/aws-samples/retail-demo-store/tree/master/src/users) service." ] }, { "cell_type": "code", "execution_count": null, "id": "350f66a8", "metadata": {}, "outputs": [], "source": [ "response = servicediscovery.discover_instances(\n", "    NamespaceName='retaildemostore.local',\n", "    ServiceName='users',\n", "    MaxResults=1,\n", "    HealthStatus='HEALTHY'\n", ")\n", "\n", "users_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']\n", "print('Users Service Instance IP: {}'.format(users_service_instance))" ] }, { "cell_type": "markdown", "id": "01878af6", "metadata": {}, "source": [ "Next, let's fetch all users, randomize their order, and load them into a local data frame." ] }, { "cell_type": "code", "execution_count": null, "id": "3fe0c56f", "metadata": {}, "outputs": [], "source": [ "# Load all users so we have enough to satisfy our sample size requirements.\n", "response = requests.get('http://{}/users/all?count=10000'.format(users_service_instance))\n", "users = response.json()\n", "random.shuffle(users)\n", "users_df = pd.DataFrame(users)\n", "pd.set_option('display.max_rows', 5)\n", "\n", "users_df" ] }, { "cell_type": "markdown", "id": "ff2dd6e6", "metadata": {}, "source": [ "### Discover Recommendations service\n", "\n", "Next, let's discover the IP address for the Retail Demo Store's [Recommendations](https://github.com/aws-samples/retail-demo-store/tree/master/src/recommendations) service. This is the service where the Experimentation framework and Evidently integration are implemented. We will call endpoints on this service to simulate our Evidently A/B experiment."
] }, { "cell_type": "code", "execution_count": null, "id": "c56cfd37", "metadata": {}, "outputs": [], "source": [ "response = servicediscovery.discover_instances(\n", " NamespaceName='retaildemostore.local',\n", " ServiceName='recommendations',\n", " MaxResults=1,\n", " HealthStatus='HEALTHY'\n", ")\n", "\n", "recommendations_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']\n", "print('Recommendation Service Instance IP: {}'.format(recommendations_service_instance))" ] }, { "cell_type": "markdown", "id": "d859d731", "metadata": {}, "source": [ "### Sample Size Calculation\n", "\n", "The first step is to determine the sample size necessary to reach a statistically significant result given a target of 25% gain in click-through rate from the home page. There are several sample size calculators available online including calculators from [Optimizely](https://www.optimizely.com/sample-size-calculator/?conversion=15&effect=20&significance=95), [AB Tasty](https://www.abtasty.com/sample-size-calculator/), and [Evan Miller](https://www.evanmiller.org/ab-testing/sample-size.html#!15;80;5;25;1). For this exercise, we will use the following function to calculate the minimal sample size for each variation." ] }, { "cell_type": "code", "execution_count": null, "id": "d69e2652", "metadata": {}, "outputs": [], "source": [ "def min_sample_size(bcr, mde, power=0.8, sig_level=0.05):\n", " \"\"\"Returns the minimum sample size to set up a split test\n", "\n", " Arguments:\n", " bcr (float): probability of success for control, sometimes\n", " referred to as baseline conversion rate\n", "\n", " mde (float): minimum change in measurement between control\n", " group and test group if alternative hypothesis is true, sometimes\n", " referred to as minimum detectable effect\n", "\n", " power (float): probability of rejecting the null hypothesis when the\n", " null hypothesis is false, typically 0.8\n", "\n", " sig_level (float): significance level often denoted as alpha,\n", " typically 0.05\n", "\n", " Returns:\n", " min_N: minimum sample size (float)\n", "\n", " References:\n", " Stanford lecture on sample sizes\n", " http://statweb.stanford.edu/~susan/courses/s141/hopower.pdf\n", " \"\"\"\n", " # standard normal distribution to determine z-values\n", " standard_norm = scs.norm(0, 1)\n", "\n", " # find Z_beta from desired power\n", " Z_beta = standard_norm.ppf(power)\n", "\n", " # find Z_alpha\n", " Z_alpha = standard_norm.ppf(1-sig_level/2)\n", "\n", " # average of probabilities from both groups\n", " pooled_prob = (bcr + bcr+mde) / 2\n", "\n", " min_N = (2 * pooled_prob * (1 - pooled_prob) * (Z_beta + Z_alpha)**2\n", " / mde**2)\n", "\n", " return min_N" ] }, { "cell_type": "markdown", "id": "42b7b107", "metadata": {}, "source": [ "Next we will call the `min_sample_size` function with the parameters of our testing hypothesis." 
] }, { "cell_type": "code", "execution_count": null, "id": "b46186ca", "metadata": {}, "outputs": [], "source": [ "# This is the ficticious conversion rate of the existing control experience\n", "baseline_conversion_rate = 0.15\n", "# This is the lift expected by adding personalization with our variation\n", "absolute_percent_lift = baseline_conversion_rate * .25\n", "\n", "# Calculate the sample size needed to reach a statistically significant result\n", "sample_size = int(min_sample_size(baseline_conversion_rate, absolute_percent_lift))\n", "\n", "print('Sample size for each variation: ' + str(sample_size))" ] }, { "cell_type": "markdown", "id": "a7cf3d23", "metadata": {}, "source": [ "### Simulation Function\n", "\n", "The following `simulate_experiment` function is supplied with the sample size for each group (A and B) and the probability of conversion for each group that we want to use for our simulation. It runs across enough users to satisfy the sample size requirements and calls the Recommendations service for each user in the experiment." ] }, { "cell_type": "code", "execution_count": null, "id": "48926dd6", "metadata": {}, "outputs": [], "source": [ "def simulate_experiment(N_A, N_B, p_A, p_B):\n", " \"\"\"Returns a pandas dataframe with simulated CTR data\n", "\n", " Parameters:\n", " N_A (int): sample size for control group\n", " N_B (int): sample size for test group\n", " Note: final sample size may not match N_A & N_B provided because the\n", " group at each row is chosen at random by the ABExperiment class.\n", " p_A (float): conversion rate; conversion rate of control group\n", " p_B (float): conversion rate; conversion rate of test group\n", "\n", " Returns:\n", " df (df)\n", " \"\"\"\n", "\n", " # will hold exposure/outcome data\n", " data = []\n", "\n", " # total number of users to sample for both variations\n", " N = N_A + N_B\n", " \n", " if N > len(users):\n", " raise ValueError('Sample size is greater than number of users')\n", "\n", " print(f'Generating data for {N} users... 
"    print('While waiting for the simulation to finish, open the CloudWatch Evidently console in another browser tab/window to view results')\n", "\n", "    # initiate bernoulli distributions to randomly sample from based on simulated probabilities\n", "    A_bern = scs.bernoulli(p_A)\n", "    B_bern = scs.bernoulli(p_B)\n", "\n", "    outcome_url = f'http://{recommendations_service_instance}/experiment/outcome'\n", "\n", "    recs_responses = defaultdict(int)\n", "    outcome_responses = defaultdict(int)\n", "\n", "    for idx in range(N):\n", "        if idx > 0 and idx % 500 == 0:\n", "            print('Generated data for {} users so far'.format(idx))\n", "\n", "        # initialize empty row\n", "        row = {}\n", "\n", "        # Get next user from shuffled list\n", "        user = users[idx]\n", "\n", "        # Call Recommendations web service to get recommendations for the user\n", "        rec_url = f'http://{recommendations_service_instance}/recommendations?userID={user[\"id\"]}&feature={feature}'\n", "        response = requests.get(rec_url)\n", "        recs_responses[str(response.status_code)] += 1\n", "\n", "        recommendations = response.json()\n", "        recommendation = recommendations[random.randint(0, len(recommendations)-1)]\n", "\n", "        # For Evidently experiments, 'variationIndex' holds the assigned variation's name\n", "        variation = recommendation['experiment']['variationIndex']\n", "        row['variation'] = variation\n", "\n", "        # Determine if variation converts based on probabilities provided\n", "        if variation == 'FeaturedProducts':\n", "            row['converted'] = A_bern.rvs()\n", "        elif variation == 'Personalize-UserPersonalization':\n", "            row['converted'] = B_bern.rvs()\n", "\n", "        if row['converted'] == 1:\n", "            # Update experiment with outcome/conversion\n", "            payload = {\n", "                'correlationId': recommendation['experiment']['correlationId']\n", "            }\n", "            response = requests.post(outcome_url, data=payload)\n", "            time.sleep(.150)\n", "            outcome_responses[str(response.status_code)] += 1\n", "\n", "        data.append(row)\n", "\n", "        time.sleep(.150)\n", "\n", "    # convert data into dataframe\n", "    df = pd.DataFrame(data)\n", "\n", "    print('Done')\n", "\n", "    print(\"Recommendations response stats:\")\n", "    print(recs_responses)\n", "    print(\"Outcome response stats:\")\n", "    print(outcome_responses)\n", "\n", "    return df" ] }, { "cell_type": "markdown", "id": "f2d208c6", "metadata": {}, "source": [ "### Run Simulation\n", "\n", "Next we run the simulation by defining our simulation parameters for sample sizes and probabilities and then calling `simulate_experiment`. This will take 15-20 minutes depending on the sample sizes. Open the Evidently results page in the CloudWatch Evidently console in another browser tab/window while the simulation runs to view the results in real time.\n", "\n", "_Note: you may need to occasionally hard-refresh the Evidently results console page to see updated results as the simulation code below runs._" ] }, { "cell_type": "code", "execution_count": null, "id": "d83e4af8", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "# Set size of both groups to calculated sample size\n", "N_A = N_B = sample_size\n", "\n", "# Use probabilities from our hypothesis\n", "# p_A: baseline conversion rate of the control experience\n", "p_A = 0.15\n", "# p_B: expected conversion rate of the variation (a 25% relative lift over the baseline: 0.15 * 1.25)\n", "p_B = 0.1875\n", "\n", "# Run simulation\n", "ab_data = simulate_experiment(N_A, N_B, p_A, p_B)\n", "\n", "ab_data" ] }, { "cell_type": "markdown", "id": "0cc6ebe5", "metadata": {}, "source": [ "### Analyze simulation results\n", "\n", "Next, let's take a closer look at the results of our simulation. We'll start by calculating some summary statistics; a quick significance check on the simulated data follows below." ] }, { "cell_type": "code", "execution_count": null, "id": "2462fe46", "metadata": {}, "outputs": [], "source": [ "ab_summary = ab_data.pivot_table(values='converted', index='variation', aggfunc=np.sum)\n", "# add additional columns to the pivot table\n", "ab_summary['total'] = ab_data.pivot_table(values='converted', index='variation', aggfunc=lambda x: len(x))\n", "ab_summary['rate'] = ab_data.pivot_table(values='converted', index='variation')\n", "ab_summary" ] }
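, { "cell_type": "markdown", "id": "a3f9d0e1", "metadata": {}, "source": [ "As a quick sanity check on the simulated data (separate from the analysis Evidently performs for the experiment), we can run a simple two-proportion z-test on the summary counts above. This is an illustrative check only; Evidently's results dashboard remains the source of truth for the experiment itself." ] }, { "cell_type": "code", "execution_count": null, "id": "b7c4e2f5", "metadata": {}, "outputs": [], "source": [ "# Two-proportion z-test on the simulated counts\n", "# Rows are ordered alphabetically: FeaturedProducts (control) first\n", "conversions = ab_summary['converted']\n", "totals = ab_summary['total']\n", "\n", "# Pooled conversion probability and standard error of the difference\n", "p_pool = conversions.sum() / totals.sum()\n", "se = np.sqrt(p_pool * (1 - p_pool) * (1 / totals.iloc[0] + 1 / totals.iloc[1]))\n", "\n", "# z-statistic for the difference in conversion rates and two-sided p-value\n", "z = (ab_summary['rate'].iloc[1] - ab_summary['rate'].iloc[0]) / se\n", "p_value = 2 * (1 - scs.norm.cdf(abs(z)))\n", "\n", "print(f'z-statistic: {z:.3f}, p-value: {p_value:.5f}')" ] }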
\n", "\n", "_Note: you may need to occasionally hard-refresh the Evidently results console page to see updated results as the simulation code below runs._" ] }, { "cell_type": "code", "execution_count": null, "id": "d83e4af8", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "# Set size of both groups to calculated sample size\n", "N_A = N_B = sample_size\n", "\n", "# Use probabilities from our hypothesis\n", "# bcr: baseline conversion rate\n", "p_A = 0.15\n", "# d_hat: difference in a metric between the two groups, sometimes referred to as minimal detectable effect or lift depending on the context\n", "p_B = 0.1875\n", "\n", "# Run simulation\n", "ab_data = simulate_experiment(N_A, N_B, p_A, p_B)\n", "\n", "ab_data" ] }, { "cell_type": "markdown", "id": "0cc6ebe5", "metadata": {}, "source": [ "### Analyze simulation results\n", "\n", "Next, let's take a closer look at the results of our simulation. We'll start by calculating some summary statistics." ] }, { "cell_type": "code", "execution_count": null, "id": "2462fe46", "metadata": {}, "outputs": [], "source": [ "ab_summary = ab_data.pivot_table(values='converted', index='variation', aggfunc=np.sum)\n", "# add additional columns to the pivot table\n", "ab_summary['total'] = ab_data.pivot_table(values='converted', index='variation', aggfunc=lambda x: len(x))\n", "ab_summary['rate'] = ab_data.pivot_table(values='converted', index='variation')\n", "ab_summary" ] }, { "cell_type": "markdown", "id": "2223c8a0", "metadata": {}, "source": [ "### Review Evidently results\n", "\n", "Once the simulation is complete, open (or refresh) the Evidently experiment results page in the CloudWatch console. The first view summarizes the event counts. This just tells us the volume of exposure and conversion events for each variation.\n", "\n", "![Evidently experiment event counts](./images/evidently/evidently_results_event_counts.png)\n", "\n", "Change the view to summarize on average values. This will reflect the CTR conversion events for each variation. You should see that the `Personalize-UserPersonalization` variation is the `Better` variation for this experiment. \n", "\n", "_Note: sometimes the colors used in the graph do not match the colors in the table below the graph for each variation. Refer to the graph legend._\n", "\n", "![Evidently experiment average value](./images/evidently/evidently_results_avg_value.png)" ] }, { "cell_type": "markdown", "id": "93aab821", "metadata": {}, "source": [ "### Stop experiment\n", "\n", "With the simulation finished, we will go ahead and stop the experiment. We will set the `desiredState` to `CANCELLED` (`COMPLETED` is the other possible value). By stopping the experiment, the Recommendations service will resume serving the default home page personalization user experience. It also allows other experiment types in the other Experimentation workshops to be tested." ] }, { "cell_type": "code", "execution_count": null, "id": "5e8a632c", "metadata": {}, "outputs": [], "source": [ "response = evidently.stop_experiment(\n", " desiredState = 'CANCELLED',\n", " experiment = experiment_name,\n", " project = project_name,\n", " reason = 'Completed experiment testing in the Evidently workshop'\n", ")\n", "\n", "print(json.dumps(response, indent = 2, default = str))" ] }, { "cell_type": "markdown", "id": "f7247cf3", "metadata": {}, "source": [ "## Next Steps\n", "\n", "You have completed the exercise for implementing an A/B test using the Amazon CloudWatch Evidently. 
, "\n", "Since Evidently features are evaluated before the built-in experimentation types, **be sure to cancel your Evidently experiment**. This can be done in the code cell above or in the CloudWatch > Evidently console." ] }, { "cell_type": "code", "execution_count": null, "id": "80710044", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }