{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Retail Demo Store Experimentation Workshop - Overview\n", "\n", "Welcome to the Retail Demo Store Experimentation Workshop. In this module we are going to add experimentation to our Retail Demo Store eCommerce website. This will allow us to experiment with different personalization approaches in the user interface. We take two different approaches adding experimention. First, we'll explore how to implement three different experimentation techniques from the ground up. This is the DIY path. Then we will explore how to use solutions from AWS partners [Amplitude](https://amplitude.com/) and [Optimizely](https://www.optimizely.com/) as well as [Amazon CloudWatch Evidently](https://aws.amazon.com/cloudwatch/features/), an AWS solution for deploying and monitoring A/B tests in your applications.\n", "\n", "The built-in experimentation techniques include the following.\n", "\n", "- A/B testing\n", "- Interleaving recommendation testing\n", "- Multi-armed bandit testing\n", "\n", "Each experimentation technique will be described in detail as well as how to evaluate results. Although these techniques will be used to measure different approaches to personalization, the framework presented can be easily used for other experimentation use-cases as well.\n", "\n", "The AWS partner workshops Amplitude and Optimizely and the Amazon CloudWatch Evidently workshop each use the A/B testing technique.\n", "\n", "The workshops can be performed in any order.\n", "\n", "Recommended Time: 15 minutes this notebook, ~1.5 hours total for all experiment workshops\n", "\n", "## Prerequisites\n", "\n", "Since this module uses the Retail Demo Store's Recommendation microservice to run experiments across variations that depend on the personalization features of the Retail Demo Store, it is assumed that you have either completed the [Personalization](../1-Personalization/Lab-1-Introduction-and-data-preparation.ipynb) workshop or those resources have been pre-provisioned and configured in your AWS environment. If you are unsure and attending an AWS managed event such as a workshop or immersion day, check with your event lead." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experimentation Framework and Built-In Techniques\n", "\n", "Before we can dive into the first experimentation technique, let's review the overall architecture (depicted below) and some of the code. Although this experimentation framework is implemented in the Python programming language, it can be easily ported to many other languages.\n", "\n", "> All of the source code for the experimentation architecture described here can be found in the Retail Demo Store's [Recommendations](https://github.com/aws-samples/retail-demo-store/tree/master/src/recommendations) web service which is in a Git repository (AWS CodeCommit or GitHub) attached to AWS CodePipeline in your AWS environment.\n", "\n", "![Experimentation Class Diagram](./images/experimentation-class-diagram.png)\n", "\n", "### Variation\n", "\n", "Let's start with **Variation**. In the context of this workshop, a **Varation** is a specific approach to providing product recommendations or ranking product lists for a user. As mentioned above, this framework can be used to test variations of any feature in the Retail Demo Store--we're just focusing on testing different approaches to making product recommendations or ranking products. 
"\n", "A **Variation** represents the configuration information necessary to create a **Resolver** (described below) that is used to retrieve recommendations or rank a list of products. For example, for a variation backed by an Amazon Personalize campaign, the variation configuration would specify the Campaign ARN. For a variation backed by a \"similar item\" OpenSearch query in the Search service, the variation configuration would specify the host/IP for the Retail Demo Store Search microservice. And so on.\n", "\n", "Here is the code for the [**Variation**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/experiment.py) class.\n", "\n", "```python\n", "# from src/recommendations/src/recommendations-service/experimentation/experiment.py\n", "\n", "class Variation:\n", "    def __init__(self, **data):\n", "        self.config = data\n", "        self.resolver = ResolverFactory.get(**data)\n", "```\n", "\n", "There's not much to this class. It holds the configuration information in `self.config` and creates the **Resolver** based on that configuration. The **ResolverFactory.get()** function takes the configuration for a resolver (its type and initialization parameters) and returns the appropriate **Resolver** instance.\n", "\n", "### Resolver\n", "\n", "As mentioned above, a [**Resolver**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/resolvers.py) retrieves product recommendations or ranks a product list for a particular **Variation**. Here is the code for the **Resolver** abstract base class.\n", "\n", "```python\n", "# from src/recommendations/src/recommendations-service/experimentation/resolvers.py\n", "\n", "class Resolver(ABC):\n", "    \"\"\" Abstract base class for all resolvers\"\"\"\n", "    @abstractmethod\n", "    def get_items(self, **kwargs):\n", "        \"\"\" Returns recommended items for this resolver\n", "\n", "        Arguments:\n", "            Parameters needed by resolver to return recommendations\n", "\n", "        Return:\n", "            List of dictionaries where each dictionary minimally includes an 'itemId' key representing a recommended item\n", "        \"\"\"\n", "        pass\n", "```\n", "\n", "As you can see, a resolver only needs to implement the `get_items()` method. This method should return a list of dictionaries where each dictionary must have an `itemId` key with a value representing the ID of a recommended item. For this project, an item is a product. Here is a sample response with three items.\n", "\n", "```python\n", "[\n", "    {\n", "        'itemId': '56'\n", "    },\n", "    {\n", "        'itemId': '7'\n", "    },\n", "    {\n", "        'itemId': '13'\n", "    }\n", "]\n", "```\n",
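"\n", "To make the `get_items()` contract concrete, below is a minimal, hypothetical resolver that simply returns a fixed list of item IDs. The class name, constructor argument, and import path are assumptions made for illustration only; this class is not part of the Retail Demo Store code base.\n", "\n", "```python\n", "# Hypothetical example for illustration only -- not part of the Retail Demo Store repository.\n", "# Assumes the Resolver base class is importable from the experimentation package of the\n", "# Recommendations service.\n", "from experimentation.resolvers import Resolver\n", "\n", "class StaticItemsResolver(Resolver):\n", "    \"\"\" Returns a fixed, pre-configured list of item IDs \"\"\"\n", "    def __init__(self, item_ids):\n", "        self.item_ids = item_ids\n", "\n", "    def get_items(self, **kwargs):\n", "        num_results = kwargs.get('num_results', 10)\n", "        # Map each configured item ID to the dictionary shape the framework expects\n", "        return [{'itemId': str(item_id)} for item_id in self.item_ids[:num_results]]\n", "```\n",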
"\n", "There are six implementations of **Resolver** provided in this project. Four resolvers are used for the product recommendation and related item use-cases, and two resolvers are used for the product list ranking use-case.\n", "\n", "#### DefaultProductResolver\n", "\n", "The [**DefaultProductResolver**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/resolvers.py) will call the Retail Demo Store's Product microservice to retrieve a list of products in the same category as the current product. If the current product does not have an assigned category, the resolver will fall back to recommending featured products. Given the same product, all users will receive the same product recommendations. Therefore, this resolver does not provide personalized recommendations and is a good example of a very basic approach to making product suggestions.\n", "\n", "Below is the relevant code from the `DefaultProductResolver.get_items` method. See the source repository for a complete listing.\n", "\n", "```python\n", "# from src/recommendations/src/recommendations-service/experimentation/resolvers.py\n", "\n", "class DefaultProductResolver(Resolver):\n", "    ...\n", "\n", "    def get_items(self, **kwargs):\n", "        ...\n", "        category = None\n", "\n", "        if product_id:\n", "            # Lookup product to determine if it belongs to a category\n", "            url = f'http://{self.products_service_host}/products/id/{product_id}'\n", "            response = requests.get(url)\n", "\n", "            if response.ok:\n", "                category = response.json()['category']\n", "\n", "        if category:\n", "            # Product belongs to a category so get list of products in same category\n", "            url = f'http://{self.products_service_host}/products/category/{category}'\n", "            response = requests.get(url)\n", "        else:\n", "            # Product not specified or does not belong to a category so fallback to featured products\n", "            url = f'http://{self.products_service_host}/products/featured'\n", "            response = requests.get(url)\n", "\n", "        if response.ok:\n", "            # Create response making sure not to include current product\n", "            for product in response.json():\n", "                if product['id'] != product_id:\n", "                    items.append({'itemId': str(product['id'])})\n", "\n", "                    if len(items) >= num_results:\n", "                        break\n", "        else:\n", "            raise Exception(f'Error calling products service: {response.status_code}: {response.reason}')\n", "\n", "        return items\n", "```\n",
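"\n", "As a quick usage sketch, a resolver like this can be exercised directly. The constructor argument below is an assumption for illustration; the actual parameters are defined in the elided portion of the class in the repository.\n", "\n", "```python\n", "# Hypothetical usage sketch -- the constructor arguments are assumed, not taken from the repository\n", "resolver = DefaultProductResolver(products_service_host='products.retaildemostore.local')\n", "\n", "# Ask for up to 4 products related to product '8'; each result is a dict with an 'itemId' key\n", "items = resolver.get_items(product_id='8', num_results=4)\n", "print([item['itemId'] for item in items])\n", "```\n",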
"\n", "#### SearchSimilarProductsResolver\n", "\n", "The [**SearchSimilarProductsResolver**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/resolvers.py) leverages the Retail Demo Store's Search service to retrieve a ranked list of products similar to the current product. Internally the Search service uses the OpenSearch \"[more like this](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html)\" query type. This can be considered a content-based filtering approach, but it is still not personalized. Nevertheless, it is an improvement over **DefaultProductResolver** since items are ranked by relevance based on similarities across multiple product fields.\n", "\n", "Below is the relevant code from the `SearchSimilarProductsResolver.get_items` method. See the source repository for a complete listing.\n", "\n", "```python\n", "# from src/recommendations/src/recommendations-service/experimentation/resolvers.py\n", "\n", "class SearchSimilarProductsResolver(Resolver):\n", "    ...\n", "\n", "    def get_items(self, **kwargs):\n", "        ...\n", "        url = f'http://{self.search_service_host}/similar/products?productId={product_id}'\n", "        response = requests.get(url)\n", "\n", "        items = []\n", "\n", "        if response.ok:\n", "            for product in response.json():\n", "                items.append({'itemId': str(product['id'])})\n", "\n", "                if len(items) >= num_results:\n", "                    break\n", "        else:\n", "            raise Exception(f'Error calling search service: {response.status_code}: {response.reason}')\n", "\n", "        return items\n", "```\n", "\n", "#### PersonalizeRecommendationsResolver\n", "\n", "Given a user, the [**PersonalizeRecommendationsResolver**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/resolvers.py) will retrieve product recommendations from an Amazon Personalize campaign, such as the campaign created in the [Personalize workshop](../1-Personalization/Lab-1-Introduction-and-data-preparation.ipynb). Therefore, this resolver provides truly personalized recommendations.\n", "\n", "Below is the relevant code from the `PersonalizeRecommendationsResolver.get_items` method. See the source repository for a complete listing.\n", "\n", "```python\n", "# from src/recommendations/src/recommendations-service/experimentation/resolvers.py\n", "\n", "class PersonalizeRecommendationsResolver(Resolver):\n", "    \"\"\" Provides recommendations from an Amazon Personalize campaign \"\"\"\n", "    __personalize_runtime = boto3.client('personalize-runtime')\n", "\n", "    ...\n", "\n", "    def get_items(self, **kwargs):\n", "        user_id = kwargs.get('user_id')\n", "        item_id = kwargs.get('product_id')\n", "\n", "        if user_id is None and item_id is None:\n", "            raise Exception('user_id or product_id is required')\n", "\n", "        params = {}\n", "\n", "        is_recommender = self.inference_arn.split(':')[5].startswith('recommender/')\n", "        if is_recommender:\n", "            params['recommenderArn'] = self.inference_arn\n", "        else:\n", "            params['campaignArn'] = self.inference_arn\n", "\n", "        if user_id is not None:\n", "            params['userId'] = user_id\n", "\n", "        if item_id is not None:\n", "            params['itemId'] = item_id\n", "\n", "        ...\n", "\n", "        response = PersonalizeRecommendationsResolver.__personalize_runtime.get_recommendations(**params)\n", "\n", "        return response['itemList']\n", "```\n", "\n", "The **PersonalizeRankingResolver** implementation is very similar, but it calls the [get_personalized_ranking](https://docs.aws.amazon.com/personalize/latest/dg/getting-real-time-recommendations.html#rankings) operation on Amazon Personalize instead.\n", "\n", "#### HTTPResolver\n", "\n", "Finally, the [**HTTPResolver**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/resolvers.py) provides a generic example of how recommendations from an external recommendation service could be integrated into this experimentation framework. For instance, you could use this framework to evaluate Amazon Personalize against an existing recommendation system.\n", "\n", "See the source repository for a complete listing of **HTTPResolver**.\n",
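"\n", "To illustrate that idea, below is a hypothetical resolver that wraps an external REST endpoint. The class name, endpoint URL, request parameters, and response shape are all assumptions made for illustration; they are not taken from the Retail Demo Store code base.\n", "\n", "```python\n", "# Hypothetical sketch for illustration only -- not the actual HTTPResolver from the repository.\n", "import requests\n", "\n", "# Assumes the Resolver base class is importable from the experimentation package\n", "from experimentation.resolvers import Resolver\n", "\n", "class ExternalServiceResolver(Resolver):\n", "    \"\"\" Fetches recommendations from an external (non-AWS) recommendation API \"\"\"\n", "    def __init__(self, endpoint_url):\n", "        # e.g. 'https://recs.example.com/recommendations' (assumed)\n", "        self.endpoint_url = endpoint_url\n", "\n", "    def get_items(self, **kwargs):\n", "        user_id = kwargs.get('user_id')\n", "        num_results = kwargs.get('num_results', 10)\n", "\n", "        # Assumed request/response contract: the external service accepts a user ID and\n", "        # returns a JSON list of objects that each contain an 'id' field\n", "        response = requests.get(self.endpoint_url, params={'userId': user_id, 'limit': num_results}, timeout=2)\n", "        response.raise_for_status()\n", "\n", "        # Map the external response into the framework's expected shape\n", "        return [{'itemId': str(rec['id'])} for rec in response.json()[:num_results]]\n", "```\n",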
"\n", "### Experiment Strategies\n", "\n", "Pulling together the concepts of variations and resolvers described above, there are three experimentation strategies implemented in this project. Each experimentation technique is implemented as a subclass of the [**Experiment**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/experiment.py) abstract base class.\n", "\n", "#### Experiment\n", "\n", "The [**Experiment**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/experiment.py) class defines the contract required of all subclasses and provides some utility methods for creating correlation identifiers and tracking exposure and outcome events. An exposure represents a user being exposed to an experiment variation, and an outcome represents a user interacting with an exposed item (i.e. clicking on a recommended product).\n", "\n", "Let's take a closer look at relevant parts of the **Experiment** base class.\n", "\n", "```python\n", "# from src/recommendations/src/recommendations-service/experimentation/experiment.py\n", "\n", "class Experiment(ABC):\n", "    ...\n", "    def __init__(self, table, **data):\n", "        self._table = table\n", "        self.id = data['id']\n", "        self.feature = data['feature']\n", "        self.name = data['name']\n", "        self.status = data['status']\n", "        self.type = data['type']\n", "\n", "        self.variations = []\n", "\n", "        for v in data['variations']:\n", "            self.variations.append(Variation(**v))\n", "\n", "    @abstractmethod\n", "    def get_items(self, user_id, current_item_id = None, item_list = None, num_results = 10, tracker = None):\n", "        \"\"\" For a given user, returns item recommendations for this experiment along with experiment tracking/correlation information \"\"\"\n", "        pass\n", "\n", "    def track_conversion(self, correlation_id):\n", "        \"\"\" Call this method to track a conversion/outcome for an experiment \"\"\"\n", "        correlation_bits = correlation_id.split('~')\n", "        variation_index = int(correlation_bits[2])\n", "\n", "        if variation_index < 0 or variation_index >= len(self.variations):\n", "            raise Exception('variation_index is out of bounds')\n", "\n", "        return self._increment_convert_count(variation_index)\n", "\n", "    def _increment_exposure_count(self, variation, count = 1):\n", "        \"\"\" Call this method when a user is exposed to a variation of an experiment \"\"\"\n", "        return self.__increment_variation_count('exposures', variation, count)\n", "```\n", "\n", "The initialization/constructor method captures configuration information for the experiment. It also creates the **Variation** instances for the experiment that will be used to retrieve recommendations from each variation's resolver.\n", "\n", "The abstract `get_items()` method is implemented by each experiment subclass to return item recommendations according to its specific implementation. For example, for the A/B experiment the user will be assigned to the control or variation group and then receive recommendations from the corresponding resolver.\n", "\n", "Finally, exposure and conversion/outcome events are tracked using the `_increment_exposure_count()` and `track_conversion()` methods. You will see how these methods are used by each experiment type within their respective exercise notebooks.\n",
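"\n", "The correlation identifier is what ties an outcome event back to the experiment, variation, and result that produced the exposure. As a rough illustration only (the helper functions below are hypothetical, and the exact format used by the Retail Demo Store may differ), a correlation ID could be composed and parsed like this, using the `~` delimiter seen in `track_conversion()` above.\n", "\n", "```python\n", "# Hypothetical helpers for illustration -- not the actual implementation from the repository\n", "def build_correlation_id(experiment_id, user_id, variation_index, result_rank):\n", "    # Pack everything needed to attribute a later outcome event back to this exposure\n", "    return f'{experiment_id}~{user_id}~{variation_index}~{result_rank}'\n", "\n", "def parse_correlation_id(correlation_id):\n", "    # Mirrors track_conversion(): the variation index is at position 2\n", "    experiment_id, user_id, variation_index, result_rank = correlation_id.split('~')\n", "    return experiment_id, user_id, int(variation_index), int(result_rank)\n", "\n", "correlation_id = build_correlation_id('17077add918e40e4ab1abacb860ea0ad', '42', 0, 1)\n", "print(parse_correlation_id(correlation_id))\n", "```\n",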
"\n", "#### ABExperiment\n", "\n", "The [**A/B Experiment**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/experiment_ab.py) class provides an A/B testing implementation such that, given a `user_id`, it will randomly and consistently assign the user to receive recommendations from one of two or more variations.\n", "\n", "#### InterleavingExperiment\n", "\n", "The [**Interleaving Experiment**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/experiment_interleaving.py) class provides an implementation that interleaves, or blends, recommendations from two or more variations for each user. There are two interleaving method implementations provided. The **Balanced Method** balances selections from each variation in an alternating fashion. The **Team Draft Method** uses the analogy of team captains, chosen at random each round, selecting their next top \"player\" when building the interleaved results.\n", "\n", "#### MultiArmedBanditExperiment\n", "\n", "Finally, the [**Multi-Armed Bandit Experiment**](https://github.com/aws-samples/retail-demo-store/blob/master/src/recommendations/src/recommendations-service/experimentation/experiment_mab.py) class provides an implementation that uses a Bayesian technique to continually evaluate the performance of two or more variations, such that the best performing variation is selected most often (exploitation) while the other variations are still selected occasionally in case their performance changes over time (exploration).\n", "\n", "Each of these strategies is discussed in detail, with exercises, in a separate notebook in this workshop." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experiment Strategy Datastore\n", "\n", "As mentioned above, the experimentation framework described in this workshop is completely contained in the Retail Demo Store's Recommendations microservice. The microservice determines which experiments are active and their configuration by querying the experiment strategy table in DynamoDB. The table was automatically created by the Retail Demo Store CloudFormation templates. Since the table name is dynamically created at deployment time, it is stored in and loaded from an AWS Systems Manager parameter.\n", "\n", "The Retail Demo Store supports running multiple experiments concurrently, but for this module we will run an experiment for each technique sequentially to make it easier to interpret results. Experiment configurations are stored in a DynamoDB table where each item represents an experiment and has the following fields.\n", "\n", "- **id** - Uniquely identifies an experiment (UUID).\n", "- **feature** - Identifies the Retail Demo Store feature where the experiment should be applied. Each testable feature in the UI has its own unique name. For example, the recommended products panel on the home page has a feature name of `home_product_recs`. The related products panel on the product detail page has a feature name of `product_detail_related`. And so on. For this workshop we are running experiments for the product recommendations on the home page, so the feature we'll be working with is `home_product_recs`.\n", "- **name** - The name of the experiment. The name is determined by the person running the experiment and should be short but descriptive. It will be used in the UI for demo purposes and when logging events for experiment outcome tracking.\n",
"- **type** - The type of test (`ab` for an A/B test, `interleaving` for interleaved recommendations, or `mab` for a multi-armed bandit test).\n", "- **status** - The status of the experiment (`ACTIVE`, `EXPIRED`, or `PENDING`). Only active experiments will be rendered in the UI.\n", "- **variations** - List of configurations representing the variations applicable for the feature and type. For example, for A/B tests of the `home_product_recs` feature, the `variations` can be two Amazon Personalize campaign ARNs (variation type `personalize`), or a single Personalize campaign ARN and the default product behavior (variation type `product`). Any combination of variation types is supported.\n", "\n", "Here is an example of an experiment strategy (i.e. a DynamoDB item) for an A/B experiment that is testing the default product resolver (first variation, the control group) and an Amazon Personalize campaign (second variation).\n", "\n", "```javascript\n", "{\n", "    \"id\": \"17077add918e40e4ab1abacb860ea0ad\",\n", "    \"feature\": \"home_product_recs\",\n", "    \"name\": \"home_personalize_ab\",\n", "    \"type\": \"ab\",\n", "    \"status\": \"ACTIVE\",\n", "    \"variations\": [\n", "        {\n", "            \"type\": \"product\"\n", "        },\n", "        {\n", "            \"type\": \"personalize\",\n", "            \"campaign_arn\": \"arn:aws:personalize:us-west-2:[redacted]:campaign/retaildemostore-product-personalization\"\n", "        }\n", "    ]\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Experiment Response\n", "\n", "Let's bring everything together by walking through how experiments for product recommendations are surfaced on the Retail Demo Store's home page.\n", "\n", "1. A user signs in to the Retail Demo Store website and browses to the home page.\n", "2. The home page makes an API request to the Retail Demo Store's [Recommendations](https://github.com/aws-samples/retail-demo-store/tree/master/src/recommendations) microservice (`/recommendations` endpoint), passing the current user's ID and the feature name of `home_product_recs`.\n", "3. The Recommendations service calls `ExperimentManager().get_active('home_product_recs')` to get the active **Experiment** for the `home_product_recs` feature.\n", "4. The Recommendations service calls the `Experiment.get_items()` method, passing the `user_id`, to get the product recommendations for the user based on the currently active experiment.\n", "5. The Recommendations service returns the product recommendations response to the home page as JSON.\n", "6. The home page renders the product recommendations in the \"Inspired by your shopping trends\" section of the page.\n", "\n", "Below is an example of the response from the Recommendations service for an A/B experiment. The `itemId` is the ID of the recommended product and the `experiment` dictionary includes information on the experiment and variation.\n",
\n", "\n", "```javascript\n", "[\n", " {\n", " \"itemId\": \"2\",\n", " \"experiment\": {\n", " \"id\": \"d56619ad367f463688f506eefc0d08ca\",\n", " \"feature\": \"home_product_recs\",\n", " \"name\": \"home_personalize_ab\",\n", " \"type\": \"ab\",\n", " \"variationIndex\": 0,\n", " \"resultRank\": 1,\n", " \"correlationId\": \"d56619ad367f463688f506eefc0d08ca-42-0-1\"\n", " }\n", " },\n", " {\n", " \"itemId\": \"10\",\n", " \"experiment\": {\n", " \"id\": \"d56619ad367f463688f506eefc0d08ca\",\n", " \"feature\": \"home_product_recs\",\n", " \"name\": \"home_personalize_ab\",\n", " \"type\": \"ab\",\n", " \"variationIndex\": 0,\n", " \"resultRank\": 2,\n", " \"correlationId\": \"d56619ad367f463688f506eefc0d08ca-42-0-2\"\n", " }\n", " },\n", " {\n", " \"itemId\": \"18\",\n", " \"experiment\": {\n", " \"id\": \"d56619ad367f463688f506eefc0d08ca\",\n", " \"feature\": \"home_product_recs\",\n", " \"name\": \"home_personalize_ab\",\n", " \"type\": \"ab\",\n", " \"variationIndex\": 0,\n", " \"resultRank\": 3,\n", " \"correlationId\": \"d56619ad367f463688f506eefc0d08ca-42-0-3\"\n", " }\n", " }\n", "]\n", "```\n", "\n", "> As you can see, we're returning a lot of extra information on the experiment and variation for each recommended item. This is being done so that the Retail Demo Store UI can annotate recommended items on the page so we can visualize the effects of the experiments. In a production scenario, the `experiment.correlationId` is all that is needed to correlate user behavior with experiments." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## User Interface Treatment\n", "\n", "To make it easier to verify what's going on with your active experiments, the Retail Demo Store user interface will annotate content from experiments and variations received from the Recommendations service (see the red boxes in the following screenshot of the home page with an active experiement). While working through the exercises in this workshop you can open the Retail Demo Store's Web UI in another browser tab/window and verify that your experiment is active and working as expected. By default the Retail Demo Store UI displays information on active tests to make it easier to demo and debug. Of course, in a production deployment this information would be hidden from the user.\n", "\n", "> To find the URL of the Retail Demo Store Web UI for your deployment, browse to AWS CloudFormation in your AWS environment, find the stack for your Retail Demo Store deployment, click on Outputs, and locate the \"WebURL\" key. Open this URL in a separate browser tab/window.\n", "\n", "![Experiment UI Treatment](./images/ui-treatment.png)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next Steps\n", "\n", "With a foundational understanding of the experimentation framework now established, let's work through detailed exercises for each experiment type. Open the next notebook, **[3.2-AB-Experiment](./3.2-AB-Experiment.ipynb)**, in this workshop to step through running and A/B experiment." ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }