{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Retail Demo Store - Search Workshop\n", "\n", "Welcome to the Retail Demo Store Search Workshop. In this module we'll be configuring the Retail Demo Store Search service to allow searching for product data via [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) (formerly Amazon Elasticsearch Service). An Amazon OpenSearch domain should already be provisioned for you in your AWS environment as part of the Retail Demo Store deployment.\n", "\n", "Recommended Time: 20 Minutes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "To get started, we need to perform a bit of setup. Walk through each of the following steps to configure your environment to interact with the Amazon Personalize Service." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import dependencies and setup Boto3 python clients\n", "\n", "Througout this workshop we will need access to some common libraries and clients for connecting to AWS services." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import Dependencies\n", "\n", "import sys\n", "import boto3\n", "import botocore\n", "import json\n", "import pandas as pd\n", "import requests\n", "\n", "from packaging import version\n", "from random import randint\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will create the clients for the AWS services needed in this workshop." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setup Clients\n", "\n", "servicediscovery = boto3.client('servicediscovery')\n", "ssm = boto3.client('ssm')\n", "opensearch_service = boto3.client('opensearch')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create index and bulk index product data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get Products Service instance\n", "\n", "We will be creating a new OpenSearch Index and indexing our product data so that our users can search for products. To do this, first we will be pulling our Product data from [Products Service](https://github.com/aws-samples/retail-demo-store/tree/master/src/products) that is deployed as part of the Retail Demo Store. To connect to the Products Service we will use Service Discovery to discover an instance of the Products Service, and then connect directly to that service instances to access our data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = servicediscovery.discover_instances(\n", " NamespaceName='retaildemostore.local',\n", " ServiceName='products',\n", " MaxResults=1,\n", " HealthStatus='HEALTHY'\n", ")\n", "\n", "assert len(response['Instances']) > 0, 'Products service instance not found; check ECS to ensure it launched cleanly'\n", "\n", "products_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']\n", "print('Service Instance IP: {}'.format(products_service_instance))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Download and explore the Products dataset\n", "\n", "Now that we have the IP address of one of our Products Service instances, we can connect to it and fetch our product catalog. To more easily explore our data, we will convert the json response form the Products Service into a Pandas dataframe and print it as a table. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = requests.get('http://{}/products/all'.format(products_service_instance))\n", "products = response.json()\n", "products_df = pd.DataFrame(products)\n", "pd.set_option('display.max_rows', 5)\n", "\n", "products_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install OpenSearch python library\n", "\n", "We will use the Python OpenSearch library to connect to our Amazon OpenSearch cluster, create a new index, and then bulk index our product data. First, we need to install the OpenSearch library into the local notebook environment. We'll ensure pip is updated as well." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!{sys.executable} -m pip install --upgrade pip\n", "!{sys.executable} -m pip install opensearch-py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Discover OpenSearch domain endpoint\n", "\n", "Before we can configure the OpenSearch client, we need to determine the endpoint for the OpenSearch domain created in your AWS environment. We will accomplish this by looking for the OpenSearch domain with tag key of `Name` and tag value of `retaildemostore`. This tag was associated with the Amazon OpenSearch domain that was created when the project was deployed to your AWS account using CloudFormation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "opensearch_domain_endpoint = None\n", "\n", "domains_response = opensearch_service.list_domain_names()\n", "\n", "for domain_name in domains_response['DomainNames']:\n", " describe_response = opensearch_service.describe_domain(\n", " DomainName=domain_name['DomainName']\n", " )\n", " \n", " tags_response = opensearch_service.list_tags(ARN=describe_response['DomainStatus']['ARN'])\n", "\n", " domain_match = False\n", " for tag in tags_response['TagList']:\n", " if tag['Key'] == 'Name' and tag['Value'] == 'retaildemostore':\n", " domain_match = True\n", " break\n", " \n", " if domain_match:\n", " opensearch_domain_endpoint = describe_response['DomainStatus']['Endpoints']['vpc']\n", " break\n", "\n", "print('OpenSearch domain endpoint: ' + str(opensearch_domain_endpoint))\n", "\n", "assert opensearch_domain_endpoint, 'OpenSearch domain endpoint could not be determined. Ensure Amazon OpenSearch domain has been successfully created and has \"retaildemostore\" tag before continuing.'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure and create OpenSearch python client" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from opensearchpy import OpenSearch\n", "\n", "SEARCH_HOST = {\n", " 'host' : opensearch_domain_endpoint,\n", " 'port' : 443,\n", " 'scheme' : 'https',\n", "}\n", "\n", "client = OpenSearch(hosts = [SEARCH_HOST])\n", "\n", "# These variables will be used throughout the rest of the notebook\n", "INDEX_NAME = 'products'\n", "ID_FIELD = 'id'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare Product data for indexing\n", "\n", "Batch products into chunks that will be used for batch indexing below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bulk_datas = [] \n", "bulk_data = []\n", "\n", "bulk_datas.append(bulk_data)\n", "\n", "max_data_len = 100\n", "\n", "for product in products:\n", " data_dict = product\n", "\n", " op_dict = {\n", " \"index\": {\n", " \"_index\": INDEX_NAME, \n", " \"_id\": data_dict[ID_FIELD]\n", " }\n", " }\n", " bulk_data.append(op_dict)\n", " bulk_data.append(data_dict)\n", " \n", " if len(bulk_data) >= max_data_len:\n", " bulk_data = []\n", " bulk_datas.append(bulk_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check for and delete existing indexes\n", "\n", "If the products index already exists, we'll delete it so everything gets rebuilt from scratch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if client.indices.exists(INDEX_NAME):\n", " print(\"Deleting '%s' index...\" % (INDEX_NAME))\n", " res = client.indices.delete(index = INDEX_NAME)\n", " print(\" response: '%s'\" % (res))\n", "else:\n", " print('Index does not exist. Nothing to delete.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create index" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "request_body = {\n", " \"settings\" : {\n", " \"number_of_shards\": 1,\n", " \"number_of_replicas\": 0\n", " }\n", "}\n", "print(\"Creating '%s' index...\" % (INDEX_NAME))\n", "res = client.indices.create(index = INDEX_NAME, body = request_body)\n", "print(\" response: '%s'\" % (res))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Perform bulk indexing" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Bulk indexing...\")\n", "for bulk_data in bulk_datas:\n", " res = client.bulk(index = INDEX_NAME, body = bulk_data, refresh = True)\n", " \n", "print(\"Done\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Validate results through OpenSearch\n", "\n", "To verify that the products have been successfully indexed, let's perform a wildcard search for `brush*` directly against the OpenSearch index." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = client.search(index = INDEX_NAME, body={\"query\": {\"wildcard\": { \"name\": \"brush*\"}}})\n", "print(json.dumps(res, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validate results through Search Service\n", "\n", "Finally, let's verify that the Retail Demo Store's [Search service](https://github.com/aws-samples/retail-demo-store/tree/master/src/search) can successfully query the the OpenSearch index as well.\n", "\n", "### Discover Search Service\n", "\n", "First we need to get the address to the [Search service](https://github.com/aws-samples/retail-demo-store/tree/master/src/search)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = servicediscovery.discover_instances(\n", " NamespaceName='retaildemostore.local',\n", " ServiceName='search',\n", " MaxResults=1,\n", " HealthStatus='HEALTHY'\n", ")\n", "\n", "assert len(response['Instances']) > 0, 'Search service instance not found; check ECS to ensure it launched cleanly'\n", "\n", "search_service_instance = response['Instances'][0]['Attributes']['AWS_INSTANCE_IPV4']\n", "print('Service Instance IP: {}'.format(search_service_instance))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Call Search Service\n", "\n", "Let's call the service's index page which simply echos the service name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!curl {search_service_instance}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's do the same `brush` search through the Search service. We should get back the same item IDs as the direct OpenSearch query above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!curl {search_service_instance}/search/products?searchTerm='brush'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Workshop complete\n", "\n", "**Congratulations!** You have completed the first Retail Demo Store workshop where we indexed the products from the Retail Demo Store's Products microservice in an OpenSearch domain index. This domain is used by the Retail Demo Store's Search microservice to process search queries from the Web user interface. To see this in action, open the Retail Demo Store's web UI in a new browser tab/window and enter a value in the search field at the top of the page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Next step\n", "\n", "Move on to the **[1-Personalization](../1-Personalization/Lab-1-Introduction-and-data-preparation.ipynb)** workshop where we will learn how to train machine learning models using Amazon Personalize to produce personalized product recommendations to users and add the ability to provide personalized reranking of products." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "vscode": { "interpreter": { "hash": "d2ca6edb7b84bab06ec39f802df7b8f7871770e31471df2cbe4279e0e7265b83" } } }, "nbformat": 4, "nbformat_minor": 4 }