{ "cells": [ { "cell_type": "markdown", "id": "adca95b1", "metadata": {}, "source": [ "# Connecting to Forecast Services via Virtual Private Cloud (VPC) Endpoints\n", "\n", "\n", "We are excited to announce that Amazon Forecast is now integrated with AWS PrivateLink. Through this integration, you can provision Amazon Forecast interface endpoints within your own Virtual Private Cloud (VPC) and connect to Amazon Forecast without needing access to the public internet. This benefits customers whose strict security or network requirements prevent them from sending data over the public internet.\n", "\n", "In addition, VPC endpoint policies give you extra access controls for services reached through VPC endpoints. An endpoint policy does not override or replace the calling identity's IAM permissions; it is enforced in conjunction with them, giving the VPC owner additional control. For example, if you want users within the VPC to be able to call only the `ListDatasets` action through the VPC endpoint, you can define a VPC endpoint policy that allows only the `ListDatasets` operation for all connections through that endpoint, without having to restrict each user's IAM permissions.\n", "\n", "With today's launch, we are offering the following endpoint services for Amazon Forecast:\n", "\n", "1. A VPC endpoint service to use with Amazon Forecast operations. For most users, this is the most suitable type of VPC endpoint service to connect to.\n", " - `com.amazonaws..forecast`\n", " - `com.amazonaws..forecastquery`\n", "1. 
A VPC endpoint service for Amazon Forecast operations with endpoints that comply with the Federal Information Processing Standard (FIPS) Publication 140-2 US government standard (available in select regions only; see https://docs.aws.amazon.com/general/latest/gr/forecast.html for regions that support FIPS endpoints)\n", " - `com.amazonaws..forecast-fips`\n", " - `com.amazonaws..forecastquery-fips`\n", "\n", "This guide walks step by step through connecting to Amazon Forecast via VPC endpoints. First, we will cover some terminology.\n", "\n", "\n", "## Terminology\n", "\n", "The following terms describe the VPC-related components used in the rest of the guide.\n", "\n", "- **Private DNS**: A feature offered by VPC and AWS PrivateLink that lets you easily provision changes to your own Route53 Private Hosted Zones in order to connect to an internal service or to PrivateLink-based services, including Amazon Forecast's PrivateLink-enabled services. In this guide, we will show how to enable private DNS so that clients can call Forecast through the VPC with minimal code changes. More details about private DNS are at https://aws.amazon.com/about-aws/whats-new/2020/01/aws-privatelink-supports-private-dns-names-internal-3rd-party-services/\n", "- **VPC**: A VPC (Virtual Private Cloud) is a logically isolated virtual network that allows you to control the virtual networking environment. See https://aws.amazon.com/vpc/ for more details.\n", "- **VPC Endpoint**: A VPC endpoint enables connections between a virtual private cloud (VPC) and supported services, without requiring that you use an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. We will be creating VPC endpoints to Amazon Forecast services within this guide. 
For more details on VPC endpoints, see https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html\n", "- **VPC Endpoint Policy**: A VPC endpoint policy is an IAM resource policy that you can attach to a VPC endpoint to control access to the service that you're connecting to. We will explore VPC endpoint policies in this guide. For more details on VPC endpoint policies, see https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-access.html. You can also see a list of sample policies at https://docs.aws.amazon.com/forecast/latest/dg/vpc-interface-endpoints.html\n", "\n", "## Overview of Guide\n", "\n", "This guide is split into 2 parts.\n", "\n", "Part 1 walks you through creating the VPC endpoints and setting up the related infrastructure for testing. By the end of Part 1, you will have VPC endpoints available for connecting to Amazon Forecast services via VPC. You can use these VPC endpoints from Lambda functions, EC2 instances, a SageMaker notebook, etc.\n", "\n", "Part 2 of this guide is available as a Jupyter notebook at https://github.com/aws-samples/amazon-forecast-samples/tree/main/notebooks/advanced/VPC_PrivateLink/Connect_Via_VPC_Endpoint_Guide.ipynb. The notebook shows how to configure AWS clients to connect to the VPC endpoints, what a VPC endpoint policy can control, and how to verify that a call is going through the VPC endpoint. Although we connect to Forecast through the VPC from a Jupyter notebook, you can do the same from other compute resources inside a VPC, such as Lambda or EC2. The general concepts in the notebook still apply.\n", "\n", "Disclaimers:\n", "- The configuration values used for creation of resources within this guide are for example purpose only. 
We strongly encourage you to read through the AWS docs to choose the right configuration values for production usage.\n", "- Note that the screenshots used in the notebook may be slightly different depending on the console version, but the general instructions should still be applicable.\n" ] }, { "cell_type": "markdown", "id": "7cddb204", "metadata": {}, "source": [ "## Part 1: VPC Related Setup \n", "\n", "We will be creating some VPC-related resources, such as a VPC and VPC endpoints, to test VPC connections to Amazon Forecast's PrivateLink-enabled services. The configuration values used to create these resources are for this sample guide only. We strongly encourage you to read through the AWS docs to choose the right configuration values for production usage. If you have an existing VPC, subnet, or security group you wish to use, feel free to use it instead.\n", "\n", "We will be creating resources in the `us-west-2` region. Feel free to create the resources in any Forecast-supported region of your choosing; just be sure to use the same region throughout the guide." ] }, { "cell_type": "markdown", "id": "5ff8ac62", "metadata": {}, "source": [ "### Create VPC\n", "\n", "1. Go to AWS Console\n", "1. Select the appropriate region on the top right (`us-west-2` for purpose of this guide)\n", "1. Head to `VPC` console page\n", "1. Click on `Your VPCs` on the left navigation panel\n", "1. Click on `Create VPC`\n", "1. For the purpose of the guide, we will be using the sample configuration values for the VPC below\n", " - Populate `Name tag` field with `private-link-test-vpc`\n", " - Choose `IPv4 CIDR manual input` and populate CIDR block with `10.0.0.0/24`\n", " - Choose `No IPv6 CIDR block`\n", " - Choose Tenancy as `Default`\n", " - Your screen should look similar to ![Create VPC](./images/CreateVPC.png)\n", " - Click `Create VPC`\n", "1. Wait till your VPC is in `available` state\n", "1. 
The following step is only required if we are planning to enable PrivateDNS, which we do plan to do in this guide. Select the VPC you just created\n", " - Make sure `DNS hostnames` shows `Enabled`. If not, click `Actions`, click `Edit DNS hostnames`, make sure to check `Enable`, and click `Save changes`\n", " - Make sure `DNS resolution` shows `Enabled`. If not, click `Actions`, click `Edit DNS resolution`, make sure to check `Enable` and click `Save changes`" ] }, { "cell_type": "markdown", "id": "1cf03752", "metadata": {}, "source": [ "### Create Subnet\n", "\n", "1. Head back to `VPC` console page\n", "1. Click on `Subnets` on the left navigation panel\n", "1. Click on `Create subnet`\n", "1. For the purpose of this guide, we will be using the sample configuration values for the subnet below\n", " - Select the VPC tagged with `private-link-test-vpc` as the VPC to use under VPC ID\n", " - Populate `private-link-test-subnet-01` under `Subnet name`\n", " - Choose `us-west-2a` as the `Availability Zone`\n", " - Populate `10.0.0.0/24` as the `IPv4 CIDR block`\n", " - Your screen should look similar to ![Create Subnet](./images/CreateSubnet.png)\n", " - Click `Create subnet`\n", "1. Wait till the subnet is in `available` state\n", "1. Note that best practice is usually to create multiple subnets under multiple AZs, but for purpose of this guide, we will just be creating one subnet" ] }, { "cell_type": "markdown", "id": "5c2ceb63", "metadata": {}, "source": [ "### Create Security Group\n", "\n", "1. Head back to `VPC` console page\n", "1. Click on `Security Groups` on the left navigation panel\n", "1. Click on `Create security group`\n", "1. 
For the purpose of the guide, we will be using the sample configuration values for the security group below\n", " - Populate `private-link-test-security-group` as the `Security group name`\n", " - Populate `Allow inbound and outbound HTTPS connections for private link testing purpose` in the `Description`\n", " - Be sure to select the right VPC ID corresponding to VPC tagged with `private-link-test-vpc`\n", " - Click `Add rule` under `Inbound rules`\n", " - Select `HTTPS` as `Type`\n", " - Select `Anywhere-IPv4` as the `Source`\n", " - Modify the rule under `Outbound rules`\n", " - Select `HTTPS` as `Type`\n", " - Select `Anywhere-IPv4` as the `Destination`\n", " - Your screen should look similar to ![Create Security Group](./images/CreateSecurityGroup.png)\n", " - Click `Create security group`" ] }, { "cell_type": "markdown", "id": "06f852e4", "metadata": {}, "source": [ "### Create VPC endpoints\n", "\n", "As of January 2022, you can create two types of Amazon VPC endpoints to use with Amazon Forecast:\n", "\n", "1. A VPC endpoint to use with Amazon Forecast operations. For most users, this is the most suitable type of VPC endpoint.\n", " - `com.amazonaws..forecast`\n", " - `com.amazonaws..forecastquery`\n", "1. A VPC endpoint for Amazon Forecast operations with endpoints that comply with the Federal Information Processing Standard (FIPS) Publication 140-2 US government standard (available in select regions only; see https://docs.aws.amazon.com/general/latest/gr/forecast.html for regions that support FIPS endpoints)\n", " - `com.amazonaws..forecast-fips`\n", " - `com.amazonaws..forecastquery-fips`\n", " \n", "For the purpose of this guide, we will use the standard endpoints rather than the FIPS endpoints.\n", "\n", "In the steps below, we will create the VPC endpoints that carry traffic from the VPC to the Amazon Forecast services without going through the public internet.\n", "\n", "1. 
Head back to `VPC` console page\n", "1. Click on `Endpoints` on the left navigation panel\n", "1. Click on `Create endpoint`\n", "1. Create endpoint for `com.amazonaws.us-west-2.forecast` with values below\n", " - Select `AWS services` under `Service Category`\n", " - Under `Services`, populate the `Filter services` textbox with search term `forecast`\n", " - Select `com.amazonaws.us-west-2.forecast`\n", " - Under `VPC`, select the VPC ID corresponding to VPC tagged with `private-link-test-vpc`\n", " - Under `VPC`, expand the additional settings and be sure to uncheck `Enable DNS name` for now. This disables private DNS support for now; we will enable it in a later section of the guide\n", " - Under `Subnet`, select `us-west-2a` as the Availability Zone, and select the subnet ID corresponding to `private-link-test-subnet-01`\n", " - Under `Security Groups`, select the security group corresponding to `private-link-test-security-group`\n", " - Under `Policy`, select `Custom` and add the following policy (we will change it to Full Access in a later section):\n", " ```json\n", " {\n", " \"Statement\": [\n", " {\n", " \"Principal\": \"*\",\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"forecast:ListDatasets\"\n", " ],\n", " \"Resource\": \"*\"\n", " }\n", " ]\n", " }\n", " ```\n", " - This custom policy allows only the `ListDatasets` action for connections through the VPC endpoint. All other actions are denied.\n", " - We will change it to Full Access in a later section of this guide\n", " - Your screen should look similar to ![Create Forecast Endpoint 1](./images/CreateForecastEndpoint1.png) and ![Create Forecast Endpoint 2](./images/CreateForecastEndpoint2.png)\n", " - Click `Create endpoint`\n", " - Select the endpoint you just created, and wait to use it until the status changes to `Available`\n", "1. 
Repeat the above steps to create an endpoint for `com.amazonaws.us-west-2.forecastquery`, with the differences below\n", " - Select `com.amazonaws.us-west-2.forecastquery` as the service\n", " - Leave `Enable DNS name` checked to enable PrivateDNS support\n", " - Select `Full access` under `Policy` to allow full access\n", " - Your screen should look similar to ![Create Forecast Query Endpoint](./images/CreateForecastQueryEndpoint.png)\n", " - Click `Create endpoint`\n", " - Select the endpoint you just created, and wait to use it until the status changes to `Available`" ] }, { "cell_type": "markdown", "id": "5bf6aa38", "metadata": {}, "source": [ "## Part 2: Demo Usage of VPC endpoints via Notebook\n", "\n", "The VPC endpoints we created in the steps above are now available for use. You can use these VPC endpoints from Lambda functions, EC2 instances, a SageMaker notebook, etc.\n", "\n", "For the purpose of the guide, we have prepared a Jupyter notebook to demonstrate the second part of this guide at https://github.com/aws-samples/amazon-forecast-samples/tree/main/notebooks/advanced/VPC_PrivateLink/Connect_Via_VPC_Endpoint_Guide.ipynb. The notebook shows how to configure AWS clients to connect to the VPC endpoints, what a VPC endpoint policy can control, and how to verify that a call is going through the VPC endpoint. Although we connect to Forecast through the VPC from a Jupyter notebook, you can do the same from other compute resources inside a VPC, such as Lambda or EC2. The general concepts in the notebook still apply.\n", "\n", "Please continue the guide in the Jupyter notebook." ] }, { "cell_type": "markdown", "id": "e1c1263d", "metadata": {}, "source": [ "### Notebook Related Setup \n", "\n", "#### Create Notebook Instance within VPC\n", "We will be creating a new notebook instance within a VPC. Be sure to relaunch this Jupyter notebook within the newly launched notebook instance afterwards\n", "\n", "1. 
Go to AWS Console\n", "1. Select the appropriate region on the top right (`us-west-2` for purpose of this guide)\n", "1. Head to `Amazon SageMaker` console page\n", "1. Click `Notebook instances` under `Notebook` on left navigation panel\n", "1. Click `Create notebook instance`\n", "1. For the purpose of the guide, we will be using the sample configuration values for the notebook instance below\n", " - Populate `private-link-test-notebook` under `Notebook instance name`\n", " - Select `ml.t2.medium` as `Notebook instance type`\n", " - Select `notebook-al2-v1` as the `Platform identifier`\n", " - Under IAM role, feel free to create a new role or reuse an existing one, and restrict access if needed. For this guide, we will proceed with `Any S3 bucket access`.\n", " - Expand `Network` tab\n", " - Select VPC ID corresponding to VPC tagged with `private-link-test-vpc`\n", " - Select Subnet corresponding to subnet tagged with `private-link-test-subnet-01`\n", " - Select Security Group corresponding to security group tagged with `private-link-test-security-group`\n", " - Select `Enable — Access the internet directly through Amazon SageMaker`, since the notebook needs access to the GitHub repository, S3, IAM, etc. Be sure to look at https://docs.aws.amazon.com/sagemaker/latest/dg/appendix-notebook-and-internet-access.html if you want to choose `Disable` to restrict the notebook to connect through VPC only. For purpose of this guide, we will choose the `Enable` option.\n", " - Expand `Git repositories`\n", " - Select `Clone a public Git repository to this notebook instance only`\n", " - Populate `https://github.com/aws-samples/amazon-forecast-samples.git` under `Git repository URL` to copy contents of https://github.com/aws-samples/amazon-forecast-samples\n", " - Your screen should look similar to ![Create Notebook Instance](./images/CreateNotebookInstance.png)\n", " - Click `Create notebook instance`\n", "1. 
Wait for the notebook instance to be created and its status to change to `InService`. If the status is `Stopped`, select the instance, click `Actions`, and click `Start`\n", "1. Click `Open Jupyter`\n", "1. Navigate to this notebook under `notebooks -> advanced -> VPC_PrivateLink -> Connect_Via_VPC_Guide.ipynb`\n", "1. Be sure you have relaunched this notebook under the Notebook instance created above\n" ] }, { "cell_type": "markdown", "id": "bd1346b1", "metadata": {}, "source": [ "#### Setup the IAM permission for the SageMaker Role\n", "1. Go to IAM console\n", "1. Select the SageMaker role that was created above\n", "1. Click `Add in-line policy` to attach a new in-line policy with the JSON below. Be sure to replace `` with your AWS account ID. You can give the in-line policy a name such as `test-private-link-forecast-and-iam-policy`\n", "```json\n", "{\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"forecast:*\"\n", " ],\n", " \"Resource\": \"*\"\n", " },\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"iam:PassRole\"\n", " ],\n", " \"Resource\": \"arn:aws:iam:::role/ForecastNotebookRole-Basic\",\n", " \"Condition\": {\n", " \"StringEquals\": {\n", " \"iam:PassedToService\": \"forecast.amazonaws.com\"\n", " }\n", " }\n", " },\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"iam:CreateRole\",\n", " \"iam:GetRole\"\n", " ],\n", " \"Resource\": \"arn:aws:iam:::role/ForecastNotebookRole-Basic\"\n", " }\n", " ]\n", "}\n", "```" ] }, { "cell_type": "markdown", "id": "291f9e7d", "metadata": {}, "source": [ "### Familiarize Yourself with Forecast Quick Start Guide \n", "This notebook assumes you have walked through the Forecast Quick Start Guide at https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/basic/Getting_Started/Amazon_Forecast_Quick_Start_Guide.ipynb and understand the general Forecast creation process. 
The Quick Start Guide notebook walks through dataset creation, dataset import, predictor training, and forecast generation in detail. The rest of this guide assumes you have already seen those steps.\n", "\n", "If you have not done so, please go through the Quick Start Guide first." ] }, { "cell_type": "markdown", "id": "fdd6a2fb", "metadata": {}, "source": [ "### Environment Setup " ] }, { "cell_type": "markdown", "id": "3b0d9cc7", "metadata": {}, "source": [ "#### Install necessary libraries (execute only if needed, for example if the imports below fail)" ] }, { "cell_type": "code", "execution_count": 11, "id": "6e681b5a", "metadata": {}, "outputs": [], "source": [ "# %%capture --no-stderr setup\n", "\n", "# !pip install pandas s3fs matplotlib ipywidgets\n", "# !pip install boto3 --upgrade\n", "# !pip install tqdm\n", "\n", "# %reload_ext autoreload" ] }, { "cell_type": "markdown", "id": "da92d56d", "metadata": {}, "source": [ "#### Setup Imports" ] }, { "cell_type": "code", "execution_count": 2, "id": "df46168c", "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "import shutil\n", "import datetime\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import boto3\n", "\n", "# importing forecast notebook utility from notebooks/common directory\n", "sys.path.insert( 0, os.path.abspath(\"../../common\") )\n", "import util\n", "\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline \n", "plt.rcParams['figure.figsize'] = (15.0, 5.0)" ] }, { "cell_type": "markdown", "id": "cb8ba788", "metadata": {}, "source": [ "#### Create a session within specified region" ] }, { "cell_type": "code", "execution_count": 4, "id": "4dc55b2a", "metadata": {}, "outputs": [], "source": [ "region = 'us-west-2'\n", "session = boto3.Session(region_name=region)" ] }, { "cell_type": "markdown", "id": "b98c8eb1", "metadata": {}, "source": [ "#### Setup IAM Role used by Amazon Forecast to access your data" ] }, { "cell_type": "code", 
"execution_count": 12, "id": "5d74182e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating Role ForecastNotebookRole-Basic...\n", "The role ForecastNotebookRole-Basic already exists, skipping creation\n", "Done.\n", "Success! Created role = ForecastNotebookRole-Basic\n" ] } ], "source": [ "role_name = \"ForecastNotebookRole-Basic\"\n", "print(f\"Creating Role {role_name}...\")\n", "role_arn = util.get_or_create_iam_role( role_name = role_name )\n", "\n", "# echo user inputs without account\n", "print(f\"Success! Created role = {role_arn.split('/')[1]}\")" ] }, { "cell_type": "markdown", "id": "0755e2a8", "metadata": {}, "source": [ "#### Upload S3 file under unique bucket name" ] }, { "cell_type": "code", "execution_count": 6, "id": "71e067a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>timestamp</th><th>item_id</th><th>target_value</th></tr></thead>\n", "<tbody>\n", "<tr><th>0</th><td>2017-12-01 00:00:00</td><td>4</td><td>27</td></tr>\n", "<tr><th>1</th><td>2017-12-01 00:00:00</td><td>7</td><td>36</td></tr>\n", "<tr><th>2</th><td>2017-12-01 00:00:00</td><td>10</td><td>2</td></tr>\n", "</tbody>\n", "</table>" ], "text/plain": [ " timestamp item_id target_value\n", "0 2017-12-01 00:00:00 4 27\n", "1 2017-12-01 00:00:00 7 36\n", "2 2017-12-01 00:00:00 10 2" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Enter S3 bucket name for uploading the data and hit ENTER key:private-link-test-s3-20220127\n", "\n", "Attempting to upload the data to the S3 bucket 'private-link-test-s3-20220127' at key 'data/taxi-dec2017-jan2019.csv' ...\n", "\n", "Done, the dataset is uploaded to S3 at s3://private-link-test-s3-20220127/data/taxi-dec2017-jan2019.csv.\n" ] } ], "source": [ "key=\"data/taxi-dec2017-jan2019.csv\"\n", "\n", "taxi_df = pd.read_csv(key, dtype = object, names=['timestamp','item_id','target_value'])\n", "\n", "display(taxi_df.head(3))\n", "\n", "bucket_name = input(\"\\nEnter S3 bucket name for uploading the data and hit ENTER key:\")\n", "print(f\"\\nAttempting to upload the data to the S3 bucket '{bucket_name}' at key '{key}' ...\")\n", "\n", "s3 = session.resource('s3')\n", "bucket = s3.Bucket(bucket_name)\n", "# creation_date is None when the bucket does not exist yet\n", "if not bucket.creation_date:\n", "    if region != \"us-east-1\":\n", "        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region})\n", "    else:\n", "        s3.create_bucket(Bucket=bucket_name)\n", "\n", "s3.Bucket(bucket_name).Object(key).upload_file(key)\n", "ts_s3_path = f\"s3://{bucket_name}/{key}\"\n", "\n", "print(f\"\\nDone, the dataset is uploaded to S3 at {ts_s3_path}.\")" ] }, { "cell_type": "markdown", "id": "96dedce8", "metadata": {}, "source": [ "### Testing Forecast APIs through VPC \n", "\n", "We will walk through testing of Forecast APIs through the VPC. Be sure to have the VPC setups in Part 1 completed." ] }, { "cell_type": "markdown", "id": "8fee1ce9", "metadata": {}, "source": [ "#### Testing existing public DNS without VPC\n", "\n", "We have not yet turned on private DNS for the `com.amazonaws.us-west-2.forecast` VPC endpoint, so the call below goes through the non-VPC public DNS name `forecast.us-west-2.amazonaws.com` to reach the Forecast service. 
It succeeds because we're using a SageMaker notebook instance with internet access." ] }, { "cell_type": "code", "execution_count": 63, "id": "496ae661", "metadata": {}, "outputs": [], "source": [ "forecast = session.client(service_name='forecast')" ] }, { "cell_type": "code", "execution_count": 64, "id": "fea4daa2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Predictors': [{'PredictorArn': 'arn:aws:forecast:us-west-2:282494629315:predictor/TAXI_PREDICTOR_01FTHF1KK7E6VVR676WVN24MXC',\n", " 'PredictorName': 'TAXI_PREDICTOR',\n", " 'DatasetGroupArn': 'arn:aws:forecast:us-west-2:282494629315:dataset-group/TAXI_DEMO_PRIVATE_LINK',\n", " 'IsAutoPredictor': True,\n", " 'Status': 'ACTIVE',\n", " 'CreationTime': datetime.datetime(2022, 1, 28, 23, 6, 37, 563000, tzinfo=tzlocal()),\n", " 'LastModificationTime': datetime.datetime(2022, 1, 31, 18, 42, 0, 881000, tzinfo=tzlocal())}],\n", " 'ResponseMetadata': {'RequestId': '60fb2632-5fe1-405a-b322-89de9f20c586',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'date': 'Mon, 31 Jan 2022 19:56:41 GMT',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '370',\n", " 'connection': 'keep-alive',\n", " 'x-amzn-requestid': '60fb2632-5fe1-405a-b322-89de9f20c586'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast.list_predictors()" ] }, { "cell_type": "markdown", "id": "1fa3bc83", "metadata": {}, "source": [ "#### Testing new VPC specific endpoints\n", "\n", "We will now start testing connection through a VPC endpoint.\n", "\n", "Because we didn't enable private DNS for the `com.amazonaws.us-west-2.forecast` VPC endpoint, we have to look up the VPC endpoint-specific DNS name that PrivateLink generated for us. 
You can find it by going to the VPC console and clicking the VPC endpoint we just created for `com.amazonaws.us-west-2.forecast`. Locate the DNS name in the section that looks similar to ![VPC specific DNS](./images/VpcSpecificDNS.png)\n", "\n", "The DNS name is specific to each VPC endpoint, so yours will not be identical to the one in this guide, but it should look similar. \n", "\n", "Populate the DNS name below, prefixed with `https://`" ] }, { "cell_type": "code", "execution_count": 17, "id": "895ee26e", "metadata": {}, "outputs": [], "source": [ "# VPC endpoint-specific DNS name (used while private DNS is not enabled)\n", "forecast_dns = \"https://vpce-0e328db54a7a149fa-j82j07dz.forecast.us-west-2.vpce.amazonaws.com\"" ] }, { "cell_type": "code", "execution_count": 20, "id": "7532473d", "metadata": {}, "outputs": [], "source": [ "forecast = session.client(service_name='forecast',\n", "    region_name=region,\n", "    endpoint_url=forecast_dns\n", ")" ] }, { "cell_type": "markdown", "id": "ff4fcba6", "metadata": {}, "source": [ "**Explain**\n", "\n", "The call below is expected to fail because the VPC endpoint policy that we defined above allows only the `ListDatasets` action. Because that policy is attached only to the VPC endpoint, the failure itself confirms that the call is going through the VPC endpoint. 
What we learned is a simple way of checking that we are making network calls through the VPC endpoint" ] }, { "cell_type": "code", "execution_count": 28, "id": "b5e52514", "metadata": {}, "outputs": [ { "ename": "ClientError", "evalue": "An error occurred (AccessDeniedException) when calling the ListPredictors operation: User: arn:aws:sts::282494629315:assumed-role/AmazonSageMaker-ExecutionRole-20220127T153361/SageMaker is not authorized to perform: forecast:ListPredictors because no VPC endpoint policy allows the forecast:ListPredictors action", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mClientError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mforecast\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlist_predictors\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_api_call\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 389\u001b[0m \"%s() only accepts keyword arguments.\" % py_operation_name)\n\u001b[1;32m 390\u001b[0m \u001b[0;31m# The \"self\" in this scope is referring to the BaseClient.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 391\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_api_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moperation_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 392\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 393\u001b[0m \u001b[0m_api_call\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpy_operation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_make_api_call\u001b[0;34m(self, operation_name, api_params)\u001b[0m\n\u001b[1;32m 717\u001b[0m \u001b[0merror_code\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Error\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Code\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 718\u001b[0m \u001b[0merror_class\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexceptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_code\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merror_code\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 719\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0merror_class\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparsed_response\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moperation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 720\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 721\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mClientError\u001b[0m: An error occurred (AccessDeniedException) when calling the ListPredictors operation: User: arn:aws:sts::282494629315:assumed-role/AmazonSageMaker-ExecutionRole-20220127T153361/SageMaker is not authorized to perform: forecast:ListPredictors because no VPC endpoint policy allows the 
forecast:ListPredictors action" ] } ], "source": [ "forecast.list_predictors()" ] }, { "cell_type": "markdown", "id": "7eb4ad87", "metadata": {}, "source": [ "We can call the `ListDatasets` action to verify that we are able to make successful calls through the VPC endpoint" ] }, { "cell_type": "code", "execution_count": 29, "id": "fb98375c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Datasets': [],\n", " 'ResponseMetadata': {'RequestId': '98113a60-5d94-4b1e-abb0-67d46c771d17',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'date': 'Fri, 28 Jan 2022 22:38:04 GMT',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '15',\n", " 'connection': 'keep-alive',\n", " 'x-amzn-requestid': '98113a60-5d94-4b1e-abb0-67d46c771d17'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast.list_datasets()" ] }, { "cell_type": "markdown", "id": "4dc7bcc1", "metadata": {}, "source": [ "#### Testing VPC endpoint through PrivateDNS\n", "\n", "Now we can test private DNS by enabling it for `com.amazonaws.us-west-2.forecast` as well\n", "\n", "1. Go to VPC console page\n", "1. Click `Endpoints` on the left navigation panel\n", "1. Select the VPC endpoint we created for `com.amazonaws.us-west-2.forecast`\n", "1. Click `Actions` -> `Modify private DNS name`\n", "1. Check `Enable for this endpoint`\n", "1. Click `Save changes`\n", "1. Wait until the status becomes `Available`. You should then see the private DNS name listed under `Private DNS names`, similar to `forecast..us-west-2.amazonaws.com`\n", "\n", "Note that the private DNS name is similar to the public DNS name for our public endpoints at https://alpha-docs-aws.amazon.com/general/latest/gr/forecast.html#forecast_region. 
This means that existing clients that already call Forecast should need little or no code change once you create the VPC endpoint within the same VPC and enable private DNS.\n", "\n", "We will test this below by initializing the standard Forecast client, which points to the public DNS name. With private DNS enabled, that name now resolves to the VPC endpoint." ] }, { "cell_type": "code", "execution_count": 30, "id": "e52e0d3c", "metadata": {}, "outputs": [], "source": [ "forecast = session.client(service_name='forecast')" ] }, { "cell_type": "code", "execution_count": 31, "id": "7de9cb07", "metadata": {}, "outputs": [ { "ename": "ClientError", "evalue": "An error occurred (AccessDeniedException) when calling the ListPredictors operation: User: arn:aws:sts::282494629315:assumed-role/AmazonSageMaker-ExecutionRole-20220127T153361/SageMaker is not authorized to perform: forecast:ListPredictors because no VPC endpoint policy allows the forecast:ListPredictors action", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mClientError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mforecast\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlist_predictors\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_api_call\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 389\u001b[0m \"%s() only accepts keyword arguments.\" % py_operation_name)\n\u001b[1;32m 390\u001b[0m \u001b[0;31m# The \"self\" in this scope is referring to the BaseClient.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 391\u001b[0;31m \u001b[0;32mreturn\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_api_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moperation_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 392\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 393\u001b[0m \u001b[0m_api_call\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpy_operation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_make_api_call\u001b[0;34m(self, operation_name, api_params)\u001b[0m\n\u001b[1;32m 717\u001b[0m \u001b[0merror_code\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Error\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Code\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 718\u001b[0m \u001b[0merror_class\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexceptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_code\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merror_code\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 719\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0merror_class\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparsed_response\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moperation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 720\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 
721\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mClientError\u001b[0m: An error occurred (AccessDeniedException) when calling the ListPredictors operation: User: arn:aws:sts::282494629315:assumed-role/AmazonSageMaker-ExecutionRole-20220127T153361/SageMaker is not authorized to perform: forecast:ListPredictors because no VPC endpoint policy allows the forecast:ListPredictors action" ] } ], "source": [ "forecast.list_predictors()" ] }, { "cell_type": "code", "execution_count": 32, "id": "d19cd3be", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Datasets': [],\n", " 'ResponseMetadata': {'RequestId': 'aa280a6a-3a45-42ef-9cad-18bc03ac3e1a',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'date': 'Fri, 28 Jan 2022 22:43:32 GMT',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '15',\n", " 'connection': 'keep-alive',\n", " 'x-amzn-requestid': 'aa280a6a-3a45-42ef-9cad-18bc03ac3e1a'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast.list_datasets()" ] }, { "cell_type": "markdown", "id": "36241f76", "metadata": {}, "source": [ "#### Restoring full access policy\n", "\n", "From the results above, we can see that the VPC endpoint policy is still being applied, which confirms that we are still connecting through the VPC endpoint.\n", "\n", "VPC endpoint policies are useful because they are applied in conjunction with the calling identity's IAM permissions. For example, if you want to allow only `ListDatasets` for all calls through the VPC endpoint, you can do so via the VPC endpoint policy instead of having to restrict each user's IAM permissions. 
Of course, the best practice is still to restrict each user's IAM permissions to only what is needed.\n", "\n", "Now that we have seen how VPC endpoint policies work, let's restore the policy to full access for the rest of the notebook. Note that you can customize the VPC endpoint policy, and we have some example policies listed at https://docs.aws.amazon.com/forecast/latest/dg/vpc-interface-endpoints.html\n", "\n", "1. Go to the VPC console page\n", "1. Click `Endpoints` on the left navigation panel\n", "1. Select the VPC endpoint we created for `com.amazonaws.us-west-2.forecast`\n", "1. Click on the `Policy` tab\n", "1. Click `Edit Policy`\n", "1. Select `Full access`\n", "1. Click `Save`\n", "1. It takes a few minutes for the policy to update, after which you should be able to call any Forecast API" ] }, { "cell_type": "code", "execution_count": 34, "id": "0e1fc5aa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Predictors': [],\n", " 'ResponseMetadata': {'RequestId': '44aa99c1-c765-4a96-adfc-b487b6da578b',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'date': 'Fri, 28 Jan 2022 22:49:59 GMT',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '17',\n", " 'connection': 'keep-alive',\n", " 'x-amzn-requestid': '44aa99c1-c765-4a96-adfc-b487b6da578b'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecast.list_predictors()" ] }, { "cell_type": "markdown", "id": "8b3cf7b5", "metadata": {}, "source": [ "## Conclusion\n", "\n", "Through this guide, we learned how to create VPC endpoints through Amazon Forecast's integration with AWS PrivateLink, as well as how to connect to Amazon Forecast services using these VPC endpoints via this [Notebook](https://github.com/aws-samples/amazon-forecast-samples/tree/main/notebooks/advanced/VPC_PrivateLink/Connect_Via_VPC_Endpoint_Guide.ipynb). 
We also learned what VPC endpoint policies are, and how to use them to verify that API calls are going through the VPC endpoints. With VPC endpoints, you can now connect to Amazon Forecast services without going over the public internet.\n", "\n", "To learn more, review [Forecast and Interface VPC endpoints](https://docs.aws.amazon.com/forecast/latest/dg/vpc-interface-endpoints.html). All these new capabilities are available in all Regions where Forecast is publicly available. For more information about Region availability, see [Amazon Forecast endpoints and quotas](https://alpha-docs-aws.amazon.com/general/latest/gr/forecast.html#forecast_region)\n" ] }, { "cell_type": "markdown", "id": "8c15730d", "metadata": {}, "source": [ "## Bonus: Walking through Forecast Creation \n", "\n", "For the rest of this notebook, we simply revisit the Forecast creation steps from the Forecast Quick Start Guide to show that the API calls succeed from inside the VPC.\n", "\n", "For more detailed explanations, see https://github.com/aws-samples/amazon-forecast-samples/blob/main/notebooks/basic/Getting_Started/Amazon_Forecast_Quick_Start_Guide.ipynb" ] }, { "cell_type": "markdown", "id": "31e98cc5", "metadata": {}, "source": [ "#### Creating the Dataset" ] }, { "cell_type": "code", "execution_count": 35, "id": "1b7c227a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Dataset with ARN arn:aws:forecast:us-west-2:282494629315:dataset/TAXI_TS_PRIVATE_LINK is now ACTIVE.\n" ] } ], "source": [ "DATASET_FREQUENCY = \"H\" # H for hourly.\n", "TS_DATASET_NAME = \"TAXI_TS_PRIVATE_LINK\"\n", "TS_SCHEMA = {\n", " \"Attributes\":[\n", " {\n", " \"AttributeName\":\"timestamp\",\n", " \"AttributeType\":\"timestamp\"\n", " },\n", " {\n", " \"AttributeName\":\"item_id\",\n", " \"AttributeType\":\"string\"\n", " },\n", " {\n", " 
\"AttributeName\":\"target_value\",\n", " \"AttributeType\":\"integer\"\n", " }\n", " ]\n", "}\n", "\n", "create_dataset_response = forecast.create_dataset(Domain=\"CUSTOM\",\n", " DatasetType='TARGET_TIME_SERIES',\n", " DatasetName=TS_DATASET_NAME,\n", " DataFrequency=DATASET_FREQUENCY,\n", " Schema=TS_SCHEMA)\n", "\n", "ts_dataset_arn = create_dataset_response['DatasetArn']\n", "describe_dataset_response = forecast.describe_dataset(DatasetArn=ts_dataset_arn)\n", "\n", "print(f\"The Dataset with ARN {ts_dataset_arn} is now {describe_dataset_response['Status']}.\")" ] }, { "cell_type": "markdown", "id": "4e3ad83f", "metadata": {}, "source": [ "#### Importing the Dataset" ] }, { "cell_type": "code", "execution_count": 39, "id": "14ba20dc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Waiting for Dataset Import Job with ARN arn:aws:forecast:us-west-2:282494629315:dataset-import-job/TAXI_TS_PRIVATE_LINK/TAXI_TTS_IMPORT_PRIVATE_LINK to become ACTIVE. This process could take 5-10 minutes.\n", "\n", "Current Status:\n", "CREATE_PENDING .\n", "CREATE_IN_PROGRESS ....................................\n", "ACTIVE \n", "\n", "\n", "The Dataset Import Job with ARN arn:aws:forecast:us-west-2:282494629315:dataset-import-job/TAXI_TS_PRIVATE_LINK/TAXI_TTS_IMPORT_PRIVATE_LINK is now ACTIVE.\n" ] } ], "source": [ "TIMESTAMP_FORMAT = \"yyyy-MM-dd hh:mm:ss\"\n", "TS_IMPORT_JOB_NAME = \"TAXI_TTS_IMPORT_PRIVATE_LINK\"\n", "TIMEZONE = \"EST\"\n", "\n", "ts_dataset_import_job_response = \\\n", " forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,\n", " DatasetArn=ts_dataset_arn,\n", " DataSource= {\n", " \"S3Config\" : {\n", " \"Path\": ts_s3_path,\n", " \"RoleArn\": role_arn\n", " } \n", " },\n", " TimestampFormat=TIMESTAMP_FORMAT,\n", " TimeZone = TIMEZONE)\n", "\n", "ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']\n", "describe_dataset_import_job_response = 
forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)\n", "\n", "print(f\"Waiting for Dataset Import Job with ARN {ts_dataset_import_job_arn} to become ACTIVE. This process could take 5-10 minutes.\\n\\nCurrent Status:\")\n", "\n", "status = util.wait(lambda: forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn))\n", "\n", "describe_dataset_import_job_response = forecast.describe_dataset_import_job(DatasetImportJobArn=ts_dataset_import_job_arn)\n", "print(f\"\\n\\nThe Dataset Import Job with ARN {ts_dataset_import_job_arn} is now {describe_dataset_import_job_response['Status']}.\")" ] }, { "cell_type": "markdown", "id": "25f5675e", "metadata": {}, "source": [ "#### Creating a DatasetGroup" ] }, { "cell_type": "code", "execution_count": 40, "id": "7c25da5a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The DatasetGroup with ARN arn:aws:forecast:us-west-2:282494629315:dataset-group/TAXI_DEMO_PRIVATE_LINK is now ACTIVE.\n" ] } ], "source": [ "DATASET_GROUP_NAME = \"TAXI_DEMO_PRIVATE_LINK\"\n", "DATASET_ARNS = [ts_dataset_arn]\n", "\n", "create_dataset_group_response = \\\n", " forecast.create_dataset_group(Domain=\"CUSTOM\",\n", " DatasetGroupName=DATASET_GROUP_NAME,\n", " DatasetArns=DATASET_ARNS)\n", "\n", "dataset_group_arn = create_dataset_group_response['DatasetGroupArn']\n", "describe_dataset_group_response = forecast.describe_dataset_group(DatasetGroupArn=dataset_group_arn)\n", "\n", "print(f\"The DatasetGroup with ARN {dataset_group_arn} is now {describe_dataset_group_response['Status']}.\")" ] }, { "cell_type": "markdown", "id": "1f0b00d3", "metadata": {}, "source": [ "#### Train a predictor" ] }, { "cell_type": "code", "execution_count": null, "id": "63adaf5f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Waiting for Predictor with ARN 
arn:aws:forecast:us-west-2:282494629315:predictor/TAXI_PREDICTOR_01FTHF1KK7E6VVR676WVN24MXC to become ACTIVE. Depending on data size and predictor setting,it can take several hours to be ACTIVE.\n", "\n", "Current Status:\n", "CREATE_PENDING ..\n" ] } ], "source": [ "PREDICTOR_NAME = \"TAXI_PREDICTOR\"\n", "FORECAST_HORIZON = 24\n", "FORECAST_FREQUENCY = \"H\"\n", "HOLIDAY_DATASET = [{\n", " 'Name': 'holiday',\n", " 'Configuration': {\n", " 'CountryCode': ['US']\n", " }\n", "}]\n", "\n", "create_auto_predictor_response = \\\n", " forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,\n", " ForecastHorizon = FORECAST_HORIZON,\n", " ForecastFrequency = FORECAST_FREQUENCY,\n", " DataConfig = {\n", " 'DatasetGroupArn': dataset_group_arn, \n", " 'AdditionalDatasets': HOLIDAY_DATASET\n", " },\n", " ExplainPredictor = True)\n", "\n", "predictor_arn = create_auto_predictor_response['PredictorArn']\n", "print(f\"Waiting for Predictor with ARN {predictor_arn} to become ACTIVE. 
Depending on data size and predictor setting,it can take several hours to be ACTIVE.\\n\\nCurrent Status:\")\n", "\n", "status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))\n", "\n", "describe_auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)\n", "print(f\"\\n\\nThe Predictor with ARN {predictor_arn} is now {describe_auto_predictor_response['Status']}.\")" ] }, { "cell_type": "code", "execution_count": 46, "id": "74d3ced6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ACTIVE \n", "\n", "\n", "The Predictor with ARN arn:aws:forecast:us-west-2:282494629315:predictor/TAXI_PREDICTOR_01FTHF1KK7E6VVR676WVN24MXC is now ACTIVE.\n" ] } ], "source": [ "status = util.wait(lambda: forecast.describe_auto_predictor(PredictorArn=predictor_arn))\n", "\n", "describe_auto_predictor_response = forecast.describe_auto_predictor(PredictorArn=predictor_arn)\n", "print(f\"\\n\\nThe Predictor with ARN 
{predictor_arn} is now {describe_auto_predictor_response['Status']}.\")" ] }, { "cell_type": "markdown", "id": "b22c42b7", "metadata": {}, "source": [ "#### Generate forecasts" ] }, { "cell_type": "code", "execution_count": 50, "id": "7e56e8f6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Waiting for Forecast with ARN arn:aws:forecast:us-west-2:282494629315:forecast/TAXI_FORECAST_PRIVATE_LINK to become ACTIVE. Depending on data size and predictor settings,it can take several hours to be ACTIVE.\n", "\n", "Current Status:\n", "CREATE_PENDING \n", "CREATE_IN_PROGRESS ..............................................................................................\n", "ACTIVE \n", "\n", "\n", "The Forecast with ARN arn:aws:forecast:us-west-2:282494629315:forecast/TAXI_FORECAST_PRIVATE_LINK is now ACTIVE.\n" ] } ], "source": [ "FORECAST_NAME = \"TAXI_FORECAST_PRIVATE_LINK\"\n", "\n", "create_forecast_response = \\\n", " forecast.create_forecast(ForecastName=FORECAST_NAME,\n", " PredictorArn=predictor_arn)\n", "\n", "forecast_arn = create_forecast_response['ForecastArn']\n", "print(f\"Waiting for Forecast with ARN {forecast_arn} to become ACTIVE. Depending on data size and predictor settings,it can take several hours to be ACTIVE.\\n\\nCurrent Status:\")\n", "\n", "status = util.wait(lambda: forecast.describe_forecast(ForecastArn=forecast_arn))\n", "\n", "describe_forecast_response = forecast.describe_forecast(ForecastArn=forecast_arn)\n", "print(f\"\\n\\nThe Forecast with ARN {forecast_arn} is now {describe_forecast_response['Status']}.\")" ] }, { "cell_type": "markdown", "id": "4f0c1d18", "metadata": {}, "source": [ "#### Load ground truth for pick-up location 48 on February 1, 2019." 
] }, { "cell_type": "code", "execution_count": 51, "id": "a7a5ef99", "metadata": {}, "outputs": [], "source": [ "ITEM_ID = \"48\"\n", "\n", "taxi_feb_df = pd.read_csv(\"data/taxi-feb2019.csv\", dtype = object, names=['timestamp','item_id','target_value'])\n", "taxi_feb_df.target_value = taxi_feb_df.target_value.astype(float)\n", "\n", "actuals = taxi_feb_df[(taxi_feb_df['item_id'] == ITEM_ID)]" ] }, { "cell_type": "markdown", "id": "12782c51", "metadata": {}, "source": [ "#### Query forecasts for pick-up location 48 on February 1, 2019." ] }, { "cell_type": "code", "execution_count": 56, "id": "2ce31334", "metadata": {}, "outputs": [], "source": [ "# Create the query client; private DNS is enabled, so calls still go through the VPC endpoint\n", "forecastquery = session.client(service_name='forecastquery')" ] }, { "cell_type": "code", "execution_count": 57, "id": "170a455c", "metadata": {}, "outputs": [], "source": [ "forecast_response = forecastquery.query_forecast(\n", " ForecastArn=forecast_arn,\n", " Filters={\"item_id\": ITEM_ID}\n", ")" ] }, { "cell_type": "code", "execution_count": 58, "id": "6565b7e4", "metadata": {}, "outputs": [], "source": [ "\n", "forecasts_p10_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p10'])\n", "forecasts_p50_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p50'])\n", "forecasts_p90_df = pd.DataFrame.from_dict(forecast_response['Forecast']['Predictions']['p90'])" ] }, { "cell_type": "markdown", "id": "08bdbe1d", "metadata": {}, "source": [ "#### Compare the forecasts with ground truth" ] }, { "cell_type": "code", "execution_count": 60, "id": "be6fea71", "metadata": {}, "outputs": [], "source": [ "import dateutil.parser" ] }, { "cell_type": "code", "execution_count": 61, "id": "1da56f4b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "results_df = pd.DataFrame(columns=['timestamp', 'value', 'source'])\n", "\n", "for index, row in actuals.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['target_value'], 'source': 'actual'} , ignore_index=True)\n", "for index, row in forecasts_p10_df.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['Timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['Value'], 'source': 'p10'} , ignore_index=True)\n", "for index, row in forecasts_p50_df.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['Timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['Value'], 'source': 'p50'} , ignore_index=True)\n", "for index, row in forecasts_p90_df.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['Timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['Value'], 'source': 'p90'} , ignore_index=True)\n", "\n", "pivot_df = results_df.pivot(columns='source', values='value', index=\"timestamp\")\n", "\n", "pivot_df.plot(figsize=(15, 7))" ] }, { "cell_type": "markdown", "id": "b1c13929", "metadata": {}, "source": [ "* **Impact scores** measure the relative impact attributes have on forecast values. For example, if the holiday attribute has an impact score that is twice as large as that of another attribute, say weather, you can conclude that the holiday has twice the impact on forecast values as the weather. \n", "* **Impact scores** also provide information on whether an attribute increases or decreases the forecasted value. A negative impact score indicates that the attribute tends to decrease the value of the forecast." 
] }, { "cell_type": "markdown", "id": "5ab8e605", "metadata": {}, "source": [ "## Clean-up \n", "Uncomment the code section to delete all resources that were created in this notebook." ] }, { "cell_type": "code", "execution_count": 62, "id": "d8437459", "metadata": {}, "outputs": [], "source": [ "# forecast.delete_resource_tree(ResourceArn = dataset_group_arn)\n", "# forecast.delete_resource_tree(ResourceArn = ts_dataset_arn)" ] }, { "cell_type": "markdown", "id": "de1f0f35", "metadata": {}, "source": [ "Be sure to also manually delete the following resources that we created\n", "\n", "- VPC endpoints\n", "- Notebook instance\n", "- Notebook instance role\n", "- VPC\n", "- Subnet\n", "- Security Group\n" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }