{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f6a9b2c3",
   "metadata": {},
   "source": [
    "# Deploying an Amazon Comprehend Model with SageMaker Pipelines\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9c046eb",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "412d858a",
   "metadata": {},
   "source": [
    "\n",
    "This example notebook showcases how you can deploy a custom text classification model using Amazon Comprehend and SageMaker Pipelines.\n",
    "\n",
    "Before you start make sure that your SageMaker Execution Role has the following policies:\n",
    "\n",
    "- `ComprehendFullAccess`\n",
    "- `AmazonSageMakerFullAccess`\n",
    "- `AWSLambda_FullAccess`\n",
    "- `IAMFullAccess`\n",
    "\n",
    "Your SageMaker Execution Role should have access to S3 already. If not you can add the S3 full access policy.\n",
    "You will also need to add `iam:passRole` as an inline policy."
   ]
  },
  {
   "cell_type": "raw",
   "id": "b18b38b7",
   "metadata": {},
   "source": [
    "{\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "            \"Action\": [\n",
    "                \"iam:PassRole\"|\n",
    "            ],\n",
    "            \"Effect\": \"Allow\",\n",
    "            \"Resource\": \"*\"\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ba5c63c",
   "metadata": {},
   "source": [
    "Finally, you will need the following trust policies."
   ]
  },
  {
   "cell_type": "raw",
   "id": "2a5e0ed0",
   "metadata": {},
   "source": [
    "{\n",
    "  \"Version\": \"2012-10-17\",\n",
    "  \"Statement\": [\n",
    "    {\n",
    "      \"Effect\": \"Allow\",\n",
    "      \"Principal\": {\n",
    "        \"Service\": [\n",
    "          \"sagemaker.amazonaws.com\",\n",
    "          \"s3.amazonaws.com\",\n",
    "          \"comprehend.amazonaws.com\",\n",
    "          \"lambda.amazonaws.com\"\n",
    "        ]\n",
    "      },\n",
    "      \"Action\": \"sts:AssumeRole\"\n",
    "    }\n",
    "  ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc0d7ab1",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "First, we are going to import the SageMaker SDK and set some default variables such as the `role` for permissioned execution and the `default_bucket` to store model artifacts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f8ca227",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install s3path\n",
    "%pip install s3fs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8ddc2fd",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2deea8ee",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import boto3\n",
    "import sagemaker\n",
    "from sagemaker.sklearn.processing import SKLearnProcessor\n",
    "from sagemaker.workflow.lambda_step import (\n",
    "    LambdaStep,\n",
    "    LambdaOutput,\n",
    "    LambdaOutputTypeEnum,\n",
    ")\n",
    "from sagemaker.lambda_helper import Lambda\n",
    "from sagemaker.processing import ProcessingInput, ProcessingOutput\n",
    "from sagemaker.workflow.steps import ProcessingStep\n",
    "from sagemaker.workflow.properties import PropertyFile\n",
    "from sagemaker.workflow.parameters import ParameterInteger, ParameterString\n",
    "\n",
    "region = boto3.Session().region_name\n",
    "sagemaker_session = sagemaker.session.Session()\n",
    "role_arn = sagemaker.get_execution_role()\n",
    "default_bucket = sagemaker_session.default_bucket()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "850b90b3",
   "metadata": {},
   "source": [
    "## Dataset\n",
    "\n",
    "Let's inspect the train and test dataset we will be using in this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b753bf7c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "trainFrame = pd.read_csv(\n",
    "    \"s3://aws-ml-blog/artifacts/comprehend-custom-classification/comprehend-train.csv\",\n",
    "    header=None,\n",
    ")\n",
    "trainFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1a3b873",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "testFrame = pd.read_csv(\n",
    "    \"s3://aws-ml-blog/artifacts/comprehend-custom-classification/comprehend-test.csv\",\n",
    "    header=None,\n",
    ")\n",
    "testFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60a3e50d",
   "metadata": {},
   "source": [
    "Direct S3 to S3 copy does not work if you are in a different region. Hence, we will copy the data from source to local, then local to target."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2e29869",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!aws s3 cp s3://aws-ml-blog/artifacts/comprehend-custom-classification/comprehend-train.csv .\n",
    "!aws s3 cp s3://aws-ml-blog/artifacts/comprehend-custom-classification/comprehend-test.csv .\n",
    "\n",
    "!aws s3 cp ./comprehend-train.csv s3://$default_bucket/\n",
    "!aws s3 cp ./comprehend-test.csv s3://$default_bucket/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce42aa57",
   "metadata": {},
   "source": [
    "Next, we define parameters that can be set for the execution of the pipeline. They serve as variables. We define the following:\n",
    "\n",
    "- `TrainData`: Location of the training data in S3\n",
    "- `TestData`: Location of the test data in S3\n",
    "- `RoleArn`: ARN (Amazon Resource Name) of the role used for pipeline execution\n",
    "- `ModelOutput`: Location of the target S3 path for the Amazon Comprehend model artifact\n",
    "\n",
    "Amazon Comprehend creates its own validation set when training, so there is no need to provide one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bdfde4f3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "processing_instance_count = ParameterInteger(name=\"ProcessingInstanceCount\", default_value=1)\n",
    "\n",
    "input_train = ParameterString(\n",
    "    name=\"TrainData\",\n",
    "    default_value=f\"s3://{default_bucket}/comprehend-train.csv\",\n",
    ")\n",
    "\n",
    "input_test = ParameterString(\n",
    "    name=\"TestData\",\n",
    "    default_value=f\"s3://{default_bucket}/comprehend-test.csv\",\n",
    ")\n",
    "\n",
    "model_output = ParameterString(name=\"ModelOutput\", default_value=f\"s3://{default_bucket}/model\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "465ca7d0",
   "metadata": {},
   "source": [
    "We use [SKLearnProcessor](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor) to run Python scripts to train, and deploy Amazon Comprehend models using `boto3`. In the next chunk, we instantiate an instance of `SKLearnProcessor` that we use in the next steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e368bde",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "sklearn_processor = SKLearnProcessor(\n",
    "    framework_version=\"1.0-1\",\n",
    "    instance_type=\"ml.m5.xlarge\",\n",
    "    instance_count=processing_instance_count,\n",
    "    base_job_name=\"comprehend-process\",\n",
    "    sagemaker_session=sagemaker_session,\n",
    "    role=role_arn,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "005a430a",
   "metadata": {},
   "source": [
    "The first Amazon SageMaker [ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html?highlight=ProcessingStep#sagemaker.workflow.steps.ProcessingStep) provides a containerized execution environment to run the `prepare_data.py` script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0182a8ea",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "preprocess = ProcessingStep(\n",
    "    name=\"ComprehendProcess\",\n",
    "    processor=sklearn_processor,\n",
    "    inputs=[\n",
    "        ProcessingInput(source=input_train, destination=\"/opt/ml/processing/input_train\"),\n",
    "        ProcessingInput(source=input_test, destination=\"/opt/ml/processing/input_test\"),\n",
    "    ],\n",
    "    outputs=[\n",
    "        ProcessingOutput(output_name=\"train\", source=\"/opt/ml/processing/train\"),\n",
    "        ProcessingOutput(output_name=\"test\", source=\"/opt/ml/processing/test\"),\n",
    "    ],\n",
    "    code=\"prepare_data.py\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec27749e",
   "metadata": {},
   "source": [
    "The second Amazon SageMaker processing step trains the Amazon Comprehend model by running `train_eval_comprehend.py`. Amazon Comprehend automatically evaluates the performance on an evaluation set. We will use that score as a condition for deploying the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "131c433d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "evaluation_report = PropertyFile(\n",
    "    name=\"ComprehendEvaluationReport\",\n",
    "    output_name=\"evaluation\",\n",
    "    path=\"evaluation.json\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29d7202f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "comprehend_train_and_eval = ProcessingStep(\n",
    "    name=\"ComprehendTrainAndEval\",\n",
    "    processor=sklearn_processor,\n",
    "    job_arguments=[\n",
    "        \"--train-input-file\",\n",
    "        preprocess.properties.ProcessingOutputConfig.Outputs[\"train\"].S3Output.S3Uri,\n",
    "        \"--train-output-path\",\n",
    "        model_output,\n",
    "        \"--iam-role-arn\",\n",
    "        role_arn,\n",
    "    ],\n",
    "    code=\"train_eval_comprehend.py\",\n",
    "    outputs=[\n",
    "        ProcessingOutput(output_name=\"evaluation\", source=\"/opt/ml/processing/evaluation\"),\n",
    "        ProcessingOutput(output_name=\"arn\", source=\"/opt/ml/processing/arn\"),\n",
    "    ],\n",
    "    property_files=[evaluation_report],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e009d0e",
   "metadata": {},
   "source": [
    "The third Amazon SageMaker processing step deploys the Amazon Comprehend model running `deploy_comprehend.py`. If the Accuracy reported after training is lower than a certain threshold, this step does not run and the pipeline stops here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6b72073",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "step_deploy_model = ProcessingStep(\n",
    "    name=\"ComprehendDeploy\",\n",
    "    processor=sklearn_processor,\n",
    "    job_arguments=[\n",
    "        \"--arn-path\",\n",
    "        comprehend_train_and_eval.properties.ProcessingOutputConfig.Outputs[\"arn\"].S3Output.S3Uri,\n",
    "    ],\n",
    "    code=\"deploy_comprehend.py\",\n",
    "    outputs=[\n",
    "        ProcessingOutput(output_name=\"endpoint_arn\", source=\"/opt/ml/processing/endpoint_arn\")\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85b5ee46",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "comprehend_train_and_eval.properties.ProcessingOutputConfig.Outputs[\"arn\"].S3Output.S3Uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c129977a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo\n",
    "from sagemaker.workflow.condition_step import ConditionStep\n",
    "from sagemaker.workflow.functions import JsonGet\n",
    "\n",
    "cond_lte = ConditionGreaterThanOrEqualTo(\n",
    "    left=JsonGet(\n",
    "        step_name=\"ComprehendTrainAndEval\",\n",
    "        property_file=evaluation_report,\n",
    "        json_path=\"Accuracy\",\n",
    "    ),\n",
    "    right=0.65,\n",
    ")\n",
    "\n",
    "step_cond = ConditionStep(\n",
    "    name=\"ComprehendAccuracyCondition\",\n",
    "    conditions=[cond_lte],\n",
    "    if_steps=[step_deploy_model],\n",
    "    else_steps=[],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8fc091b",
   "metadata": {},
   "source": [
    "Finally, the deployed model can be used for inference. At this stage we use [AWS Lambda](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.lambda_step.LambdaStep) to call the Amazon Comprehend endpoint with the text of our choice.\n",
    "\n",
    "A role is needed to create the Lambda function. We will use a helper function in `iam_helper.py` from the [Lambda Step](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-pipelines/tabular/lambda-step/iam_helper.py) example to create the role."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d00d29",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import iam_helper\n",
    "\n",
    "lambda_role_name = \"DEMO-test-comprehend-lambda-role\"\n",
    "lambda_role = iam_helper.create_lambda_role(lambda_role_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "398f9556",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "example_text = (\n",
    "    \"Italian EBU Member RAI has won the 65th Eurovision Song Contest with the song \"\n",
    "    + \"Zitti e buoni performed by Måneskin. It's the 3rd win for Italy who last triumphed in 1990. \"\n",
    "    + \"26 countries took part in the Grand Final of the world’s largest live music event, \"\n",
    "    + \"hosted by Dutch EBU Members NPO, NOS and AVROTROS on Saturday 22 May in Rotterdam. \"\n",
    "    + \"Måneskin wrote the winning song which finished the night with 524 points, 25 points \"\n",
    "    + \"ahead of 2nd placed France represented by Barbara Pravi singing Voila. Switzerland’s Gjon’s Tears with Tout l’Univers finished in third place.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "461cb554",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Custom Lambda Step\n",
    "function_name = \"DEMO-sagemaker-lambda-step-endpoint-test\"\n",
    "\n",
    "# Lambda helper class can be used to create the Lambda function\n",
    "endpoint_lambda = Lambda(\n",
    "    function_name=function_name,\n",
    "    execution_role_arn=lambda_role,\n",
    "    script=\"test_comprehend_lambda.py\",\n",
    "    handler=\"test_comprehend_lambda.lambda_handler\",\n",
    ")\n",
    "\n",
    "test_endpoint = LambdaStep(\n",
    "    name=\"LambdaStep\",\n",
    "    lambda_func=endpoint_lambda,\n",
    "    inputs={\n",
    "        \"endpoint_arn_path\": step_deploy_model.properties.ProcessingOutputConfig.Outputs[\n",
    "            \"endpoint_arn\"\n",
    "        ].S3Output.S3Uri,\n",
    "        \"text\": example_text,\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "544abf72",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "step_deploy_model.properties.ProcessingOutputConfig.Outputs[\"endpoint_arn\"].S3Output.S3Uri"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21d4966c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from sagemaker.workflow.pipeline import Pipeline\n",
    "\n",
    "pipeline_name = \"DEMO-ComprehendPipeline\"\n",
    "pipeline = Pipeline(\n",
    "    name=pipeline_name,\n",
    "    parameters=[\n",
    "        processing_instance_count,\n",
    "        input_train,\n",
    "        input_test,\n",
    "        model_output,\n",
    "    ],\n",
    "    steps=[preprocess, comprehend_train_and_eval, step_cond, test_endpoint],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23af2887",
   "metadata": {},
   "source": [
    "Once the pipeline is successfully defined, we can start the execution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a6f3ddd",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipeline.upsert(role_arn=role_arn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bf5d09f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "execution = pipeline.start()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3672f4e7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "execution.wait(delay=300, max_attempts=25)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1959cc1f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "execution.list_steps()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c3f7a16",
   "metadata": {},
   "source": [
    "## Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "775ad3b4",
   "metadata": {},
   "source": [
    "In this notebook we have seen how to create a SageMaker Pipeline to train an Amazon Comprehend Custom Classifier on your own dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ce11aba",
   "metadata": {},
   "source": [
    "## Clean up\n",
    "\n",
    "Note: the Comprehend Classifier endpoint will incur charges, but the Lambda function and IAM role will not incur costs if not deleted (if the Lambda function isn't invoked)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2138fb99",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import re\n",
    "import json\n",
    "from s3path import S3Path\n",
    "\n",
    "\n",
    "def get_comprehend_endpoint_arn(execution):\n",
    "    sagemaker_client = boto3.client(\"sagemaker\")\n",
    "    step_response = [\n",
    "        step for step in execution.list_steps() if step[\"StepName\"] == \"ComprehendDeploy\"\n",
    "    ][0]\n",
    "    deploy_job_arn = step_response[\"Metadata\"][\"ProcessingJob\"][\"Arn\"]\n",
    "    deploy_job_name = re.split(\":|/\", deploy_job_arn)[-1]\n",
    "    job_response = sagemaker_client.describe_processing_job(ProcessingJobName=deploy_job_name)\n",
    "    job_path = (\n",
    "        job_response[\"ProcessingOutputConfig\"][\"Outputs\"][0][\"S3Output\"][\"S3Uri\"]\n",
    "        + \"/endpoint_arn.txt\"\n",
    "    )\n",
    "    with S3Path.from_uri(job_path).open() as f:\n",
    "        comprehend_arn = f.read()\n",
    "\n",
    "    return comprehend_arn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e586a45c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Delete Comprehend endpoint\n",
    "comprehend_client = boto3.client(\"comprehend\")\n",
    "comprehend_client.delete_endpoint(EndpointArn=get_comprehend_endpoint_arn(execution))\n",
    "\n",
    "# Delete Lambda function\n",
    "lambda_client = boto3.client(\"lambda\")\n",
    "lambda_client.delete_function(FunctionName=function_name)\n",
    "\n",
    "# Delete IAM role\n",
    "iam_helper.delete_lambda_role(lambda_role_name)\n",
    "\n",
    "# Delete SageMaker Pipeline\n",
    "pipeline.delete()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d00cb5ca",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-pipelines|nlp|amazon_comprehend_sagemaker_pipeline|sm_pipeline_with_comprehend.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   },
   {
    "_defaultOrder": 55,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 56,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4de.24xlarge",
    "vcpuNum": 96
   }
  ],
  "kernelspec": {
   "display_name": "Python 3 (Data Science 3.0)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}