{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Amazon Comprehend Sentiment Analysis \n",
    "\n",
    "Amazon Comprehend can be used to perform sentiment analysis. You can accurately analyze customer interactions, including social media posts, reviews, customer interaction transcripts to improve your products and services.\n",
    "\n",
    "You can use Amazon Comprehend to determine the sentiment of a document. You can determine if the sentiment is positive, negative, neutral, or mixed. For example, you can use sentiment analysis to determine the sentiments of comments on a blog posting to determine if your readers liked the post.\n",
    "\n",
    "Determine sentiment operations can be performed using any of the primary languages supported by Amazon Comprehend. All documents must be in the same language.\n",
    "\n",
    "You can use any of the following operations to detect the sentiment of a document or a set of documents.\n",
    "\n",
    "    DetectSentiment\n",
    "\n",
    "    BatchDetectSentiment\n",
    "\n",
    "    StartSentimentDetectionJob\n",
    "\n",
    "The operations return the most likely sentiment for the text as well as the scores for each of the sentiments. The score represents the likelihood that the sentiment was correctly detected. For example, in the example below it is 95 percent likely that the text has a Positive sentiment. There is a less than 1 percent likelihood that the text has a Negative sentiment. You can use the SentimentScore to determine if the accuracy of the detection meets the needs of your application.\n",
    "\n",
    "The DetectSentiment operation returns an object that contains the detected sentiment and a SentimentScore object. The BatchDetectSentiment operation returns a list of sentiments and SentimentScore objects, one for each document in the batch. The StartSentimentDetectionJob operation starts an asynchronous job that produces a file containing a list of sentiments and SentimentScore objects, one for each document in the job. \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This lab includes step-by-step instructions for performing sentiment analysis using Amazon Comprehend."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Let's start by specifying:\n",
    "\n",
    "* AWS region.\n",
    "* The IAM role arn used to give access to Comprehend API and S3 bucket.\n",
    "* The S3 bucket that you want to use for training and model data.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import os\n",
    "import boto3\n",
    "import re\n",
    "import json\n",
    "import sagemaker\n",
    "from sagemaker import get_execution_role\n",
    "\n",
    "region = boto3.Session().region_name\n",
    "\n",
    "role = get_execution_role()\n",
    "\n",
    "bucket = sagemaker.Session().default_bucket()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "s3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis\n"
     ]
    }
   ],
   "source": [
    "prefix = \"sagemaker/sentiment-analysis\"\n",
    "bucketuri=\"s3://\"+bucket+\"/\"+prefix\n",
    "print(bucketuri)\n",
    "# customize to your bucket where you have stored the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data\n",
    "Let's start by uploading the dataset the sample data s3 bucket.The  sample dataset contains Amazon reviews taken from the larger dataset \"Amazon reviews - Full\", which was published with the article \"Character-level Convolutional Networks for Text Classification\" (Xiang Zhang et al., 2015). \n",
    "\n",
    "Now lets read this into a Pandas data frame and take a look.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "# Download the data set\n",
    "\n",
    "!wget https://docs.aws.amazon.com/comprehend/latest/dg/samples/tutorial-reviews-data.zip\n",
    "!apt-get install unzip -y\n",
    "!unzip -o tutorial-reviews-data.zip\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Written in old English. It was very hard to re...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Thought I was getting a book received book on ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>this book was recommend 2 me from my neighbor....</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>I believe that this is a fantastic book for th...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>I always liked this book. The kindle version i...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>a classic book where there is really nothing b...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>I had to read this book, which I had heard qui...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>This book was purchased for my daughter for a ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>A classic? Hardly. In it's time this book may ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>The whole time reading The Scarlet Letter I th...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>36 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               Review\n",
       "0   Written in old English. It was very hard to re...\n",
       "1   Thought I was getting a book received book on ...\n",
       "2   this book was recommend 2 me from my neighbor....\n",
       "3   I believe that this is a fantastic book for th...\n",
       "4   I always liked this book. The kindle version i...\n",
       "..                                                ...\n",
       "31  a classic book where there is really nothing b...\n",
       "32  I had to read this book, which I had heard qui...\n",
       "33  This book was purchased for my daughter for a ...\n",
       "34  A classic? Hardly. In it's time this book may ...\n",
       "35  The whole time reading The Scarlet Letter I th...\n",
       "\n",
       "[36 rows x 1 columns]"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np                                # For matrix operations and numerical processing\n",
    "import pandas as pd \n",
    "\n",
    "# data = pd.read_csv('./amazon-reviews.csv')   \n",
    "data = pd.read_csv('./amazon-reviews.csv', header=None, names=['Review'])\n",
    "pd.set_option('display.max_rows', 20)# Keep the output on one page\n",
    "\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use detect_sentiment API for real time usecase\n",
    "\n",
    "First, we will be using detect_sentiment API. The DetectSentiment operation returns an object that contains the detected sentiment and a SentimentScore object.\n",
    "\n",
    "Lets check a plain text example to begin. \n",
    "\n",
    "Steps:\n",
    "* Use boto3 to initialize the comprehend client\n",
    "* Define the sample text \n",
    "* Called the detect_sentiment API and pass in the text as the input parameter. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calling DetectSentiment\n",
      "{\n",
      "    \"ResponseMetadata\": {\n",
      "        \"HTTPHeaders\": {\n",
      "            \"content-length\": \"162\",\n",
      "            \"content-type\": \"application/x-amz-json-1.1\",\n",
      "            \"date\": \"Wed, 30 Jun 2021 21:33:58 GMT\",\n",
      "            \"x-amzn-requestid\": \"f35cec59-11f9-4b87-85b6-5c29579ad11e\"\n",
      "        },\n",
      "        \"HTTPStatusCode\": 200,\n",
      "        \"RequestId\": \"f35cec59-11f9-4b87-85b6-5c29579ad11e\",\n",
      "        \"RetryAttempts\": 0\n",
      "    },\n",
      "    \"Sentiment\": \"NEUTRAL\",\n",
      "    \"SentimentScore\": {\n",
      "        \"Mixed\": 9.628861880628392e-05,\n",
      "        \"Negative\": 0.30989840626716614,\n",
      "        \"Neutral\": 0.6552183032035828,\n",
      "        \"Positive\": 0.03478698432445526\n",
      "    }\n",
      "}\n",
      "End of DetectSentiment\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import boto3\n",
    "import json\n",
    "\n",
    "comprehend = boto3.client(service_name='comprehend', region_name=region)\n",
    "                \n",
    "text = \"It is raining today in Seattle\"\n",
    "\n",
    "print('Calling DetectSentiment')\n",
    "print(json.dumps(comprehend.detect_sentiment(Text=text, LanguageCode='en'), sort_keys=True, indent=4))\n",
    "print('End of DetectSentiment\\n')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets use the detect_sentiment API for our sample dataset and check the response. \n",
    "\n",
    "Note: We are just testing with 5 reviews and we will check the output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Written in old English. It was very hard to read as I had to think through most sentences to figure out what was being said.\n",
      "\n",
      "\n",
      "{\n",
      "    \"ResponseMetadata\": {\n",
      "        \"HTTPHeaders\": {\n",
      "            \"content-length\": \"163\",\n",
      "            \"content-type\": \"application/x-amz-json-1.1\",\n",
      "            \"date\": \"Wed, 30 Jun 2021 21:36:24 GMT\",\n",
      "            \"x-amzn-requestid\": \"9255dbe4-2206-4074-9c37-17246fffe2b1\"\n",
      "        },\n",
      "        \"HTTPStatusCode\": 200,\n",
      "        \"RequestId\": \"9255dbe4-2206-4074-9c37-17246fffe2b1\",\n",
      "        \"RetryAttempts\": 0\n",
      "    },\n",
      "    \"Sentiment\": \"NEGATIVE\",\n",
      "    \"SentimentScore\": {\n",
      "        \"Mixed\": 0.0036034041550010443,\n",
      "        \"Negative\": 0.97439044713974,\n",
      "        \"Neutral\": 0.01862311363220215,\n",
      "        \"Positive\": 0.003383016213774681\n",
      "    }\n",
      "}\n",
      "Thought I was getting a book received book on tape. a little deceiving. Rather than that the process went as expected. If it had been what we wanted would have been great.\n",
      "\n",
      "\n",
      "{\n",
      "    \"ResponseMetadata\": {\n",
      "        \"HTTPHeaders\": {\n",
      "            \"content-length\": \"166\",\n",
      "            \"content-type\": \"application/x-amz-json-1.1\",\n",
      "            \"date\": \"Wed, 30 Jun 2021 21:36:24 GMT\",\n",
      "            \"x-amzn-requestid\": \"fdff8985-b862-485d-8dbb-8e1e64f03047\"\n",
      "        },\n",
      "        \"HTTPStatusCode\": 200,\n",
      "        \"RequestId\": \"fdff8985-b862-485d-8dbb-8e1e64f03047\",\n",
      "        \"RetryAttempts\": 0\n",
      "    },\n",
      "    \"Sentiment\": \"NEGATIVE\",\n",
      "    \"SentimentScore\": {\n",
      "        \"Mixed\": 0.029959602281451225,\n",
      "        \"Negative\": 0.9654799103736877,\n",
      "        \"Neutral\": 0.002631053328514099,\n",
      "        \"Positive\": 0.0019294769736006856\n",
      "    }\n",
      "}\n",
      "this book was recommend 2 me from my neighbor. but i dont think that dis is dat gud of a book 2 recommend... i didnt like it 2 much but...\n",
      "\n",
      "\n",
      "{\n",
      "    \"ResponseMetadata\": {\n",
      "        \"HTTPHeaders\": {\n",
      "            \"content-length\": \"160\",\n",
      "            \"content-type\": \"application/x-amz-json-1.1\",\n",
      "            \"date\": \"Wed, 30 Jun 2021 21:36:24 GMT\",\n",
      "            \"x-amzn-requestid\": \"9503a21e-949d-4d41-88d2-404ef3488ac0\"\n",
      "        },\n",
      "        \"HTTPStatusCode\": 200,\n",
      "        \"RequestId\": \"9503a21e-949d-4d41-88d2-404ef3488ac0\",\n",
      "        \"RetryAttempts\": 0\n",
      "    },\n",
      "    \"Sentiment\": \"MIXED\",\n",
      "    \"SentimentScore\": {\n",
      "        \"Mixed\": 0.8287831544876099,\n",
      "        \"Negative\": 0.1558845043182373,\n",
      "        \"Neutral\": 0.005947569385170937,\n",
      "        \"Positive\": 0.009384780190885067\n",
      "    }\n",
      "}\n",
      "I believe that this is a fantastic book for those who have more distinguished tastes. As for others, it is a slow read with challenging vocabulary.\n",
      "\n",
      "\n",
      "{\n",
      "    \"ResponseMetadata\": {\n",
      "        \"HTTPHeaders\": {\n",
      "            \"content-length\": \"162\",\n",
      "            \"content-type\": \"application/x-amz-json-1.1\",\n",
      "            \"date\": \"Wed, 30 Jun 2021 21:36:25 GMT\",\n",
      "            \"x-amzn-requestid\": \"d6ced7cd-94bf-42b7-8546-b6c997271a8d\"\n",
      "        },\n",
      "        \"HTTPStatusCode\": 200,\n",
      "        \"RequestId\": \"d6ced7cd-94bf-42b7-8546-b6c997271a8d\",\n",
      "        \"RetryAttempts\": 0\n",
      "    },\n",
      "    \"Sentiment\": \"MIXED\",\n",
      "    \"SentimentScore\": {\n",
      "        \"Mixed\": 0.9686610102653503,\n",
      "        \"Negative\": 0.0013004766078665853,\n",
      "        \"Neutral\": 5.386784323491156e-05,\n",
      "        \"Positive\": 0.02998470328748226\n",
      "    }\n",
      "}\n",
      "I always liked this book. The kindle version is kind of difficult to navigate but it does its purpose. I can still go back to my last read without any problem.\n",
      "\n",
      "\n",
      "{\n",
      "    \"ResponseMetadata\": {\n",
      "        \"HTTPHeaders\": {\n",
      "            \"content-length\": \"160\",\n",
      "            \"content-type\": \"application/x-amz-json-1.1\",\n",
      "            \"date\": \"Wed, 30 Jun 2021 21:36:25 GMT\",\n",
      "            \"x-amzn-requestid\": \"ebeb0d70-ccb0-44d6-8060-23e88a04c1a2\"\n",
      "        },\n",
      "        \"HTTPStatusCode\": 200,\n",
      "        \"RequestId\": \"ebeb0d70-ccb0-44d6-8060-23e88a04c1a2\",\n",
      "        \"RetryAttempts\": 0\n",
      "    },\n",
      "    \"Sentiment\": \"MIXED\",\n",
      "    \"SentimentScore\": {\n",
      "        \"Mixed\": 0.6079463362693787,\n",
      "        \"Negative\": 0.000349800189724192,\n",
      "        \"Neutral\": 0.00025608116993680596,\n",
      "        \"Positive\": 0.3914477825164795\n",
      "    }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "for index, row in data.iloc[:5].iterrows():\n",
    "    print(row[0])\n",
    "    print(\"\\n\")\n",
    "    print(json.dumps(comprehend.detect_sentiment(Text=row[0], LanguageCode='en'), sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use batch_detect_sentiment API\n",
    "To send batches of up to 25 documents, you can use the Amazon Comprehend batch operations. Calling a batch operation is identical to calling the single document APIs for each document in the request. Using the batch APIs can result in better performance for your applications. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [],
   "source": [
    "#We will prepare a list of the 25 review document so we can use it for batch function\n",
    "rows,columns=data.shape\n",
    "\n",
    "list_text=[] #your empty list \n",
    "for index in range(25): #iteration over the dataframe\n",
    "    list_text.append(data.iat[index,0])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'ResultList': [{'Index': 0, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.003383016213774681, 'Negative': 0.97439044713974, 'Neutral': 0.01862311363220215, 'Mixed': 0.0036034041550010443}}, {'Index': 1, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.0019294769736006856, 'Negative': 0.9654799103736877, 'Neutral': 0.002631053328514099, 'Mixed': 0.029959602281451225}}, {'Index': 2, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.009384780190885067, 'Negative': 0.1558845043182373, 'Neutral': 0.005947569385170937, 'Mixed': 0.8287831544876099}}, {'Index': 3, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.02998470328748226, 'Negative': 0.0013004766078665853, 'Neutral': 5.386784323491156e-05, 'Mixed': 0.9686610102653503}}, {'Index': 4, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.3914477825164795, 'Negative': 0.000349800189724192, 'Neutral': 0.00025608116993680596, 'Mixed': 0.6079463362693787}}, {'Index': 5, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.9966793060302734, 'Negative': 6.251714512472972e-05, 'Neutral': 0.0031647798605263233, 'Mixed': 9.345235594082624e-05}}, {'Index': 6, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.01471242867410183, 'Negative': 0.03303435817360878, 'Neutral': 0.0002424443664494902, 'Mixed': 0.9520108103752136}}, {'Index': 7, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.043585892766714096, 'Negative': 0.03871111571788788, 'Neutral': 0.01708519086241722, 'Mixed': 0.9006178975105286}}, {'Index': 8, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.6812999248504639, 'Negative': 0.08030632138252258, 'Neutral': 0.16914251446723938, 'Mixed': 0.06925120949745178}}, {'Index': 9, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.0015970569802448153, 'Negative': 0.9977276921272278, 'Neutral': 0.0006307305884547532, 'Mixed': 4.449755942914635e-05}}, {'Index': 10, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.0018778092926368117, 'Negative': 0.9972975850105286, 'Neutral': 0.0001090758596546948, 'Mixed': 0.0007154956110753119}}, {'Index': 11, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.886422872543335, 'Negative': 0.031057370826601982, 'Neutral': 0.005157501436769962, 'Mixed': 0.07736223191022873}}, {'Index': 12, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.9735071659088135, 'Negative': 0.00029977119993418455, 'Neutral': 0.00028464687056839466, 'Mixed': 0.025908485054969788}}, {'Index': 13, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.050407588481903076, 'Negative': 0.005543750710785389, 'Neutral': 0.00034211232559755445, 'Mixed': 0.9437066316604614}}, {'Index': 14, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.7373078465461731, 'Negative': 0.21913199126720428, 'Neutral': 0.02875964343547821, 'Mixed': 0.014800486154854298}}, {'Index': 15, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.0016888657119125128, 'Negative': 0.5170453190803528, 'Neutral': 0.0005565205356106162, 'Mixed': 0.48070940375328064}}, {'Index': 16, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.005014460068196058, 'Negative': 0.05469932779669762, 'Neutral': 6.350128387566656e-05, 'Mixed': 0.9402226209640503}}, {'Index': 17, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.999634861946106, 'Negative': 7.484462548745796e-05, 'Neutral': 0.00027003500144928694, 'Mixed': 2.029433017014526e-05}}, {'Index': 18, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.009423977695405483, 'Negative': 0.9631668329238892, 'Neutral': 0.01065142173320055, 'Mixed': 0.016757763922214508}}, {'Index': 19, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.000565775262657553, 'Negative': 0.934131383895874, 'Neutral': 0.00016520198551006615, 'Mixed': 0.06513765454292297}}, {'Index': 20, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.005492488853633404, 'Negative': 0.9910051822662354, 'Neutral': 0.0007788850925862789, 'Mixed': 0.0027233934961259365}}, {'Index': 21, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.2739538550376892, 'Negative': 0.0035695910919457674, 'Neutral': 0.0006848344928584993, 'Mixed': 0.7217917442321777}}, {'Index': 22, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.06102084368467331, 'Negative': 0.828016996383667, 'Neutral': 0.061267297714948654, 'Mixed': 0.04969481751322746}}, {'Index': 23, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.6989343166351318, 'Negative': 0.2831159234046936, 'Neutral': 0.002125322353094816, 'Mixed': 0.01582450233399868}}, {'Index': 24, 'Sentiment': 'POSITIVE', 'SentimentScore': {'Positive': 0.7257474064826965, 'Negative': 0.05449317768216133, 'Neutral': 0.0316728800535202, 'Mixed': 0.18808649480342865}}], 'ErrorList': [], 'ResponseMetadata': {'RequestId': 'e19a9e60-af05-4f9b-b6c0-74d5d4612915', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'e19a9e60-af05-4f9b-b6c0-74d5d4612915', 'content-type': 'application/x-amz-json-1.1', 'content-length': '4398', 'date': 'Wed, 30 Jun 2021 21:36:32 GMT'}, 'RetryAttempts': 0}}\n"
     ]
    }
   ],
   "source": [
    "response = comprehend.batch_detect_sentiment(\n",
    "    TextList=list_text,\n",
    "    LanguageCode='en'\n",
    ")\n",
    "\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Asynchronous Batch Processing using StartSentimentDetectionJob\n",
    "\n",
    "To analyze large documents and large collections of documents, use one of the Amazon Comprehend asynchronous operations. There is an asynchronous version of each of the Amazon Comprehend operations and an additional set of operations for topic modeling.\n",
    "\n",
    "To analyze a collection of documents, you typically perform the following steps:\n",
    "\n",
    "   * Store the documents in an Amazon S3 bucket.\n",
    "\n",
    "   * Start one or more jobs to analyze the documents.\n",
    "\n",
    "   * Monitor the progress of an analysis job.\n",
    "\n",
    "   * Retrieve the results of the analysis from an S3 bucket when the job is complete.\n",
    "\n",
    "The following sections describe using the Amazon Comprehend API to run asynchronous operations. \n",
    "\n",
    "We would be using the following API:\n",
    "\n",
    "StartSentimentDetectionJob — Start a job to detect the emotional sentiment in each document in the collection. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "s3 = boto3.resource('s3')\n",
    "\n",
    "\n",
    "s3.Bucket(bucket).upload_file(\"amazon-reviews.csv\", \"sagemaker/sentiment-analysis/amazon-reviews.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "import uuid\n",
    "job_uuid = uuid.uuid1()\n",
    "job_name = f\"sentimentanalysis-job-{job_uuid}\"\n",
    "inputs3uri= bucketuri+\"/amazon-reviews.csv\"\n",
    "asyncresponse = comprehend.start_sentiment_detection_job(\n",
    "    InputDataConfig={\n",
    "        'S3Uri': inputs3uri,\n",
    "        'InputFormat': 'ONE_DOC_PER_LINE'\n",
    "    },\n",
    "    OutputDataConfig={\n",
    "        'S3Uri': bucketuri,\n",
    "       \n",
    "    },\n",
    "    DataAccessRoleArn=role,\n",
    "    JobName=job_name,\n",
    "    LanguageCode='en',\n",
    " \n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'SentimentDetectionJobProperties': {'JobId': '0ffd0ab3417d566ab6bb4373e3652de9', 'JobName': 'sentimentanalysis-job-6136c5e4-d9ec-11eb-9099-9fc2b3b35db3', 'JobStatus': 'IN_PROGRESS', 'SubmitTime': datetime.datetime(2021, 6, 30, 21, 44, 43, 196000, tzinfo=tzlocal()), 'InputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/amazon-reviews.csv', 'InputFormat': 'ONE_DOC_PER_LINE'}, 'OutputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/340280328827-SENTIMENT-0ffd0ab3417d566ab6bb4373e3652de9/output/output.tar.gz'}, 'LanguageCode': 'en', 'DataAccessRoleArn': 'arn:aws:iam::340280328827:role/SagemakerFullAccessPolicy'}, 'ResponseMetadata': {'RequestId': 'b3872d1d-0751-40c2-9f33-a6f94c193d1d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'b3872d1d-0751-40c2-9f33-a6f94c193d1d', 'content-type': 'application/x-amz-json-1.1', 'content-length': '740', 'date': 'Wed, 30 Jun 2021 21:44:45 GMT'}, 'RetryAttempts': 0}}\n"
     ]
    }
   ],
   "source": [
    "events_job_id = asyncresponse['JobId']\n",
    "job = comprehend.describe_sentiment_detection_job(JobId=events_job_id)\n",
    "print(job)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'SentimentDetectionJobProperties': {'JobId': '0ffd0ab3417d566ab6bb4373e3652de9', 'JobName': 'sentimentanalysis-job-6136c5e4-d9ec-11eb-9099-9fc2b3b35db3', 'JobStatus': 'IN_PROGRESS', 'SubmitTime': datetime.datetime(2021, 6, 30, 21, 44, 43, 196000, tzinfo=tzlocal()), 'InputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/amazon-reviews.csv', 'InputFormat': 'ONE_DOC_PER_LINE'}, 'OutputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/340280328827-SENTIMENT-0ffd0ab3417d566ab6bb4373e3652de9/output/output.tar.gz'}, 'LanguageCode': 'en', 'DataAccessRoleArn': 'arn:aws:iam::340280328827:role/SagemakerFullAccessPolicy'}, 'ResponseMetadata': {'RequestId': '9fe25947-5ed0-42db-b7a1-ba4e4df214ce', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '9fe25947-5ed0-42db-b7a1-ba4e4df214ce', 'content-type': 'application/x-amz-json-1.1', 'content-length': '740', 'date': 'Wed, 30 Jun 2021 21:44:55 GMT'}, 'RetryAttempts': 0}}\n"
     ]
    }
   ],
   "source": [
    "from time import sleep\n",
    "# Get current job status\n",
    "job = comprehend.describe_sentiment_detection_job(JobId=events_job_id)\n",
    "print(job)\n",
    "# Loop until job is completed\n",
    "waited = 0\n",
    "timeout_minutes = 10\n",
    "while job['SentimentDetectionJobProperties']['JobStatus'] != 'COMPLETED':\n",
    "    sleep(60)\n",
    "    waited += 60\n",
    "    assert waited//60 < timeout_minutes, \"Job timed out after %d seconds.\" % waited\n",
    "    job = comprehend.describe_sentiment_detection_job(JobId=events_job_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The job would take roughly 6-8 minutes to complete and you can download the output from the output location you specified in the job paramters. You can open Comprehend in your console and check the job details there as well. Asynchronous method would be very useful when you have multiple documents and you want to run asynchronous batch.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}