# Amazon Comprehend Sentiment Analysis 

Amazon Comprehend can be used to perform sentiment analysis. You can analyze customer interactions, including social media posts, reviews, and customer interaction transcripts, to improve your products and services.

You can use Amazon Comprehend to determine whether the sentiment of a document is positive, negative, neutral, or mixed. For example, you can analyze the comments on a blog post to find out whether your readers liked it.

Sentiment operations can be performed in any of the primary languages supported by Amazon Comprehend. All documents must be in the same language.

You can use any of the following operations to detect the sentiment of a document or a set of documents:

* DetectSentiment
* BatchDetectSentiment
* StartSentimentDetectionJob

The operations return the most likely sentiment for the text as well as a score for each of the sentiments. The score represents the likelihood that the sentiment was correctly detected. For example, a Positive score of 0.95 means it is 95 percent likely that the text has a Positive sentiment, and a Negative score below 0.01 means there is less than a 1 percent likelihood that the text has a Negative sentiment. You can use the SentimentScore to determine whether the accuracy of the detection meets the needs of your application.
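As a rough illustration of using SentimentScore in an application, the sketch below picks the highest-scoring label and compares it with a threshold. The helper name `top_sentiment` and the 0.8 threshold are assumptions made for this example and are not part of the Comprehend API; the scores are copied from the DetectSentiment response shown later in this lab.

```python
# Hypothetical helper: decide whether a detection is confident enough to use.
def top_sentiment(sentiment_score, threshold=0.8):
    """Return (label, score, is_confident) for the highest-scoring sentiment."""
    label, score = max(sentiment_score.items(), key=lambda item: item[1])
    return label.upper(), score, score >= threshold


# SentimentScore values copied from the DetectSentiment response for
# "It is raining today in Seattle".
sample_scores = {
    "Mixed": 9.628861880628392e-05,
    "Negative": 0.30989840626716614,
    "Neutral": 0.6552183032035828,
    "Positive": 0.03478698432445526,
}

label, score, confident = top_sentiment(sample_scores)
print(label, round(score, 3), confident)  # NEUTRAL 0.655 False
```

Here the text is labeled NEUTRAL, but its score falls below the assumed threshold, so an application might route it for manual review instead of trusting the label.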

The DetectSentiment operation returns an object that contains the detected sentiment and a SentimentScore object. The BatchDetectSentiment operation returns a list of sentiments and SentimentScore objects, one for each document in the batch. The StartSentimentDetectionJob operation starts an asynchronous job that produces a file containing a list of sentiments and SentimentScore objects, one for each document in the job. 



This lab includes step-by-step instructions for performing sentiment analysis using Amazon Comprehend.

## Setup

Let's start by specifying:

* AWS region.
* The IAM role ARN that grants access to the Comprehend API and the S3 bucket.
* The S3 bucket that you want to use for training and model data.


In [79]:

import os
import boto3
import re
import json
import sagemaker
from sagemaker import get_execution_role

region = boto3.Session().region_name

role = get_execution_role()

bucket = sagemaker.Session().default_bucket()

In [80]:
# customize to the bucket and prefix where you have stored the data
prefix = "sagemaker/sentiment-analysis"
bucketuri = f"s3://{bucket}/{prefix}"
print(bucketuri)

s3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis


## Data
Let's start by downloading the sample dataset, which is uploaded to the S3 bucket later in this lab. The sample dataset contains Amazon reviews taken from the larger dataset "Amazon reviews - Full", which was published with the article "Character-level Convolutional Networks for Text Classification" (Xiang Zhang et al., 2015).

Now let's read it into a pandas DataFrame and take a look.


In [None]:
# Download the data set

!wget https://docs.aws.amazon.com/comprehend/latest/dg/samples/tutorial-reviews-data.zip
!apt-get install unzip -y
!unzip -o tutorial-reviews-data.zip



In [81]:
import numpy as np # For matrix operations and numerical processing
import pandas as pd 

# data = pd.read_csv('./amazon-reviews.csv') 
data = pd.read_csv('./amazon-reviews.csv', header=None, names=['Review'])
pd.set_option('display.max_rows', 20)# Keep the output on one page

data

Unnamed: 0,Review
0,Written in old English. It was very hard to re...
1,Thought I was getting a book received book on ...
2,this book was recommend 2 me from my neighbor....
3,I believe that this is a fantastic book for th...
4,I always liked this book. The kindle version i...
...,...
31,a classic book where there is really nothing b...
32,"I had to read this book, which I had heard qui..."
33,This book was purchased for my daughter for a ...
34,A classic? Hardly. In it's time this book may ...


## Use the detect_sentiment API for real-time use cases

First, we will use the detect_sentiment API. The DetectSentiment operation returns an object that contains the detected sentiment and a SentimentScore object.

Let's begin with a plain text example.

Steps:
* Use boto3 to initialize the Comprehend client.
* Define the sample text.
* Call the detect_sentiment API and pass in the text as the input parameter.

In [82]:
import boto3
import json

comprehend = boto3.client(service_name='comprehend', region_name=region)
 
text = "It is raining today in Seattle"

print('Calling DetectSentiment')
print(json.dumps(comprehend.detect_sentiment(Text=text, LanguageCode='en'), sort_keys=True, indent=4))
print('End of DetectSentiment\n')


Calling DetectSentiment
{
 "ResponseMetadata": {
 "HTTPHeaders": {
 "content-length": "162",
 "content-type": "application/x-amz-json-1.1",
 "date": "Wed, 30 Jun 2021 21:33:58 GMT",
 "x-amzn-requestid": "f35cec59-11f9-4b87-85b6-5c29579ad11e"
 },
 "HTTPStatusCode": 200,
 "RequestId": "f35cec59-11f9-4b87-85b6-5c29579ad11e",
 "RetryAttempts": 0
 },
 "Sentiment": "NEUTRAL",
 "SentimentScore": {
 "Mixed": 9.628861880628392e-05,
 "Negative": 0.30989840626716614,
 "Neutral": 0.6552183032035828,
 "Positive": 0.03478698432445526
 }
}
End of DetectSentiment



Now let's use the detect_sentiment API on our sample dataset and check the response.

Note: We test with only the first 5 reviews and inspect the output.

In [83]:
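# Call DetectSentiment once per review for the first 5 rows
for index, row in data.iloc[:5].iterrows():
    review = row['Review']  # access by column name rather than by position
    print(review)
    print("\n")
    print(json.dumps(comprehend.detect_sentiment(Text=review, LanguageCode='en'), sort_keys=True, indent=4))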

Written in old English. It was very hard to read as I had to think through most sentences to figure out what was being said.


{
 "ResponseMetadata": {
 "HTTPHeaders": {
 "content-length": "163",
 "content-type": "application/x-amz-json-1.1",
 "date": "Wed, 30 Jun 2021 21:36:24 GMT",
 "x-amzn-requestid": "9255dbe4-2206-4074-9c37-17246fffe2b1"
 },
 "HTTPStatusCode": 200,
 "RequestId": "9255dbe4-2206-4074-9c37-17246fffe2b1",
 "RetryAttempts": 0
 },
 "Sentiment": "NEGATIVE",
 "SentimentScore": {
 "Mixed": 0.0036034041550010443,
 "Negative": 0.97439044713974,
 "Neutral": 0.01862311363220215,
 "Positive": 0.003383016213774681
 }
}
Thought I was getting a book received book on tape. a little deceiving. Rather than that the process went as expected. If it had been what we wanted would have been great.


{
 "ResponseMetadata": {
 "HTTPHeaders": {
 "content-length": "166",
 "content-type": "application/x-amz-json-1.1",
 "date": "Wed, 30 Jun 2021 21:36:24 GMT",
 "x-amzn-requestid": "fdff8985-b86

## Use batch_detect_sentiment API
To send batches of up to 25 documents, you can use the Amazon Comprehend batch operations. Calling a batch operation is identical to calling the single document APIs for each document in the request. Using the batch APIs can result in better performance for your applications. 
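Because a batch holds at most 25 documents, a larger collection has to be split before it is sent. The following is a minimal sketch of that splitting step; `chunk_documents` and the placeholder review texts are hypothetical names introduced for this example.

```python
BATCH_LIMIT = 25  # maximum documents per batch call, as stated above


def chunk_documents(documents, size=BATCH_LIMIT):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


# Placeholder texts standing in for the review column of the dataset.
reviews = [f"review {i}" for i in range(35)]
batches = list(chunk_documents(reviews))
print([len(batch) for batch in batches])  # [25, 10]
```

Each resulting batch could then be passed as the `TextList` argument of `batch_detect_sentiment`.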

In [84]:
# Prepare a list of the first 25 reviews to pass to the batch operation
list_text = data['Review'].iloc[:25].tolist()


In [85]:
response = comprehend.batch_detect_sentiment(
 TextList=list_text,
 LanguageCode='en'
)

print(response)

{'ResultList': [{'Index': 0, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.003383016213774681, 'Negative': 0.97439044713974, 'Neutral': 0.01862311363220215, 'Mixed': 0.0036034041550010443}}, {'Index': 1, 'Sentiment': 'NEGATIVE', 'SentimentScore': {'Positive': 0.0019294769736006856, 'Negative': 0.9654799103736877, 'Neutral': 0.002631053328514099, 'Mixed': 0.029959602281451225}}, {'Index': 2, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.009384780190885067, 'Negative': 0.1558845043182373, 'Neutral': 0.005947569385170937, 'Mixed': 0.8287831544876099}}, {'Index': 3, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.02998470328748226, 'Negative': 0.0013004766078665853, 'Neutral': 5.386784323491156e-05, 'Mixed': 0.9686610102653503}}, {'Index': 4, 'Sentiment': 'MIXED', 'SentimentScore': {'Positive': 0.3914477825164795, 'Negative': 0.000349800189724192, 'Neutral': 0.00025608116993680596, 'Mixed': 0.6079463362693787}}, {'Index': 5, 'Sentiment': 'POSITIVE', 'SentimentSc
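The printed response is hard to read as a single dictionary. One possible way to work with it is to match each entry back to its input text through the `Index` field, as sketched below. The helper `flatten_batch_response` is hypothetical, the sample response is hand-written with illustrative scores (it is not real service output), and the handling of an `ErrorList` with `Index` and `ErrorMessage` fields reflects my understanding of the batch response shape, so verify it against the API reference.

```python
def flatten_batch_response(texts, response):
    """Return one dict per input text, matched by the 'Index' field."""
    rows = [{"Text": text, "Sentiment": None, "Error": None} for text in texts]
    for result in response.get("ResultList", []):
        row = rows[result["Index"]]
        row["Sentiment"] = result["Sentiment"]
        row.update(result["SentimentScore"])  # adds Positive/Negative/Neutral/Mixed
    for error in response.get("ErrorList", []):
        rows[error["Index"]]["Error"] = error.get("ErrorMessage")
    return rows


# Hand-written stand-in for a batch response; values are illustrative only.
fake_response = {
    "ResultList": [
        {"Index": 0, "Sentiment": "NEGATIVE",
         "SentimentScore": {"Positive": 0.01, "Negative": 0.97,
                            "Neutral": 0.01, "Mixed": 0.01}},
    ],
    "ErrorList": [
        {"Index": 1, "ErrorCode": "InternalServerException",
         "ErrorMessage": "illustrative error"},
    ],
}

rows = flatten_batch_response(["first review", "second review"], fake_response)
for row in rows:
    print(row["Text"], row["Sentiment"], row["Error"])
```

In the notebook, the same helper could be called as `flatten_batch_response(list_text, response)`, and the resulting list passed to `pd.DataFrame` for inspection.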

## Asynchronous Batch Processing using StartSentimentDetectionJob

To analyze large documents and large collections of documents, use one of the Amazon Comprehend asynchronous operations. There is an asynchronous version of each of the Amazon Comprehend operations and an additional set of operations for topic modeling.

To analyze a collection of documents, you typically perform the following steps:

 * Store the documents in an Amazon S3 bucket.

 * Start one or more jobs to analyze the documents.

 * Monitor the progress of an analysis job.

 * Retrieve the results of the analysis from an S3 bucket when the job is complete.

The following sections describe using the Amazon Comprehend API to run asynchronous operations. 

We will use the following API:

StartSentimentDetectionJob: starts a job to detect the sentiment in each document in the collection.

In [86]:

# Upload the reviews file to the S3 prefix that the job reads from
s3 = boto3.resource('s3')
s3.Bucket(bucket).upload_file("amazon-reviews.csv", f"{prefix}/amazon-reviews.csv")

In [91]:
import uuid

# Give the job a unique name and point it at the uploaded reviews file
job_uuid = uuid.uuid1()
job_name = f"sentimentanalysis-job-{job_uuid}"
inputs3uri = bucketuri + "/amazon-reviews.csv"
asyncresponse = comprehend.start_sentiment_detection_job(
    InputDataConfig={
        'S3Uri': inputs3uri,
        'InputFormat': 'ONE_DOC_PER_LINE'  # each line of the file is one document
    },
    OutputDataConfig={
        'S3Uri': bucketuri
    },
    DataAccessRoleArn=role,
    JobName=job_name,
    LanguageCode='en'
)

In [92]:
events_job_id = asyncresponse['JobId']
job = comprehend.describe_sentiment_detection_job(JobId=events_job_id)
print(job)

{'SentimentDetectionJobProperties': {'JobId': '0ffd0ab3417d566ab6bb4373e3652de9', 'JobName': 'sentimentanalysis-job-6136c5e4-d9ec-11eb-9099-9fc2b3b35db3', 'JobStatus': 'IN_PROGRESS', 'SubmitTime': datetime.datetime(2021, 6, 30, 21, 44, 43, 196000, tzinfo=tzlocal()), 'InputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/amazon-reviews.csv', 'InputFormat': 'ONE_DOC_PER_LINE'}, 'OutputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/340280328827-SENTIMENT-0ffd0ab3417d566ab6bb4373e3652de9/output/output.tar.gz'}, 'LanguageCode': 'en', 'DataAccessRoleArn': 'arn:aws:iam::340280328827:role/SagemakerFullAccessPolicy'}, 'ResponseMetadata': {'RequestId': 'b3872d1d-0751-40c2-9f33-a6f94c193d1d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'b3872d1d-0751-40c2-9f33-a6f94c193d1d', 'content-type': 'application/x-amz-json-1.1', 'content-length': '740', 'date': 'Wed, 30 Jun 2021 21:44:45 GMT'}, 'RetryAttem

In [None]:
from time import sleep
# Get current job status
job = comprehend.describe_sentiment_detection_job(JobId=events_job_id)
print(job)
# Loop until the job reaches a terminal state, so a failed or stopped job
# does not keep the loop running until the timeout
waited = 0
timeout_minutes = 10
while job['SentimentDetectionJobProperties']['JobStatus'] not in ('COMPLETED', 'FAILED', 'STOPPED'):
    sleep(60)
    waited += 60
    assert waited//60 < timeout_minutes, "Job timed out after %d seconds." % waited
    job = comprehend.describe_sentiment_detection_job(JobId=events_job_id)

{'SentimentDetectionJobProperties': {'JobId': '0ffd0ab3417d566ab6bb4373e3652de9', 'JobName': 'sentimentanalysis-job-6136c5e4-d9ec-11eb-9099-9fc2b3b35db3', 'JobStatus': 'IN_PROGRESS', 'SubmitTime': datetime.datetime(2021, 6, 30, 21, 44, 43, 196000, tzinfo=tzlocal()), 'InputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/amazon-reviews.csv', 'InputFormat': 'ONE_DOC_PER_LINE'}, 'OutputDataConfig': {'S3Uri': 's3://sagemaker-us-east-1-340280328827/sagemaker/sentiment-analysis/340280328827-SENTIMENT-0ffd0ab3417d566ab6bb4373e3652de9/output/output.tar.gz'}, 'LanguageCode': 'en', 'DataAccessRoleArn': 'arn:aws:iam::340280328827:role/SagemakerFullAccessPolicy'}, 'ResponseMetadata': {'RequestId': '9fe25947-5ed0-42db-b7a1-ba4e4df214ce', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '9fe25947-5ed0-42db-b7a1-ba4e4df214ce', 'content-type': 'application/x-amz-json-1.1', 'content-length': '740', 'date': 'Wed, 30 Jun 2021 21:44:55 GMT'}, 'RetryAttem
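The polling pattern can also be wrapped in a reusable function. The sketch below is one possible shape, not the lab's own code: `wait_for_job`, its parameters, and the offline `fake_describe` stand-in are all hypothetical names introduced here. It treats COMPLETED, FAILED, and STOPPED as terminal states and raises an exception on timeout instead of using `assert`.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(describe, job_id, poll_seconds=60, timeout_seconds=600,
                 sleep=time.sleep):
    """Poll `describe(JobId=...)` until the job reaches a terminal state."""
    waited = 0
    while True:
        props = describe(JobId=job_id)["SentimentDetectionJobProperties"]
        if props["JobStatus"] in TERMINAL_STATES:
            return props
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Job {job_id} still {props['JobStatus']} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for comprehend.describe_sentiment_detection_job so the sketch
# runs without AWS access; it reports a new status on every call.
_statuses = iter(["SUBMITTED", "IN_PROGRESS", "COMPLETED"])


def fake_describe(JobId):
    return {"SentimentDetectionJobProperties":
            {"JobId": JobId, "JobStatus": next(_statuses)}}


final = wait_for_job(fake_describe, "example-job-id", sleep=lambda seconds: None)
print(final["JobStatus"])  # COMPLETED
```

In the notebook, the call would look like `wait_for_job(comprehend.describe_sentiment_detection_job, events_job_id)`, after which the caller can check whether the returned status is COMPLETED before reading the output.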

The job takes roughly 6-8 minutes to complete, after which you can download the output from the output location you specified in the job parameters. You can also open Amazon Comprehend in the AWS console and check the job details there. The asynchronous method is useful when you have many documents and want to process them as a batch.
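Once the job has completed, the results sit in the `output.tar.gz` archive named in the job's `OutputDataConfig`. The sketch below shows one way to read such an archive, assuming it contains a text file with one JSON record per line; that layout and the `File`, `Line`, and `Sentiment` fields in the sample record are assumptions to verify against your own output. The helpers `split_s3_uri` and `read_sentiment_output` are hypothetical names, and the archive is built in memory so the example runs without AWS access.

```python
import io
import json
import tarfile


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    bucket_name, _, key = uri[len("s3://"):].partition("/")
    return bucket_name, key


def read_sentiment_output(archive_bytes):
    """Parse every JSON line found in the files of a .tar.gz archive."""
    records = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            text = archive.extractfile(member).read().decode("utf-8")
            for line in text.splitlines():
                if line.strip():
                    records.append(json.loads(line))
    return records


# Build a tiny in-memory archive with one illustrative record (assumed format).
payload = b'{"File": "amazon-reviews.csv", "Line": 0, "Sentiment": "NEGATIVE"}\n'
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="output")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

records = read_sentiment_output(buffer.getvalue())
print(records[0]["Sentiment"])  # NEGATIVE
```

With a real job, you could pass the `S3Uri` from `job['SentimentDetectionJobProperties']['OutputDataConfig']` to `split_s3_uri`, download the object with `s3.Bucket(bucket_name).download_file(key, "output.tar.gz")`, and feed the file's bytes to `read_sentiment_output`.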

