{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Amazon Comprehend Demo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "Copyright [2017]-[2017] Amazon.com, Inc. or its affiliates. All Rights Reserved.\n", "\n", "Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at\n", "\n", "http://aws.amazon.com/apache2.0/\n", "\n", "or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n", "***\n", "\n", "### Prerequisites:\n", "\n", "#### Identity and Acces Management\n", "\n", "The user or role that executes the commands must have permissions in AWS Identity and Access Management (IAM) to perform those actions. AWS provides a set of managed policies that help you get started quickly. For our example, you should apply the following managed policy to your user or role:\n", "\n", " ComprehendReadOnly\n", "\n", "Be aware that we recommend you follow AWS IAM best practices for production implementations, which is out of scope for this workshop." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import gzip\n", "import json\n", "import csv\n", "from pprint import pprint\n", "\n", "comprehend = boto3.client('comprehend', region_name='eu-west-1')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# download review dataset\n", "\n", "!curl -O http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Amazon_Instant_Video_5.json.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filename = 'reviews_Amazon_Instant_Video_5.json.gz'\n", "f = gzip.open(filename, 'r') \n", "out = [] \n", "x = 50 # only process the first 50 entries \n", "for line in f: \n", " x -= 1\n", " if x == 0:\n", " break\n", " review = json.loads(line)\n", " # get sentiment for reviewText\n", " reviewText = review['reviewText']\n", " if len(reviewText) > 5000: # only supporting up to 5000 Bytes, skipping entry\n", " print ('Skipping: %s' % reviewText)\n", " else:\n", " textSentiment = comprehend.detect_sentiment(\n", " Text=reviewText,\n", " LanguageCode='en'\n", " )\n", "\n", " out.append([review['reviewText'],review['asin'],textSentiment['Sentiment'],textSentiment['SentimentScore']['Positive'],textSentiment['SentimentScore']['Negative'],textSentiment['SentimentScore']['Neutral'],textSentiment['SentimentScore']['Mixed']]) \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pprint (out)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open('sentiment.csv', 'w') as csvfile:\n", " linewriter = csv.writer(csvfile, delimiter=';',quotechar='|', quoting=csv.QUOTE_MINIMAL)\n", " linewriter.writerow (['review','asin','Sentiment','Positive','Negative','Neutral','Mixed'])\n", " for all in out:\n", " linewriter.writerow(all)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 2 }