{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "49212b4a",
   "metadata": {},
   "source": [
    "# Detect sentiment in customer calls using Amazon Comprehend\n",
    "\n",
    "Now we will detect the customer sentiment in the call conversations using Amazon Comprehend. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adf00292",
   "metadata": {},
   "source": [
    "### Import libraries and initialize variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a16de80a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import pandas as pd\n",
    "\n",
    "inprefix = 'comprehend/input'\n",
    "outprefix = 'quicksight/temp/insights'\n",
    "# Amazon Comprehend client\n",
    "comprehend = boto3.client('comprehend')\n",
    "# Amazon S3 clients\n",
    "s3 = boto3.client('s3')\n",
    "s3_resource = boto3.resource('s3')\n",
    "\n",
    "bucket = '<your-s3-bucket>' # Enter your bucket name here\n",
    "\n",
    "try:\n",
    "    s3.head_bucket(Bucket=bucket)\n",
    "except:\n",
    "    print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7e708e4",
   "metadata": {},
   "source": [
    "### Detect sentiment of transcripts\n",
    "For our workshop we will determine the sentiment of an entire call transcript to use with our visuals, but you can also capture sentiment trends in a conversation. We will demonstrate this during the workshop using the new **Transcribe Call Analytics** solution. If you like to try how this looks, please execute the optional code block at the end of this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "432afad2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare to page through our transcripts in S3\n",
    "paginator = s3.get_paginator('list_objects_v2')\n",
    "pages = paginator.paginate(Bucket=bucket, Prefix=inprefix)\n",
    "job_name_list = []\n",
    "t_prefix = 'quicksight/data/sentiment'\n",
    "\n",
    "# We will define a DataFrame to store the results of the sentiment analysis\n",
    "cols = ['transcript_name', 'sentiment']\n",
    "df_sent = pd.DataFrame(columns=cols)\n",
    "\n",
    "# Now lets page through the transcripts\n",
    "for page in pages:\n",
    "    for obj in page['Contents']:\n",
    "        # get the transcript file name\n",
    "        transcript_file_name = obj['Key'].split('/')[2]\n",
    "        # now lets get the transcript file contents\n",
    "        temp = s3_resource.Object(bucket, obj['Key'])\n",
    "        transcript_contents = temp.get()['Body'].read().decode('utf-8')\n",
    "        # Call Comprehend to detect sentiment\n",
    "        response = comprehend.detect_sentiment(Text=transcript_contents, LanguageCode='en')\n",
    "        # Update the results DataFrame with the cta predicted label\n",
    "        # Create a CSV file with cta label from this DataFrame\n",
    "        df_sent.loc[len(df_sent.index)] = [transcript_file_name.strip('en-').strip('.txt'),response['Sentiment']]\n",
    "        \n",
    "df_sent.to_csv('s3://' + bucket + '/' + t_prefix + '/' + 'sentiment.csv', index=False)\n",
    "df_sent"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a785d9a",
   "metadata": {},
   "source": [
    "### OPTIONAL - Detect sentiment Trend\n",
    "We will now take one of the transcripts and show you how to detect sentiment trend in conversations. This can be a powerful insight to both demonstrate and understand the triggers for a shift in customer perspective as well as how to remedy it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3788eff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select one of the transcripts we created in 1-Transcribe-Translate\n",
    "import os\n",
    "rootdir = '/home/ec2-user/SageMaker/aim317-uncover-insights-customer-conversations/notebooks/1-Transcribe-Translate-Calls'\n",
    "csvfile = ''\n",
    "for subdir, dirs, files in os.walk(rootdir):\n",
    "    for file in files:\n",
    "        filepath = subdir + os.sep + file\n",
    "        if filepath.endswith(\".csv\"):\n",
    "            csvfile = str(filepath)\n",
    "            break\n",
    "            \n",
    "df_t = pd.read_csv(csvfile)\n",
    "df_t.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87d7583d",
   "metadata": {},
   "source": [
    "Separate the sentences spoken by each of the speakers to their own dictionaries along with the last timestamp when their sentence ended"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7554b69",
   "metadata": {},
   "outputs": [],
   "source": [
    "spk_0 = {}\n",
    "spk_1 = {}\n",
    "a = ''\n",
    "b = ''\n",
    "j = 0\n",
    "k = 0\n",
    "for i, row in df_t.iterrows():\n",
    "    if row['speaker_label'] == 'spk_0':\n",
    "        if len(b) > 0:\n",
    "            j += 1\n",
    "            spk_1['end_time'+str(j)] = row['start_time'] \n",
    "            spk_1['transcript'+str(j)] = b\n",
    "            b = ''\n",
    "        a += row['content'] + ' '\n",
    "    if row['speaker_label'] == 'spk_1':\n",
    "        if len(a) > 0:\n",
    "            k += 1\n",
    "            spk_0['end_time'+str(k)] = row['start_time']\n",
    "            spk_0['transcript'+str(k)] = a\n",
    "            a = ''\n",
    "        b += row['content'] + ' '\n",
    "if len(a) > 0:\n",
    "    spk_0['transcript'+str(j+1)] = a\n",
    "    spk_0['end_time'+str(j+1)] = row['end_time']\n",
    "if len(b) > 0:\n",
    "    spk_1['transcript'+str(k+1)] = b\n",
    "    spk_1['end_time'+str(k+1)] = row['end_time']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e688186",
   "metadata": {},
   "source": [
    "#### Check the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07c49134",
   "metadata": {},
   "outputs": [],
   "source": [
    "spk_0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7addfcf",
   "metadata": {},
   "source": [
    "Now get the **sentiment for each line using Amazon Comprehend** and update the transcript with the sentiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4dbc59de",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "for line in spk_0:\n",
    "    if 'transcript' in line:\n",
    "        res0 = comprehend.detect_sentiment(Text=spk_0[line], LanguageCode='en')['Sentiment']\n",
    "        spk_0[line] = res0\n",
    "\n",
    "for line in spk_1:\n",
    "    if 'transcript' in line:\n",
    "        res1 = comprehend.detect_sentiment(Text=spk_1[line], LanguageCode='en')['Sentiment']\n",
    "        spk_1[line] = res1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "440b8ad0",
   "metadata": {},
   "outputs": [],
   "source": [
    "spk_1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1e26acd",
   "metadata": {},
   "source": [
    "#### Let us now graph it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fcde602e",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67257a58",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "spk_0_end_time = []\n",
    "spk_0_sentiment = []\n",
    "spk_1_end_time = []\n",
    "spk_1_sentiment = []\n",
    "\n",
    "\n",
    "for x in spk_0:\n",
    "    if 'end_time' in x:\n",
    "        spk_0_end_time.append(spk_0[x])\n",
    "    if 'transcript' in x:\n",
    "        spk_0_sentiment.append(spk_0[x])\n",
    "\n",
    "for x in spk_1:\n",
    "    if 'end_time' in x:\n",
    "        spk_1_end_time.append(spk_1[x])\n",
    "    if 'transcript' in x:\n",
    "        spk_1_sentiment.append(spk_1[x])\n",
    "        \n",
    "plt.plot(spk_0_end_time, spk_0_sentiment, color = 'g', label = 'Speaker 0 Sentiment Trend')\n",
    "plt.plot(spk_1_end_time, spk_1_sentiment, color = 'b', label = 'Speaker 1 Sentiment Trend')\n",
    "plt.xlabel('Call time in seconds')\n",
    "plt.ylabel('Sentiment')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d18282d2",
   "metadata": {},
   "source": [
    "As you can see above, the sky's the limit on what you can do with the Amazon Transcribe output in tandem with Amazon Comprehend. Please go back now to watch your team members create some **AWSome visuals using Amazon QuickSight!!**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e797961e",
   "metadata": {},
   "source": [
    "## End of notebook. Please go back to the workshop instructions to review the next steps."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}