{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7de4471f",
   "metadata": {},
   "source": [
    "# Prepare inputs for Amazon Quicksight visualization\n",
    "\n",
    "Amazon QuickSight is a cloud-scale business intelligence (BI) service that you can use to deliver easy-to-understand insights to the people who you work with, wherever they are. Amazon QuickSight connects to your data in the cloud and combines data from many different sources. In a single data dashboard, QuickSight can include AWS data, third-party data, big data, spreadsheet data, SaaS data, B2B data, and more. As a fully managed cloud-based service, Amazon QuickSight provides enterprise-grade security, global availability, and built-in redundancy. It also provides the user-management tools that you need to scale from 10 users to 10,000, all with no infrastructure to deploy or manage.\n",
    "\n",
    "In this notebook, we will prepare the manifest file that we need to use with Amazon Quicksight to visualize insights we generated from our customer call transcripts."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "146615fd",
   "metadata": {},
   "source": [
    "### Initialize libraries and import variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2797e21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import libraries\n",
    "import pandas as pd\n",
    "import boto3\n",
    "import json\n",
    "import csv\n",
    "import os\n",
    "\n",
    "# initialize variables we need\n",
    "infile = 'quicksight_raw_manifest.json'\n",
    "outfile = 'quicksight_formatted_manifest_type.json'\n",
    "\n",
    "inprefix = 'quicksight/data'\n",
    "manifestprefix = 'quicksight/manifest'\n",
    "\n",
    "bucket = '<your-bucket-name>' # Enter your bucket name here\n",
    "\n",
    "s3 = boto3.client('s3')\n",
    "\n",
    "try:\n",
    "    s3.head_bucket(Bucket=bucket)\n",
    "except:\n",
    "    print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc71d262",
   "metadata": {},
   "source": [
    "### Review transcripts with insights for QuickSight\n",
    "When we ran the previous notebooks, we created CSV files containing speaker and time segmentation, the inference results that classified the transcripts to CTA/No CTA using Amazon Comprehend custom classification, we detected custom entities using Amazon Comprehend custom entity recognizer, and we finally detected the sentiment of the call transcripts using Amazon Comprehend Sentiment anlysis feature. These are available in our temp folder, let us move these to the quicksight/input folder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57d8c711",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Lets review what CSV files we have for QuickSight\n",
    "!aws s3 ls s3://{bucket}/{inprefix} --recursive "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfe4137c",
   "metadata": {},
   "source": [
    "### Update QuickSight Manifest\n",
    "We will replace the S3 bucket and prefix from the raw manifest file with what you have entered in STEP 0 - CELL 1 above. We will then create a new formatted manifest file that will be used for creating a dataset with Amazon QuickSight based on the content we extract from the handwritten documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75619a75",
   "metadata": {},
   "outputs": [],
   "source": [
    "# S3 boto3 client handle\n",
    "s3 = boto3.client('s3')\n",
    "\n",
    "# Create formatted manifests for each type of dataset we need from the raw manifest JSON\n",
    "types = ['transcripts', 'entity', 'cta', 'sentiment']\n",
    "\n",
    "manifest = open(infile, 'r')\n",
    "ln = json.load(manifest)\n",
    "t = json.dumps(ln['fileLocations'][0]['URIPrefixes'])\n",
    "for type in types:\n",
    "    t1 = t.replace('bucket', bucket).replace('prefix', inprefix + '/' + type)\n",
    "    ln['fileLocations'][0]['URIPrefixes'] = json.loads(t1)\n",
    "    outfile_rep = outfile.replace('type', type)\n",
    "    with open(outfile_rep, 'w', encoding='utf-8') as out:\n",
    "        json.dump(ln, out, ensure_ascii=False, indent=4)\n",
    "    # Upload the manifest to S3\n",
    "    s3.upload_file(outfile_rep, bucket, manifestprefix + '/' + outfile_rep)\n",
    "    print(\"Manifest file uploaded to: s3://{}/{}\".format(bucket, manifestprefix + '/' + outfile_rep))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a6dce9f",
   "metadata": {},
   "source": [
    "#### Please copy the manifest S3 URIs above. We need it when we build the datasets for the QuickSight dashboard.\n",
    "\n",
    "### We are done here. Please go back to workshop instructions."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}