{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intelligent Document Processing\n", "\n", "Documents contain valuable information and come in various shapes and forms. In most cases, you are manually processing these documents which is time consuming, prone to error, and costly. Not only do you want this information extracted quickly but can also automate business processes that presently relies on manual inputs and intervention across various file types and formats.\n", "\n", "To help you overcome these challenges, AWS Machine Learning (ML) now provides you choices when it comes to extracting information from complex content in any document format such as insurance claims, mortgages, healthcare claims, contracts, and legal contracts.\n", "\n", "The diagram below shows an architecture for an Intelligent document processing workflow. It starts with data capture stage to securely store and aggregate different types (PDF, PNG, JPEG, and TIFF), formats, and layouts of documents. Followed by accurate classification of documents and extracting text and key insights from documents and perform further enrichments of the documents (such as identity entities, redaction etc.). Finally, the verification and review stage involves manual review of the documents for quality and accuracy, followed by consumption of the documents and extracted information into downstream databases/applications.\n", "\n", "In this workshop, we will explore the various aspects of this workflow such as the document classification, text and insights extraction, enrichments, and human review.\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Document Classification\n", "In this lab we will walk you through an hands-on lab on document classification using Amazon Comprehend\n", "Custom Classifier. We will use Amazon Textract to first extract the text out of our documents and then label them and then use the data for training our Amazon comprehend custom classifier. We will create an Amazon Comprehend real time endpoint with the custom classifier to classify our documents.\n", "\n", "\n", "
\n",
" \n",
"