{ "cells": [ { "cell_type": "markdown", "id": "602739d2", "metadata": {}, "source": [ "### Amazon Textract Analyze ID\n", "\n", "Amazon Textract Analyze ID will help you automatically extract information from identification documents, such as driver’s licenses and passports. Amazon Textract uses AI and ML technologies to extract information from identity documents, such as U.S. passports and driver’s licenses, without the need for templates or configuration. You can automatically extract specific information, such as date of expiry and date of birth, as well as intelligently identify and extract implied information, such as name and address." ] }, { "cell_type": "markdown", "id": "f1cc2940", "metadata": {}, "source": [ "Installing the caller to simplify calling Analyze ID" ] }, { "cell_type": "code", "execution_count": 1, "id": "107b34fb", "metadata": {}, "outputs": [], "source": [ "!python -m pip install -q amazon-textract-caller --upgrade" ] }, { "cell_type": "markdown", "id": "10c0b980", "metadata": {}, "source": [ "Also upgrade boto3 to make sure we are on the latest boto3 that includes Analzye ID" ] }, { "cell_type": "code", "execution_count": 2, "id": "cc280ce2", "metadata": {}, "outputs": [], "source": [ "!python -m pip install -q boto3 botocore --upgrade" ] }, { "cell_type": "code", "execution_count": 3, "id": "cd3d8238", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import botocore\n", "from textractcaller import call_textract_analyzeid" ] }, { "cell_type": "markdown", "id": "5cb62607", "metadata": {}, "source": [ "The sample drivers license image is located in an S3 bucket in us-east-2, so we pass in that region to the boto3 client" ] }, { "cell_type": "code", "execution_count": 5, "id": "f85fc212", "metadata": {}, "outputs": [], "source": [ "textract_client = boto3.client('textract', region_name='us-east-2')\n", "j = call_textract_analyzeid(document_pages=[\"s3://amazon-textract-public-content/analyzeid/driverlicense.png\"], \n", " boto3_textract_client=textract_client)" ] }, { "cell_type": "markdown", "id": "ee7dc4e8", "metadata": {}, "source": [ "printing out the JSON response" ] }, { "cell_type": "code", "execution_count": 6, "id": "d5417d43", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"IdentityDocuments\": [\n", " {\n", " \"DocumentIndex\": 1,\n", " \"IdentityDocumentFields\": [\n", " {\n", " \"Type\": {\n", " \"Text\": \"FIRST_NAME\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"JORGE\",\n", " \"Confidence\": 98.78211975097656\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"LAST_NAME\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"SOUZA\",\n", " \"Confidence\": 98.82009887695312\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"MIDDLE_NAME\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"\",\n", " \"Confidence\": 99.39620208740234\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"SUFFIX\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"\",\n", " \"Confidence\": 99.65946960449219\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"CITY_IN_ADDRESS\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"ANYTOWN\",\n", " \"Confidence\": 98.8210220336914\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"ZIP_CODE_IN_ADDRESS\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"02127\",\n", " \"Confidence\": 99.0246353149414\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"STATE_IN_ADDRESS\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"MA\",\n", " \"Confidence\": 99.53130340576172\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"STATE_NAME\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"MASSACHUSETTS\",\n", " \"Confidence\": 98.22105407714844\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"DOCUMENT_NUMBER\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"820BAC729CBAC\",\n", " \"Confidence\": 96.05117797851562\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"EXPIRATION_DATE\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"01/20/2020\",\n", " \"NormalizedValue\": {\n", " \"Value\": \"2020-01-20T00:00:00\",\n", " \"ValueType\": \"Date\"\n", " },\n", " \"Confidence\": 98.38336944580078\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"DATE_OF_BIRTH\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"03/18/1978\",\n", " \"NormalizedValue\": {\n", " \"Value\": \"1978-03-18T00:00:00\",\n", " \"ValueType\": \"Date\"\n", " },\n", " \"Confidence\": 98.17178344726562\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"DATE_OF_ISSUE\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"\",\n", " \"Confidence\": 89.29450988769531\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"ID_TYPE\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"DRIVER LICENSE FRONT\",\n", " \"Confidence\": 98.81443786621094\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"ENDORSEMENTS\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"NONE\",\n", " \"Confidence\": 99.27168273925781\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"VETERAN\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"\",\n", " \"Confidence\": 99.62979125976562\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"RESTRICTIONS\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"NONE\",\n", " \"Confidence\": 99.41033935546875\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"CLASS\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"D\",\n", " \"Confidence\": 99.05763244628906\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"ADDRESS\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"100 MAIN STREET\",\n", " \"Confidence\": 99.24053192138672\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"COUNTY\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"\",\n", " \"Confidence\": 99.59503173828125\n", " }\n", " },\n", " {\n", " \"Type\": {\n", " \"Text\": \"PLACE_OF_BIRTH\"\n", " },\n", " \"ValueDetection\": {\n", " \"Text\": \"\",\n", " \"Confidence\": 99.64707946777344\n", " }\n", " }\n", " ]\n", " }\n", " ],\n", " \"DocumentMetadata\": {\n", " \"Pages\": 1\n", " },\n", " \"AnalyzeIDModelVersion\": \"1.0\",\n", " \"ResponseMetadata\": {\n", " \"RequestId\": \"e7437df8-5c35-47a3-a24d-ee8436f18d1d\",\n", " \"HTTPStatusCode\": 200,\n", " \"HTTPHeaders\": {\n", " \"x-amzn-requestid\": \"e7437df8-5c35-47a3-a24d-ee8436f18d1d\",\n", " \"content-type\": \"application/x-amz-json-1.1\",\n", " \"content-length\": \"2223\",\n", " \"date\": \"Fri, 03 Dec 2021 18:56:24 GMT\"\n", " },\n", " \"RetryAttempts\": 0\n", " }\n", "}\n" ] } ], "source": [ "import json\n", "print(json.dumps(j, indent=2))" ] }, { "cell_type": "markdown", "id": "c2a00b42", "metadata": {}, "source": [ "Textract Response Parser makes it easier to get values from the JSON response" ] }, { "cell_type": "code", "execution_count": 7, "id": "e8947a56", "metadata": {}, "outputs": [], "source": [ "!python -m pip install -q amazon-textract-response-parser tabulate --upgrade" ] }, { "cell_type": "markdown", "id": "a2f8e820", "metadata": {}, "source": [ "The get_values_as_list() function returns the values as a list of list of str in the following format\n", "[[\"doc_number\", \"type\", \"value\", \"confidence\", \"normalized_value\", \"normalized_value_type\"]]\n" ] }, { "cell_type": "code", "execution_count": 11, "id": "e4b8c205", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['1', 'FIRST_NAME', 'JORGE', '98.78211975097656', '', ''],\n", " ['1', 'LAST_NAME', 'SOUZA', '98.82009887695312', '', ''],\n", " ['1', 'MIDDLE_NAME', '', '99.39620208740234', '', ''],\n", " ['1', 'SUFFIX', '', '99.65946960449219', '', ''],\n", " ['1', 'CITY_IN_ADDRESS', 'ANYTOWN', '98.8210220336914', '', ''],\n", " ['1', 'ZIP_CODE_IN_ADDRESS', '02127', '99.0246353149414', '', ''],\n", " ['1', 'STATE_IN_ADDRESS', 'MA', '99.53130340576172', '', ''],\n", " ['1', 'STATE_NAME', 'MASSACHUSETTS', '98.22105407714844', '', ''],\n", " ['1', 'DOCUMENT_NUMBER', '820BAC729CBAC', '96.05117797851562', '', ''],\n", " ['1',\n", " 'EXPIRATION_DATE',\n", " '01/20/2020',\n", " '98.38336944580078',\n", " '2020-01-20T00:00:00',\n", " 'Date'],\n", " ['1',\n", " 'DATE_OF_BIRTH',\n", " '03/18/1978',\n", " '98.17178344726562',\n", " '1978-03-18T00:00:00',\n", " 'Date'],\n", " ['1', 'DATE_OF_ISSUE', '', '89.29450988769531', '', ''],\n", " ['1', 'ID_TYPE', 'DRIVER LICENSE FRONT', '98.81443786621094', '', ''],\n", " ['1', 'ENDORSEMENTS', 'NONE', '99.27168273925781', '', ''],\n", " ['1', 'VETERAN', '', '99.62979125976562', '', ''],\n", " ['1', 'RESTRICTIONS', 'NONE', '99.41033935546875', '', ''],\n", " ['1', 'CLASS', 'D', '99.05763244628906', '', ''],\n", " ['1', 'ADDRESS', '100 MAIN STREET', '99.24053192138672', '', ''],\n", " ['1', 'COUNTY', '', '99.59503173828125', '', ''],\n", " ['1', 'PLACE_OF_BIRTH', '', '99.64707946777344', '', '']]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import trp.trp2_analyzeid as t2id\n", "\n", "doc: t2id.TAnalyzeIdDocument = t2id.TAnalyzeIdDocumentSchema().load(j)\n", "result = doc.get_values_as_list()\n", "result" ] }, { "cell_type": "markdown", "id": "cb963302", "metadata": {}, "source": [ "using tablulate we get a pretty printed output" ] }, { "cell_type": "code", "execution_count": 13, "id": "6c4fdfef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "------------------- --------------------\n", "FIRST_NAME JORGE\n", "LAST_NAME SOUZA\n", "MIDDLE_NAME\n", "SUFFIX\n", "CITY_IN_ADDRESS ANYTOWN\n", "ZIP_CODE_IN_ADDRESS 02127\n", "STATE_IN_ADDRESS MA\n", "STATE_NAME MASSACHUSETTS\n", "DOCUMENT_NUMBER 820BAC729CBAC\n", "EXPIRATION_DATE 01/20/2020\n", "DATE_OF_BIRTH 03/18/1978\n", "DATE_OF_ISSUE\n", "ID_TYPE DRIVER LICENSE FRONT\n", "ENDORSEMENTS NONE\n", "VETERAN\n", "RESTRICTIONS NONE\n", "CLASS D\n", "ADDRESS 100 MAIN STREET\n", "COUNTY\n", "PLACE_OF_BIRTH\n", "------------------- --------------------\n" ] } ], "source": [ "from tabulate import tabulate\n", "print(tabulate([x[1:3] for x in result]))" ] }, { "cell_type": "markdown", "id": "2d09fc61", "metadata": {}, "source": [ "Just getting the FIRST_NAME" ] }, { "cell_type": "code", "execution_count": 14, "id": "3730f49e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['JORGE']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[x[2] for x in result if x[1]=='FIRST_NAME']" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }