### Amazon Textract Analyze ID

Amazon Textract Analyze ID automatically extracts information from identity documents, such as U.S. passports and driver’s licenses, using machine learning, without the need for templates or configuration. It extracts specific fields, such as date of expiry and date of birth, and also identifies and extracts implied information, such as name and address.

Install the amazon-textract-caller package to simplify calling Analyze ID.

In [1]:
!python -m pip install -q amazon-textract-caller --upgrade

Also upgrade boto3 and botocore to make sure the installed version includes Analyze ID.

In [2]:
!python -m pip install -q boto3 botocore --upgrade

In [3]:
import boto3
import botocore
from textractcaller import call_textract_analyzeid

The sample driver’s license image is located in an S3 bucket in us-east-2, so we pass that region to the boto3 client.

In [5]:
textract_client = boto3.client('textract', region_name='us-east-2')
j = call_textract_analyzeid(document_pages=["s3://amazon-textract-public-content/analyzeid/driverlicense.png"], 
 boto3_textract_client=textract_client)

Printing out the JSON response:

In [6]:
import json
print(json.dumps(j, indent=2))

{
 "IdentityDocuments": [
 {
 "DocumentIndex": 1,
 "IdentityDocumentFields": [
 {
 "Type": {
 "Text": "FIRST_NAME"
 },
 "ValueDetection": {
 "Text": "JORGE",
 "Confidence": 98.78211975097656
 }
 },
 {
 "Type": {
 "Text": "LAST_NAME"
 },
 "ValueDetection": {
 "Text": "SOUZA",
 "Confidence": 98.82009887695312
 }
 },
 {
 "Type": {
 "Text": "MIDDLE_NAME"
 },
 "ValueDetection": {
 "Text": "",
 "Confidence": 99.39620208740234
 }
 },
 {
 "Type": {
 "Text": "SUFFIX"
 },
 "ValueDetection": {
 "Text": "",
 "Confidence": 99.65946960449219
 }
 },
 {
 "Type": {
 "Text": "CITY_IN_ADDRESS"
 },
 "ValueDetection": {
 "Text": "ANYTOWN",
 "Confidence": 98.8210220336914
 }
 },
 {
 "Type": {
 "Text": "ZIP_CODE_IN_ADDRESS"
 },
 "ValueDetection": {
 "Text": "02127",
 "Confidence": 99.0246353149414
 }
 },
 {
 "Type": {
 "Text": "STATE_IN_ADDRESS"
 },
 "ValueDetection": {
 "Text": "MA",
 "Confidence": 99.53130340576172
 }
 },
 {
 "Type": {
 "Text": "STATE_NAME"
 },
 "ValueDetection": {
 "Text": "MASSACHUSETTS"

Textract Response Parser makes it easier to get values from the JSON response
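For comparison, the raw response can also be walked by hand, since it is a nested dictionary of `IdentityDocuments`, each holding a list of `IdentityDocumentFields`. The following is a minimal sketch under that assumption; `fields_by_type` is a hypothetical helper (not part of any Textract library), and `sample_response` is an abbreviated copy of the response printed above so the block runs without an AWS call.

```python
# Abbreviated sample shaped like the Analyze ID response printed above.
sample_response = {
    "IdentityDocuments": [
        {
            "DocumentIndex": 1,
            "IdentityDocumentFields": [
                {"Type": {"Text": "FIRST_NAME"},
                 "ValueDetection": {"Text": "JORGE", "Confidence": 98.78211975097656}},
                {"Type": {"Text": "LAST_NAME"},
                 "ValueDetection": {"Text": "SOUZA", "Confidence": 98.82009887695312}},
            ],
        }
    ]
}


def fields_by_type(response):
    """Hypothetical helper: map (DocumentIndex, field type) -> (text, confidence)."""
    out = {}
    for doc in response.get("IdentityDocuments", []):
        for field in doc.get("IdentityDocumentFields", []):
            key = (doc.get("DocumentIndex"), field["Type"]["Text"])
            value = field.get("ValueDetection", {})
            out[key] = (value.get("Text", ""), value.get("Confidence"))
    return out


fields = fields_by_type(sample_response)
print(fields[(1, "FIRST_NAME")])
```

In the notebook, `j` could be passed in place of `sample_response`. The parser shown next does the same kind of traversal and also surfaces normalized values.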

In [7]:
!python -m pip install -q amazon-textract-response-parser tabulate --upgrade

The get_values_as_list() function returns the values as a list of lists of strings, where each inner list has the following format:
[["doc_number", "type", "value", "confidence", "normalized_value", "normalized_value_type"]]


In [11]:
import trp.trp2_analyzeid as t2id

doc: t2id.TAnalyzeIdDocument = t2id.TAnalyzeIdDocumentSchema().load(j)
result = doc.get_values_as_list()
result

[['1', 'FIRST_NAME', 'JORGE', '98.78211975097656', '', ''],
 ['1', 'LAST_NAME', 'SOUZA', '98.82009887695312', '', ''],
 ['1', 'MIDDLE_NAME', '', '99.39620208740234', '', ''],
 ['1', 'SUFFIX', '', '99.65946960449219', '', ''],
 ['1', 'CITY_IN_ADDRESS', 'ANYTOWN', '98.8210220336914', '', ''],
 ['1', 'ZIP_CODE_IN_ADDRESS', '02127', '99.0246353149414', '', ''],
 ['1', 'STATE_IN_ADDRESS', 'MA', '99.53130340576172', '', ''],
 ['1', 'STATE_NAME', 'MASSACHUSETTS', '98.22105407714844', '', ''],
 ['1', 'DOCUMENT_NUMBER', '820BAC729CBAC', '96.05117797851562', '', ''],
 ['1',
 'EXPIRATION_DATE',
 '01/20/2020',
 '98.38336944580078',
 '2020-01-20T00:00:00',
 'Date'],
 ['1',
 'DATE_OF_BIRTH',
 '03/18/1978',
 '98.17178344726562',
 '1978-03-18T00:00:00',
 'Date'],
 ['1', 'DATE_OF_ISSUE', '', '89.29450988769531', '', ''],
 ['1', 'ID_TYPE', 'DRIVER LICENSE FRONT', '98.81443786621094', '', ''],
 ['1', 'ENDORSEMENTS', 'NONE', '99.27168273925781', '', ''],
 ['1', 'VETERAN', '', '99.62979125976562', '', ''],
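Because every element in these rows is a string, it can help to convert them into typed records before using them. The sketch below is one possible approach, not part of the parser: `COLUMNS` and `rows_to_records` are hypothetical names, the column order is taken from the format described above, and `sample_rows` copies three rows from the output so the block runs on its own. The 90.0 threshold is an arbitrary value chosen for illustration.

```python
from datetime import datetime

# Column order as described for get_values_as_list() above.
COLUMNS = ["doc_number", "type", "value", "confidence",
           "normalized_value", "normalized_value_type"]

# Three rows copied from the output above.
sample_rows = [
    ['1', 'FIRST_NAME', 'JORGE', '98.78211975097656', '', ''],
    ['1', 'EXPIRATION_DATE', '01/20/2020', '98.38336944580078',
     '2020-01-20T00:00:00', 'Date'],
    ['1', 'DATE_OF_ISSUE', '', '89.29450988769531', '', ''],
]


def rows_to_records(rows):
    """Hypothetical helper: turn string rows into dicts with typed values."""
    records = []
    for row in rows:
        record = dict(zip(COLUMNS, row))
        record["confidence"] = float(record["confidence"])
        # Parse the normalized value only when it is a non-empty date.
        if record["normalized_value_type"] == "Date" and record["normalized_value"]:
            record["normalized_value"] = datetime.fromisoformat(
                record["normalized_value"])
        records.append(record)
    return records


records = rows_to_records(sample_rows)

# Flag fields whose confidence falls below the illustrative threshold.
low_confidence = [r["type"] for r in records if r["confidence"] < 90.0]
print(low_confidence)
```

With typed records, date comparisons (for example, checking whether EXPIRATION_DATE is in the past) and confidence filtering no longer require string handling at each call site.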

Using tabulate, we get pretty-printed output.

In [13]:
from tabulate import tabulate
print(tabulate([x[1:3] for x in result]))

-------------------  --------------------
FIRST_NAME           JORGE
LAST_NAME            SOUZA
MIDDLE_NAME
SUFFIX
CITY_IN_ADDRESS      ANYTOWN
ZIP_CODE_IN_ADDRESS  02127
STATE_IN_ADDRESS     MA
STATE_NAME           MASSACHUSETTS
DOCUMENT_NUMBER      820BAC729CBAC
EXPIRATION_DATE      01/20/2020
DATE_OF_BIRTH        03/18/1978
DATE_OF_ISSUE
ID_TYPE              DRIVER LICENSE FRONT
ENDORSEMENTS         NONE
VETERAN
RESTRICTIONS         NONE
CLASS                D
ADDRESS              100 MAIN STREET
COUNTY
PLACE_OF_BIRTH
-------------------  --------------------


Getting just the FIRST_NAME value:

In [14]:
[x[2] for x in result if x[1]=='FIRST_NAME']

['JORGE']
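The list comprehension above returns a list, which is empty when the field is absent. Where a single value is wanted, a small lookup function may be more convenient. The sketch below assumes the row layout shown earlier; `get_value` is a hypothetical helper, and `sample_rows` copies two rows from the output so the block runs without the parser.

```python
# Two rows copied from the get_values_as_list() output above.
sample_rows = [
    ['1', 'FIRST_NAME', 'JORGE', '98.78211975097656', '', ''],
    ['1', 'LAST_NAME', 'SOUZA', '98.82009887695312', '', ''],
]


def get_value(rows, field_type, doc_number="1"):
    """Hypothetical helper: return the first matching value, or None if absent."""
    for row in rows:
        # row[0] is the document number, row[1] the field type, row[2] the value.
        if row[0] == doc_number and row[1] == field_type:
            return row[2]
    return None


print(get_value(sample_rows, "FIRST_NAME"))
print(get_value(sample_rows, "NOT_A_FIELD"))
```

Returning `None` for a missing field keeps it distinguishable from a field that is present but empty, such as MIDDLE_NAME in the sample output.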