{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Onboarding Use Case (1:1 Verification)\n", "-----\n", "\n", "Here we are going to take a look at the Onboarding case. Here a user is asked to present an image of a trusted identification document, then snap a selfie image. The following diagram details the process. \n", "\n", "\n", "\n", "\n", "![Onboarding](onboarding_process.png \"Onboarding\")\n", "\n", "1. user presents an image of an identification document like a drivers license or passport\n", "2. user snaps a selfie which will be used to compare to the drivers license\n", "3. system detects a face in the identification document and performs quality checks \n", "4. system detects a face in the selfie and performs quality checks\n", "5. system compares the identification image to the selfie image \n", " - if the similarity is above the specified threshold then we say that the faces match \n", " - if the similarity is below the specified threshold then we say that the faces DON'T match\n", "6. system checks selfie against known onboarded users and fraudsters. \n", "7. indexes the user and the user is onboarded. \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import io\n", "import boto3\n", "import json\n", "from IPython.display import Image as IImage\n", "import pandas as pd\n", "\n", "%store -r bucket_name\n", "mySession = boto3.session.Session()\n", "aws_region = mySession.region_name\n", "print(\"AWS Region: {}\".format(aws_region))\n", "print(\"AWS Bucket: {}\".format(bucket_name))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup Clients \n", "-----\n", "Here we are going to use both S3 and Rekognition apis " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3_client = boto3.client('s3')\n", "rek_client = boto3.client('rekognition')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 0. Setup collections \n", "-----\n", "Here we will setup two collections \"registered-users\" and \"fraudulent-users\". The collections will be used to ensure that the user hasn't previously registered and that the user is not a known fraudster. Here we'll quickly create the two collections and index a handful of images. \n", "\n", "
Avoiding Duplicates & Fraudsters \n", " \n", "In practice, we use collections for many purposes. One is to avoid duplicate registrations: without some form of biometric-based onboarding, it is easy for a user to create multiple accounts on a system simply by using different email addresses. A collection prevents this, because you can search it to see if a user has previously been onboarded (see the sketch below). Collections have proven effective at identifying and dealing with duplicate onboarding attempts, as well as at identifying fraudsters and bad actors. \n", "\n", "
\n", "\n", "\n", "
Create two Collections \n", "\n", "Here we are going to create two collections, \"onboarded-users\" and \"fraudulent-users\", and load both with some images. In step 6 of our process we'll check that the user hasn't previously onboarded and isn't a known fraudster, and in step 7 we'll index the new user. \n", "\n", "
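To verify that both collections were created, you can list them (a quick sketch): \n", "\n", "```python \n", "print(rek_client.list_collections()['CollectionIds'])\n", "```\n", "\n", "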
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Name your Collection \n", "\n", "# -- onboarded users collection --\n", "collection_name = ' ' # name your collection something like \"onboarded-users\"\n", "\n", "try:\n", " rek_client.create_collection(\n", " CollectionId=collection_name\n", " )\n", "except Exception as err:\n", " print(\"ERROR: {}\".format(err))\n", "\n", "# -- fraudulent user collection --\n", "fraud_collection_name = ' ' # name your collection something like \"fraudulent-users\"\n", "\n", "try:\n", " rek_client.create_collection(\n", " CollectionId=fraud_collection_name\n", " )\n", "except Exception as err:\n", " print(\"ERROR: {}\".format(err))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- read the image map into a pandas dataframe --\n", "obj = s3_client.get_object(Bucket=bucket_name, Key='IDVImageMapping.xlsx')\n", "image_map = pd.read_excel(io.BytesIO(obj['Body'].read()))\n", "\n", "## Onboarded useres \n", "dict_of_faces = image_map[[\"reference_name\",\"reference_image\"]].sample(n=20).to_dict('records')\n", "\n", "for rec in dict_of_faces:\n", " \n", " response = rek_client.index_faces(\n", " CollectionId= collection_name,\n", " Image={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': rec[\"reference_image\"],\n", " }\n", " },\n", " ExternalImageId=rec['reference_name'],\n", " DetectionAttributes=[\n", " 'DEFAULT',\n", " ],\n", " MaxFaces=1, # maximum faces detected \n", " QualityFilter='AUTO' # apply the quality filter. \n", " )\n", " face_id = response['FaceRecords'][0]['Face']['FaceId']\n", " print(\"ImageName: {}, FaceID: {}\".format(rec[\"reference_image\"], face_id))\n", " \n", "\n", "print(\"--- indexing {} complete --- \\n\".format(collection_name))\n", "\n", "\n", "## Onboarded useres \n", "dict_of_faces = image_map[[\"reference_name\",\"reference_image\"]].sample(n=20).to_dict('records')\n", "\n", "for rec in dict_of_faces:\n", " \n", " response = rek_client.index_faces(\n", " CollectionId= fraud_collection_name,\n", " Image={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': rec[\"reference_image\"],\n", " }\n", " },\n", " ExternalImageId=rec['reference_name'],\n", " DetectionAttributes=[\n", " 'DEFAULT',\n", " ],\n", " MaxFaces=1, # maximum faces detected \n", " QualityFilter='AUTO' # apply the quality filter. \n", " )\n", " face_id = response['FaceRecords'][0]['Face']['FaceId']\n", " print(\"ImageName: {}, FaceID: {}\".format(rec[\"reference_image\"], face_id))\n", " \n", "\n", "print(\"--- indexing {} complete ---\\n\".format(fraud_collection_name))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1. User captures image of identification document \n", "-----\n", "The image below include sample image of a RealID drivers license a " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Image of a Face\n", "id_image = \"jane_maria_sc_dl.jpeg\"\n", "display(IImage(url=s3_client.generate_presigned_url('get_object', \n", " Params={'Bucket': bucket_name, \n", " 'Key' : id_image})))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2. 
User snaps a selfie \n", "-----\n", "The image below is a sample selfie of the same person; we'll use it to perform our comparisons and searches." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## Image of a Face\n", "selfie_image = \"JaneMarieSelfie.png\"\n", "display(IImage(url=s3_client.generate_presigned_url('get_object', \n", "                                                    Params={'Bucket': bucket_name, \n", "                                                            'Key': selfie_image})))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3. Check Identification Document \n", "-----\n", "\n", "Here we want to do some basic checks on the identification document:\n", "1. that we can detect a face in the driver's license \n", "2. that the quality (sharpness and brightness) is sufficient to match with \n", "\n", "Note: we could do several other checks, but we'll see those in subsequent modules. \n", "\n", "
DetectFaces \n", " \n", "The **DetectFaces** operation looks for key facial features such as eyes, nose, and mouth to detect faces in an input image. Amazon Rekognition Image detects the 100 largest faces in an image.\n", "\n", "Here we will actually detect two faces in the driver's license. \n", "\n", " \n", "The **DetectText** operation can be used to extract text from the driver's license as well. For our purposes we'll skip this step, but a quick sketch follows below. \n", "\n", "
\n", "\n", "
Note \n", " \n", "Take a look at the FaceDetails of each face detected; DetectFaces returns several helpful attributes, including image quality, for each face in the image: \n", "\n", "- BoundingBox\n", "- AgeRange\n", "- Gender\n", "- Landmarks\n", "- Quality\n", "- Pose \n", "\n", "
\n", "\n", "The drivers license image returns two faces, you can see the different faces using the following \n", "\n", "```python \n", "response['FaceDetails'][0]\n", "response['FaceDetails'][1]\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- drivers license -- \n", "response = rek_client.detect_faces(Image={'S3Object':{\n", " 'Bucket':bucket_name,\n", " 'Name':id_image}},\n", " Attributes=['ALL'])\n", "print(\"-- 1st face found:\")\n", "print(response['FaceDetails'][0]['BoundingBox'])\n", "print(response['FaceDetails'][0]['Quality'])\n", "\n", "print(\"-- 2nd face found:\")\n", "print(response['FaceDetails'][1]['BoundingBox'])\n", "print(response['FaceDetails'][1]['Quality'])\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4. Selfie Image quality checks\n", "----\n", "\n", "Here we want to do some basic checks:\n", "1. that we can detect a face in the drivers license and the selie \n", "2. the quality (sharpness and brightness) are sufficent to match with \n", "\n", "Note: we could do several other checks, but we'll see those in module 3.\n", "\n", "\n", "
DetectFaces \n", " \n", "The **DetectFaces** operation looks for key facial features such as eyes, nose, and mouth to detect faces in an input image. Amazon Rekognition Image detects the 100 largest faces in an image.\n", "\n", "\n", "\n", "
\n", "\n", "
Note \n", " \n", "Take a look at the FaceDetails of the selfie and compare them to the face details on the driver's license; make note of the quality in particular. \n", "\n", "- BoundingBox\n", "- AgeRange\n", "- Gender\n", "- Landmarks\n", "- Quality\n", "- Pose \n", "\n", "
\n", "\n", "The quality of the 1st face in the drivers license is different than the quality of the face in the selfie:\n", "\n", "```python \n", "# -- first in drivers license quality --\n", "'Quality': {'Brightness': 94.17919921875, 'Sharpness': 46.02980041503906}\n", " \n", "# -- selfie quality --\n", "'Quality': {'Brightness': 89.2042007446289, 'Sharpness': 53.330047607421875}\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# -- Selfie Image -- \n", "response = rek_client.detect_faces(Image={'S3Object':{\n", " 'Bucket':bucket_name,\n", " 'Name':selfie_image}},\n", " Attributes=['ALL'])\n", "print(response['FaceDetails'][0]['Quality'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5. Onboarding 1:1 Verification with CompareFaces\n", "-----\n", "\n", "Compares a face in the source input image with each of the 100 largest faces detected in the target input image. in our case the source image will be the drivers license and the target image will be the selfie. \n", "\n", "In response, the operation returns an array of face matches ordered by similarity score in descending order. For each face match, the response provides a bounding box of the face, facial landmarks, pose details (pitch, roll, and yaw), quality (brightness and sharpness), and confidence value (indicating the level of confidence that the bounding box contains a face). The response also provides a similarity score, which indicates how closely the faces match.\n", "\n", "\n", "
Note \n", " \n", "Take note of two important **CompareFaces** request parameters: \n", "\n", "- SimilarityThreshold - The minimum level of confidence in the face matches that a match must meet to be included in the FaceMatches array. \n", "\n", "- QualityFilter - A filter that specifies a quality bar for how much filtering is done to identify faces. Filtered faces aren't compared. If you specify AUTO, Amazon Rekognition chooses the quality bar. If you specify LOW, MEDIUM, or HIGH, filtering removes all faces that do not meet the chosen quality bar. \n", "\n", "
\n", "\n", "\n", "
Guidance \n", " \n", "For identity verification use cases, it is recommended to set the SimilarityThreshold (for the CompareFaces API) and FaceMatchThreshold (for the SearchFacesByImage API) parameters to a value of 99%. This improves face match precision and reduces the chance of a false positive (false match); however, it may lower recall. \n", "\n", "Additionally, setting the QualityFilter to HIGH will help eliminate low-quality images from consideration.\n", "\n", "\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = rek_client.compare_faces(\n", " SourceImage={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': id_image,\n", " }\n", " },\n", " TargetImage={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': selfie_image,\n", " }\n", " },\n", " SimilarityThreshold = 99,\n", " QualityFilter='HIGH'\n", ")\n", " \n", "print(json.dumps(response, indent=3)) \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6. system checks selfie against known onboarded users \n", "----\n", "Here we simply search the onboarded user collection to see if the user has previously been onboarded. \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image_external_id = \"Sample_Jane_Maria\"\n", "\n", "response = rek_client.search_faces_by_image(\n", " CollectionId=collection_name,\n", " Image={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': selfie_image,\n", " }\n", " },\n", " MaxFaces=1,\n", " FaceMatchThreshold=90,\n", " QualityFilter='AUTO'\n", ")\n", "\n", "try:\n", " external_image_id = response[\"FaceMatches\"][0][\"Face\"][\"ExternalImageId\"]\n", " similarity_score = response[\"FaceMatches\"][0][\"Similarity\"]\n", " face_id = response[\"FaceMatches\"][0][\"Face\"][\"FaceId\"]\n", " print(\"-- Face Found in : {}\".format(collection_name))\n", " print(\"-- FaceID : {} \".format(face_id))\n", " print(\"-- Similarity : {} \".format(similarity_score))\n", " print(\"-- ExternalImageID : {} \\n\".format(external_image_id))\n", "except:\n", " print(\"-- Face NOT Found in : {}\".format(collection_name))\n", " print(json.dumps(response, indent=3)) \n", "\n", "\n", "response = rek_client.search_faces_by_image(\n", " CollectionId=fraud_collection_name,\n", " Image={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': selfie_image,\n", " }\n", " },\n", " MaxFaces=1,\n", " FaceMatchThreshold=90,\n", " QualityFilter='AUTO'\n", ")\n", "\n", "try:\n", " external_image_id = response[\"FaceMatches\"][0][\"Face\"][\"ExternalImageId\"]\n", " similarity_score = response[\"FaceMatches\"][0][\"Similarity\"]\n", " face_id = response[\"FaceMatches\"][0][\"Face\"][\"FaceId\"]\n", " print(\"-- Face Found in : {}\".format(fraud_collection_name))\n", " print(\"-- FaceID : {} \".format(face_id))\n", " print(\"-- Similarity : {} \".format(similarity_score))\n", " print(\"-- ExternalImageID : {} \\n\".format(external_image_id))\n", "\n", "except:\n", " print(\"-- Face NOT Found in : {}\".format(fraud_collection_name))\n", " print(json.dumps(response, indent=3)) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 7. system indexes user's trusted document and potentially selfie \n", "----\n", "Here we simply index the faces on the drivers license and optionally index the selfie image as well. Why index both? well typically the image on the drivers license should be somewhat older than the selfie and having two images is better than a single when searching. 
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Index a drivers license face \n", "print(\"indexing drivers license\")\n", "response = rek_client.index_faces(\n", " CollectionId=collection_name,\n", " Image={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': selfie_image,\n", " }\n", " },\n", " ExternalImageId=image_external_id,\n", " DetectionAttributes=[\n", " 'DEFAULT',\n", " ],\n", " MaxFaces=1, # how many faces to detect \n", " QualityFilter='HIGH' # apply quality filter \n", ")\n", "\n", "try:\n", " face_id = response['FaceRecords'][0]['Face']['FaceId']\n", " print(\"-- FaceID : {} --\\n\".format(face_id))\n", " print(json.dumps(response, indent=3))\n", "except:\n", " print(json.dumps(response, indent=3))\n", " \n", "print(\"indexing selfie image \")\n", "# Index a selfie image\n", "response = rek_client.index_faces(\n", " CollectionId=collection_name,\n", " Image={\n", " 'S3Object': {\n", " 'Bucket': bucket_name,\n", " 'Name': selfie_image,\n", " }\n", " },\n", " ExternalImageId=image_external_id,\n", " DetectionAttributes=[\n", " 'DEFAULT',\n", " ],\n", " MaxFaces=1, # how many faces to detect \n", " QualityFilter='HIGH' # apply quality filter \n", ")\n", "\n", "try:\n", " face_id = response['FaceRecords'][0]['Face']['FaceId']\n", " print(\"-- FaceID : {} --\\n\".format(face_id))\n", " print(json.dumps(response, indent=3))\n", "except:\n", " print(json.dumps(response, indent=3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean up\n", "----\n", "As part of our cleanup, we can delete our two collections. This will delete the collections and all the face vectors contained within.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# onboarded users collection\n", "\n", "try:\n", " rek_client.delete_collection(\n", " CollectionId=collection_name\n", " )\n", "\n", "except Exception as err:\n", " print(\"ERROR: {}\".format(err))\n", " \n", "# fraudulent user collection \n", "\n", "try:\n", " rek_client.delete_collection(\n", " CollectionId=fraud_collection_name\n", " )\n", "except Exception as err:\n", " print(\"ERROR: {}\".format(err))" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 }