{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5c5d05b8-dc5a-45ab-92b2-dc94ff5a9c9e",
   "metadata": {},
   "source": [
    "# Improving face match accuracy with Amazon Rekognition"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3739f352-bdb0-4242-b740-6fe73650dfd9",
   "metadata": {},
   "source": [
    "In this lab we will see how creating collections with multiple profiles of faces leads to better face match accuracy than capturing a single face profile. In real life scenarios, selfies may have poor lighting or camera quality, which can affect face match accuracy. For this reason we encourage our customers to take multiple selfies, which will improve the accuracy. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b5d77ae-c627-4509-a0d1-790175cdd2fb",
   "metadata": {},
   "source": [
    "## Steps"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f642a4fe-2ad5-47ca-8e66-97acb38143b9",
   "metadata": {},
   "source": [
    "These are the following steps we are going to accomplish:\n",
    "- **Step 0 - Load libraries**\n",
    "- **Step 1 - List existing collections**\n",
    "- **Step 2 - Create collections**\n",
    "- **Step 3 - Populate collections**\n",
    "- **Step 4 - Compare faces against both collections**\n",
    "- **Step 5 - Review results**\n",
    "- **Step 6 - Clean up resources**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d374757-07d4-4c12-93b1-b7885a074599",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Step 0 - Load libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3187d076-6619-4bbe-9891-4b248f298685",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU opencv-python-headless\n",
    "import boto3, os, io, glob, cv2\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline \n",
    "client=boto3.client('rekognition')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f289628-f8ad-487a-b27f-088470f785ca",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Step 1 - List existing collections "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0da008e-34cf-4eec-ab1c-fd945c6ca512",
   "metadata": {},
   "source": [
    "Before we create new collections, let's have a look if there are any existing collections in our account."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb26fb06-05e9-4908-8e5d-c3b9a586f14b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def list_collections():\n",
    "\n",
    "    max_results=10\n",
    "    \n",
    "    print('Displaying collections...')\n",
    "    response=client.list_collections(MaxResults=max_results)\n",
    "    collection_count=0\n",
    "    done=False\n",
    "    \n",
    "    while not done:\n",
    "        collections=response['CollectionIds']\n",
    "\n",
    "        for collection in collections:\n",
    "            print (collection)\n",
    "            collection_count+=1\n",
    "        if 'NextToken' in response:\n",
    "            nextToken=response['NextToken']\n",
    "            response=client.list_collections(NextToken=nextToken,MaxResults=max_results)\n",
    "            \n",
    "        else:\n",
    "            done=True\n",
    "\n",
    "    return collection_count   \n",
    "\n",
    "collection_count=list_collections()\n",
    "\n",
    "print(\"There are: {} collections in your account \".format(collection_count))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5dea9be7-ce31-44aa-b1f2-c7da6d13867d",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Step 2 - Create new collections"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ebfdd26-dcd9-49db-a7db-d426377b4d85",
   "metadata": {},
   "source": [
    "In this section we will create two collections, in order to compare results when there is a single face in the collection versus multiple faces profiles."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f3b4fb5-873a-40c7-ac19-3bce81e9fd66",
   "metadata": {},
   "source": [
    "| ⚠️ WARNING: Assign a unique name for your collections inside the quotes in the next cell |\n",
    "| -- |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b272ee6d-893c-4b5e-b7f6-133d16718679",
   "metadata": {},
   "outputs": [],
   "source": [
    "collection_A='' # PROVIDE A UNIQUE NAME FOR COLLECTION A\n",
    "collection_B='' # PROVIDE A UNIQUE NAME FOR COLLECTION B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e1618b0-9ada-44b3-a1eb-4dca4b781572",
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_collection(collection_id):\n",
    "    #Create a collection\n",
    "    print('Creating collection:' + collection_id)\n",
    "    try:\n",
    "        response=client.create_collection(CollectionId=collection_id)\n",
    "        print('Collection ARN: ' + response['CollectionArn'])\n",
    "        print('Status code: ' + str(response['StatusCode']))\n",
    "        print('Done.')\n",
    "    except Exception as e:\n",
    "        print(e)\n",
    "\n",
    "create_collection(collection_A)\n",
    "create_collection(collection_B)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "907289eb-3abe-454e-9c05-d615f8326f5f",
   "metadata": {},
   "source": [
    "### Step 2a - Confirm your collections creation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ab0459b-3d36-44f6-8956-696bedef3878",
   "metadata": {},
   "outputs": [],
   "source": [
    "collection_count=list_collections()\n",
    "print(\"There are: {} collections in your account \".format(collection_count))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24b5cb02-83d5-4f2b-a53b-dc6102c979c8",
   "metadata": {},
   "source": [
    "## Step 3 - Populate collections "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd4576a2-1a44-462c-b85a-3bac1a7a9c61",
   "metadata": {},
   "source": [
    "### Step 3.a - Populate collection A"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b75aa175-57bc-49ad-a64c-74ca92611a2e",
   "metadata": {},
   "source": [
    "First we will index a single face into collection A. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48c4866c-8fcb-48a4-be05-7c7858a1a681",
   "metadata": {},
   "outputs": [],
   "source": [
    "def populate_collection(collection, directory):\n",
    "    for filename in os.listdir(directory):\n",
    "        f = os.path.join(directory, filename)\n",
    "        # checking if it is a file\n",
    "        if os.path.isfile(f):\n",
    "            print(f)\n",
    "            file = open(f, \"rb\") # opening for [r]eading as [b]inary\n",
    "            data = file.read() \n",
    "            response=client.index_faces(CollectionId=collection,\n",
    "                                Image={'Bytes':data},\n",
    "                                ExternalImageId=f.split(\"/\")[2],\n",
    "                                MaxFaces=1,\n",
    "                                QualityFilter=\"AUTO\",\n",
    "                                DetectionAttributes=['ALL'])\n",
    "            print ('Results for ' + f.split(\"/\")[2])\n",
    "            print('Faces indexed:')\n",
    "            for faceRecord in response['FaceRecords']:\n",
    "                print('  Face ID : {}'.format( faceRecord['Face']['FaceId']))\n",
    "                print('  Location: {}'.format(faceRecord['Face']['BoundingBox']))\n",
    "\n",
    "            if len(response['UnindexedFaces']) > 0:\n",
    "                print('Faces not indexed:')\n",
    "                for unindexed_face in response['UnindexedFaces']:\n",
    "                    print(' Location: {}'.format(unindexed_face['FaceDetail']['BoundingBox']))\n",
    "                    print(' Reasons :')\n",
    "                    for reason in unindexed_face['Reasons']:\n",
    "                        print('   ' + reason)\n",
    "            file.close()\n",
    "    return"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6e345de-c08d-46de-9696-48dc00bf3a80",
   "metadata": {},
   "outputs": [],
   "source": [
    "directory_A = 'media/single-profile'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46c9eeac-c224-4f3a-8189-61828d843b68",
   "metadata": {},
   "outputs": [],
   "source": [
    "populate_collection(collection_A,directory_A) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12a84d78-4648-41d0-8c95-960d56b031b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "img = cv2.imread(\"media/single-profile/dani1.jpg\")[:,:,::-1]\n",
    "fig = plt.figure(figsize=(10, 7))\n",
    "fig.add_subplot(1, 4, 1)\n",
    "plt.imshow(img)\n",
    "plt.axis('off')\n",
    "plt.title(\"dani1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbb1186c-05b6-4d91-8c37-65af1d51959d",
   "metadata": {},
   "source": [
    "### Step 3.b - Populate collection B"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "012d9588-3c7d-4fdd-b009-0c0eab07451e",
   "metadata": {},
   "source": [
    "Now we will populate B with multiple face profiles. Having multiple images of the same person should improve the face match results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2995168-9830-4977-abd7-03b3ab459f37",
   "metadata": {},
   "outputs": [],
   "source": [
    "directory_B = 'media/multiple-profiles'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa804ebc-1fa0-4867-b08a-d1b9d3805b21",
   "metadata": {},
   "outputs": [],
   "source": [
    "populate_collection(collection_B,directory_B) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "428e8c44-a3a7-4310-b8e0-8b13a385c23f",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(10, 7))\n",
    "images = glob.glob(\"media/multiple-profiles/*.jpg\")\n",
    "for idx, image in enumerate(images):\n",
    "    img = cv2.imread(image)[:,:,::-1]\n",
    "    fig.add_subplot(1, len(images), idx+1)\n",
    "    plt.imshow(img)\n",
    "    plt.axis('off')\n",
    "    plt.title(image.split(\"/\")[-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36617250-4625-487f-8076-355d83a23fbc",
   "metadata": {},
   "source": [
    "## Step 4 - Search faces against both collections"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ca93b18-d5ae-4bf3-b174-173b38b0bb1d",
   "metadata": {},
   "source": [
    "Now we will search the same photo against both collections to compare the similarity confidence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02ff1719-5503-4445-9c3c-d102adf7ec8c",
   "metadata": {},
   "outputs": [],
   "source": [
    "file = open(\"media/test-images/test1.jpg\", \"rb\") # opening for [r]eading as [b]inary\n",
    "data = file.read() \n",
    "img = cv2.imread(\"media/test-images/test1.jpg\")[:,:,::-1]\n",
    "fig = plt.figure(figsize=(10, 7))\n",
    "fig.add_subplot(1, 4, 1)\n",
    "plt.imshow(img)\n",
    "plt.axis('off')\n",
    "plt.title(\"test\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af37266e-6124-4ce7-ae3e-b7e404f09e2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def search_face(data, collection):\n",
    "    searchresults = client.search_faces_by_image(CollectionId=collection,\n",
    "                                                    Image={'Bytes':data},\n",
    "                                                    FaceMatchThreshold=50)\n",
    "    return searchresults"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ebb453c-9e93-43ca-9812-c232db7f94f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "searchA = search_face(data,collection_A)\n",
    "searchB = search_face(data,collection_B)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b8f0167-37e7-49c2-bbd2-8ad03bca1357",
   "metadata": {},
   "source": [
    "## Step 5 - Review results"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afbd4a16-b869-474b-af03-a95da27cd5bd",
   "metadata": {},
   "source": [
    "Let's have a look at the results of the search against collection A (one profile)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "259bb106-139f-4944-93ec-32752c755d0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(searchA)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70ea0924-84d9-403c-883e-1e6cf653cd32",
   "metadata": {},
   "source": [
    "Now let's see the results of the search against collection B (multiple profiles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04f019ff-d90c-4369-ad76-bcdd47276ddb",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(searchB)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75e4c7c3-70d1-4e2c-8587-d49ddbd5e2f1",
   "metadata": {},
   "source": [
    "Let's compare the best similarities, when we use multiple profile pictures the face similarity score improves. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb2edeb9-f038-451d-9344-4f893bb39087",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Using a single face profile in a collection, the similarity score is {}\".format(searchA[\"FaceMatches\"][0][\"Similarity\"]))\n",
    "print(\"Using multiple face profiles in a collection, the best similarity score is {}\".format(searchB[\"FaceMatches\"][0][\"Similarity\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6c19667-b910-4066-a268-29912b1e5c45",
   "metadata": {},
   "source": [
    "## Step 6 - Clean up resources"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26d45f7f-0813-4db2-aa53-88858cd6ab31",
   "metadata": {},
   "source": [
    "Let's delete the collections we created in our account."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "107ef3fc-b26f-46a4-87b7-2fb7efdf0041",
   "metadata": {},
   "outputs": [],
   "source": [
    "def delete_collection(collection_id):\n",
    "\n",
    "    print('Attempting to delete collection ' + collection_id)\n",
    "    status_code=0\n",
    "    try:\n",
    "        response=client.delete_collection(CollectionId=collection_id)\n",
    "        status_code=response['StatusCode']\n",
    "        \n",
    "    except ClientError as e:\n",
    "        if e.response['Error']['Code'] == 'ResourceNotFoundException':\n",
    "            print ('The collection ' + collection_id + ' was not found ')\n",
    "        else:\n",
    "            print ('Error other than Not Found occurred: ' + e.response['Error']['Message'])\n",
    "        status_code=e.response['ResponseMetadata']['HTTPStatusCode']\n",
    "    print('Status code: ' + str(status_code))\n",
    "\n",
    "\n",
    "delete_collection(collection_A)\n",
    "delete_collection(collection_B)\n"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}