{ "cells": [ { "cell_type": "markdown", "id": "29930c91-69fc-464d-8e20-ee5e15ff432c", "metadata": {}, "source": [ "# 0. ReIndex Solution Data Preparation" ] }, { "cell_type": "markdown", "id": "1a378685-2210-4f35-9388-56ef88eec560", "metadata": {}, "source": [ "This notebook contains scripts to help you create your face collection data in the expected format for the reindexing solution.\n", "\n", "**Notebook Steps:**\n", "1. Import libraries and clients\n", "2. Define variables\n", "3. Create a new collection and S3 bucket\n", "4. Sort old collection faces\n", "5. Modify getAdditionalInfo() to retrieve the original face images and userId\n", "6. Send records to Amazon S3\n", "\n", "Data records must follow this structure for the solution to work:\n", "```\n", "[\n", "    {\n", "        \"Bucket\": String,\n", "        \"Key\": String,\n", "        \"ExternalImageId\": String,\n", "        \"CollectionId\": String,\n", "        \"Faces\": [\n", "            {\n", "                \"UserId\": String, # Optional\n", "                \"FaceId\": String,\n", "                \"ImageId\": String,\n", "                \"BoundingBoxes\": {\n", "                    \"Width\": Float,\n", "                    \"Height\": Float,\n", "                    \"Left\": Float,\n", "                    \"Top\": Float\n", "                }\n", "            }\n", "        ]\n", "    }\n", "]\n", "```\n" ] }, { "cell_type": "markdown", "id": "aef4490b-0e51-4777-a0c8-0e6ceb6475b5", "metadata": {}, "source": [ "### 1. Import libraries and clients" ] }, { "cell_type": "code", "execution_count": null, "id": "f15e97f7-722e-4ea2-a894-049059ffa014", "metadata": { "tags": [] }, "outputs": [], "source": [ "import boto3, json, os, logging, sagemaker, time\n", "from botocore.exceptions import ClientError\n", "s3Resource = boto3.resource('s3')\n", "rekClient = boto3.client('rekognition')\n", "sm_session = sagemaker.Session()\n", "boto3_session = boto3.session.Session()\n", "boto3_region = boto3_session.region_name" ] }, { "cell_type": "markdown", "id": "feef944d-9b7f-4d70-be14-06a30fdbf758", "metadata": {}, "source": [ "### 2. 
Define variables" ] }, { "cell_type": "code", "execution_count": null, "id": "96267673-08dd-4161-acde-b6f5bef4a73f", "metadata": { "tags": [] }, "outputs": [], "source": [ "s3_bucket_name = \"\" # Specify a unique name for a new S3 bucket\n", "old_CollectionId = \"\" # Specify the ID of your old collection\n", "new_CollectionId = \"\" # Specify a unique ID for your new collection" ] }, { "cell_type": "markdown", "id": "d9e3001b-38d7-49fc-b8e7-9fbadde6ebd3", "metadata": {}, "source": [ "### 3. Create a new collection and S3 bucket" ] }, { "cell_type": "markdown", "id": "bb03ada8-8191-4033-95aa-4b8d2b7053e6", "metadata": {}, "source": [ "#### Helper functions" ] }, { "cell_type": "code", "execution_count": null, "id": "4f050716-6806-4c8f-b6f8-48b508c029a1", "metadata": { "tags": [] }, "outputs": [], "source": [ "def createCollection(collectionid):\n", "    response = rekClient.create_collection(\n", "        CollectionId=collectionid\n", "    )\n", "    print(response)\n", "\n", "def list_collection_faces(collection_id):\n", "    # Paginate so that collections larger than one ListFaces page are counted in full\n", "    paginator = rekClient.get_paginator('list_faces')\n", "    face_count = sum(len(page[\"Faces\"]) for page in paginator.paginate(CollectionId=collection_id))\n", "    print(\"Number of faces in {} is: {}\".format(collection_id, face_count))\n", "\n", "def createS3bucket(s3_bucket_name):\n", "    # us-east-1 is the default S3 region, so no LocationConstraint is passed for it\n", "    if boto3_region == \"us-east-1\":\n", "        s3Resource.create_bucket(Bucket=s3_bucket_name)\n", "    else:\n", "        s3Resource.create_bucket(Bucket=s3_bucket_name, CreateBucketConfiguration={'LocationConstraint': boto3_region})" ] }, { "cell_type": "code", "execution_count": null, "id": "ab1b8d71-48ed-4e8b-ae12-da1b8d714d22", "metadata": { "tags": [] }, "outputs": [], "source": [ "createS3bucket(s3_bucket_name) # Create a new bucket" ] }, { "cell_type": "code", "execution_count": null, "id": "4d291c98-a968-42d7-b911-debda4d89ec8", "metadata": { "tags": [] }, "outputs": [], "source": [ "createCollection(new_CollectionId) # Create the new collection" ] }, { "cell_type": "code", "execution_count": null, "id": "4194400d-cfb5-4144-a5a6-f137613cb086", 
"metadata": { "tags": [] }, "outputs": [], "source": [ "list_collection_faces(old_CollectionId)\n", "list_collection_faces(new_CollectionId)" ] }, { "cell_type": "markdown", "id": "87ea000e-ff95-466d-8666-2cb1d4b5949e", "metadata": {}, "source": [ "### 4. Sort old collection faces" ] }, { "cell_type": "code", "execution_count": null, "id": "3d894650-704f-4fdf-9372-867cd424560a", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Paginate through ListFaces so that every face in the old collection is retrieved\n", "paginator = rekClient.get_paginator('list_faces')\n", "old_faces = []\n", "for page in paginator.paginate(CollectionId=old_CollectionId):\n", "    old_faces.extend(page[\"Faces\"])\n", "\n", "# Sort by ImageId so that faces indexed from the same image are adjacent\n", "sorted_faces = sorted(old_faces, key=lambda d: d['ImageId'])\n", "print(\"Old Faces:\", len(sorted_faces))" ] }, { "cell_type": "markdown", "id": "e748429b-601f-481c-b056-b050962ab4c1", "metadata": {}, "source": [ "Next, implement the getAdditionalInfo function so that it returns the Bucket and Key of the image that was used to index each face. If you also map faces to an internal userId, you can optionally return it as well." ] }, { "cell_type": "markdown", "id": "6283bef4-3156-4044-a583-742ecd7c1f59", "metadata": { "tags": [] }, "source": [ "### 5. Modify getAdditionalInfo() to retrieve the original face images and userId" ] }, { "cell_type": "markdown", "id": "1d9d9754-88f1-44a6-a1e2-1063232d1b85", "metadata": { "tags": [] }, "source": [ "You will have to provide the bucket and key of the images that were used to index the original face collection. This data is needed to reindex the collection. 
\n", "\n", "Modify getAdditionalInfo() to return the following metadata for each indexed face:\n", "\n", "- image_storage_bucket - bucket where face images are stored\n", "- image_key - the object / file name of the image in the image_storage_bucket\n", "- userID (optional - if you have a defined ID for each user)" ] }, { "cell_type": "code", "execution_count": null, "id": "229bb6ec-4121-4065-9485-329e7bdfd8e5", "metadata": { "tags": [] }, "outputs": [], "source": [ "def getAdditionalInfo(faceid):\n", "    # Retrieve userID, image_storage_bucket and image_key from your internal mapping.\n", "    # Feel free to modify the function and expect other inputs such as ImageId or ExternalImageId if useful.\n", "    # Placeholder: replace the line below with your lookup and return userID, image_storage_bucket, image_key.\n", "    raise NotImplementedError(\"Implement getAdditionalInfo() for your own face-to-image mapping\")" ] }, { "cell_type": "code", "execution_count": null, "id": "1613db32-e282-4259-a6e1-b6e32d1cce78", "metadata": {}, "outputs": [], "source": [ "def createDataset(sorted_faces):\n", "    solution_records = []\n", "    if not sorted_faces:\n", "        return solution_records  # Nothing to export for an empty collection\n", "    face = sorted_faces[0]\n", "    position = 0\n", "    previousImageId = face[\"ImageId\"]\n", "    userID,image_storage_bucket,image_key = getAdditionalInfo(face[\"FaceId\"])\n", "    #userID,image_storage_bucket,image_key = \"\",\"photos-bucket-name\",face[\"ExternalImageId\"] #Delete this if you can call getAdditionalInfo\n", "\n", "    imagerecord = {\n", "        \"Bucket\":image_storage_bucket,\n", "        \"Key\":image_key,\n", "        \"ExternalImageId\":face[\"ExternalImageId\"],\n", "        \"CollectionId\":new_CollectionId,\n", "        \"Faces\":[\n", "            {\n", "                \"UserId\":userID,\n", "                \"FaceId\":face[\"FaceId\"],\n", "                \"ImageId\":face[\"ImageId\"],\n", "                \"BoundingBoxes\":face[\"BoundingBox\"]\n", "            }\n", "        ]\n", "    }\n", "\n", "    solution_records.append(imagerecord)\n", "\n", "    for face in sorted_faces[1:]:\n", "        if face[\"ImageId\"] != previousImageId:\n", "            # A new ImageId starts a new record\n", "            previousImageId = face[\"ImageId\"]\n", "            userID,image_storage_bucket,image_key = getAdditionalInfo(face[\"FaceId\"])\n", "            #userID,image_storage_bucket,image_key = \"\",\"photos-bucket-name\",face[\"ExternalImageId\"] #Delete this if you can call getAdditionalInfo\n", "            position = position + 1\n", "            imagerecord = {\n", "                \"Bucket\":image_storage_bucket,\n", "                \"Key\":image_key,\n", "                \"ExternalImageId\":face[\"ExternalImageId\"],\n", "                \"CollectionId\":new_CollectionId,\n", "                \"Faces\":[\n", "                    {\n", "                        \"UserId\":userID,\n", "                        \"FaceId\":face[\"FaceId\"],\n", "                        \"ImageId\":face[\"ImageId\"],\n", "                        \"BoundingBoxes\":face[\"BoundingBox\"]\n", "                    }\n", "                ]\n", "            }\n", "            solution_records.append(imagerecord)\n", "        else:\n", "            # Same image as the previous face: look up the UserId for this face, since faces in one image can belong to different users\n", "            userID,_,_ = getAdditionalInfo(face[\"FaceId\"])\n", "            #userID = \"\" #Use this instead if you cannot call getAdditionalInfo\n", "            solution_records[position][\"Faces\"].append(\n", "                {\n", "                    \"UserId\":userID,\n", "                    \"FaceId\":face[\"FaceId\"],\n", "                    \"ImageId\":face[\"ImageId\"],\n", "                    \"BoundingBoxes\":face[\"BoundingBox\"],\n", "                }\n", "            )\n", "    return solution_records" ] }, { "cell_type": "code", "execution_count": null, "id": "efed6372-9372-49c9-9f29-bf720dfddb2d", "metadata": { "tags": [] }, "outputs": [], "source": [ "solution_records = createDataset(sorted_faces)\n", "records_filename = \"solution_records.json\"\n", "# The with block closes the file automatically\n", "with open(records_filename, 'w') as f:\n", "    json.dump(solution_records, f)" ] }, { "cell_type": "markdown", "id": "0e298a65-4b36-4e32-b1d1-8cae1e468a55", "metadata": {}, "source": [ "### 6. Send records to Amazon S3" ] }, { "cell_type": "code", "execution_count": null, "id": "68c395c2-cf00-40b7-ab30-8b894deff8c1", "metadata": { "tags": [] }, "outputs": [], "source": [ "records_folder = \"records\"\n", "key = \"{}/{}\".format(records_folder, records_filename)\n", "dataset_s3_uri = sm_session.upload_data(records_filename, s3_bucket_name, records_folder)\n", "print(\"Your data records are located in {}\".format(dataset_s3_uri))" ] }, { "cell_type": "markdown", "id": "f3375ce2-0585-4744-a964-6594e2f57c9c", "metadata": {}, "source": [ "Congratulations, your data is now ready to be processed by the reindexing solution. 
Head over to the Reindex Solution and provide the following information:" ] }, { "cell_type": "code", "execution_count": null, "id": "2aa79def-02a1-4db5-bd6b-e60ea2f8f303", "metadata": { "tags": [] }, "outputs": [], "source": [ "print(\"old_CollectionId =\\\"{}\\\"\".format(old_CollectionId))\n", "print(\"new_CollectionId =\\\"{}\\\"\".format(new_CollectionId))\n", "print(\"s3_bucket_name =\\\"{}\\\"\".format(s3_bucket_name))\n", "print(\"records_folder =\\\"{}\\\"\".format(records_folder))\n", "print(\"records_filename =\\\"{}\\\"\".format(records_filename))" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": 
"General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 21, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, 
"memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 28, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 29, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated 
computing", "gpuNum": 4, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated 
computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }