# 0. ReIndex Solution Data Preparation

This notebook contains scripts to help you create your face collection data in the format expected by the reindexing solution.

**Notebook Steps:**
1. Import libraries and clients
2. Define variables
3. Create a new collection and S3 bucket
4. Sort old collection faces
5. Modify getAdditionalInfo() to retrieve the original face images and userId
6. Send records to Amazon S3

Data records must follow this structure for the solution to work:
```
[
  {
    "Bucket": String,
    "Key": String,
    "ExternalImageId": String,
    "CollectionId": String,
    "Faces": [
      {
        "UserId": String, # Optional
        "FaceId": String,
        "ImageId": String,
        "BoundingBoxes": {
          "Width": Float,
          "Height": Float,
          "Left": Float,
          "Top": Float
        }
      }
    ]
  }
]
```
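
Before uploading, it can help to check each record against the structure above. The following is a minimal sketch: `validate_record` and `sample_record` are hypothetical names, and the rules are inferred only from the structure shown here, so the solution itself may enforce more or fewer constraints.

```python
def validate_record(record):
    """Return a list of problems found in one image record (empty if none)."""
    problems = []
    for field in ("Bucket", "Key", "ExternalImageId", "CollectionId"):
        if not isinstance(record.get(field), str):
            problems.append("{} must be a string".format(field))
    faces = record.get("Faces")
    if not isinstance(faces, list):
        problems.append("Faces must be a list")
        return problems
    for i, face in enumerate(faces):
        for field in ("FaceId", "ImageId"):
            if not isinstance(face.get(field), str):
                problems.append("Faces[{}].{} must be a string".format(i, field))
        # UserId is optional, but must be a string when present
        if "UserId" in face and not isinstance(face["UserId"], str):
            problems.append("Faces[{}].UserId must be a string".format(i))
        box = face.get("BoundingBoxes")
        if not isinstance(box, dict):
            box = {}
        for side in ("Width", "Height", "Left", "Top"):
            if not isinstance(box.get(side), (int, float)):
                problems.append("Faces[{}].BoundingBoxes.{} must be a number".format(i, side))
    return problems


# Illustrative values only
sample_record = {
    "Bucket": "photos-bucket-name",
    "Key": "people/group.jpg",
    "ExternalImageId": "group.jpg",
    "CollectionId": "new-collection",
    "Faces": [
        {
            "FaceId": "face-id-1",
            "ImageId": "image-id-1",
            "BoundingBoxes": {"Width": 0.2, "Height": 0.3, "Left": 0.1, "Top": 0.1},
        }
    ],
}
print(validate_record(sample_record))
```

An empty list means the record matches the expected shape; any other result names the fields to fix.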


### 1. Import libraries and clients

In [None]:
import boto3, json, os, logging, sagemaker, time
from botocore.exceptions import ClientError
s3Resource = boto3.resource('s3')
rekClient = boto3.client('rekognition')
sm_session = sagemaker.Session()
boto3_session = boto3.session.Session()
boto3_region = boto3_session.region_name

### 2. Define variables

In [None]:
s3_bucket_name = "" # Specify a unique name for a new S3 bucket
old_CollectionId = "" # Specify the name of your old collection
new_CollectionId = "" # Specify a unique ID for your new collection
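
Since all three variables start out empty, a quick sanity check before calling AWS can save a failed request. This is a rough sketch, not an exhaustive validator: `check_names` is a hypothetical helper, the bucket pattern covers only the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit), and the collection pattern assumes Rekognition's documented `[a-zA-Z0-9_.\-]+` format with a 255-character limit.

```python
import re

# Simplified patterns; see the assumptions in the paragraph above
BUCKET_PATTERN = r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]"
COLLECTION_PATTERN = r"[a-zA-Z0-9_.\-]{1,255}"


def check_names(bucket_name, collection_ids):
    """Return a list of naming problems (empty if everything looks valid)."""
    problems = []
    if not re.fullmatch(BUCKET_PATTERN, bucket_name):
        problems.append("Invalid S3 bucket name: {!r}".format(bucket_name))
    for collection_id in collection_ids:
        if not re.fullmatch(COLLECTION_PATTERN, collection_id):
            problems.append("Invalid collection ID: {!r}".format(collection_id))
    return problems


# Illustrative values only
print(check_names("my-reindex-bucket", ["old-faces", "new_faces"]))
```

In the notebook you would call it as `check_names(s3_bucket_name, [old_CollectionId, new_CollectionId])` and continue only if the result is empty.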

### 3. Create a new collection and S3 bucket

#### Helper functions

In [None]:
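def createCollection(collectionid):
    # Create an empty Rekognition collection with the given ID
    response = rekClient.create_collection(
        CollectionId=collectionid
    )
    print(response)

def list_collection_faces(collection_id):
    # list_faces returns results in pages, so count the faces across all pages
    paginator = rekClient.get_paginator('list_faces')
    face_count = sum(len(page["Faces"]) for page in paginator.paginate(CollectionId=collection_id))
    print("Number of faces in {} is: {}".format(collection_id, face_count))

def createS3bucket(s3_bucket_name):
    # us-east-1 is the default location and must not be passed as a LocationConstraint
    if boto3_region == "us-east-1":
        s3Resource.create_bucket(Bucket=s3_bucket_name)
    else:
        s3Resource.create_bucket(Bucket=s3_bucket_name, CreateBucketConfiguration={'LocationConstraint': boto3_region})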

In [None]:
createS3bucket(s3_bucket_name) # Create a new bucket

In [None]:
createCollection(new_CollectionId) # Create the new collection

In [None]:
list_collection_faces(old_CollectionId)
list_collection_faces(new_CollectionId)

### 4. Sort old collection faces

In [None]:
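# list_faces is paginated, so collect the faces from every page
paginator = rekClient.get_paginator('list_faces')
old_collection_faces = []
for page in paginator.paginate(CollectionId=old_CollectionId):
    old_collection_faces.extend(page["Faces"])

# Sorting by ImageId places faces detected in the same source image next to each other
sorted_faces = sorted(old_collection_faces, key=lambda d: d['ImageId'])
print("Old Faces:", len(sorted_faces))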
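
Sorting by `ImageId` matters because the faces of one source image then sit next to each other and can be collected into a single record. The toy example below illustrates the idea with `itertools.groupby`; the face entries and the `group_by_image` helper are made up for illustration and are not part of the solution.

```python
from itertools import groupby

# Made-up faces: two of them come from the same source image
sample_faces = [
    {"FaceId": "f-3", "ImageId": "img-b"},
    {"FaceId": "f-1", "ImageId": "img-a"},
    {"FaceId": "f-2", "ImageId": "img-b"},
]


def group_by_image(faces):
    """Map each ImageId to the FaceIds detected in that image."""
    ordered = sorted(faces, key=lambda d: d["ImageId"])
    # groupby only merges adjacent items, which is why the sort comes first
    return {
        image_id: [face["FaceId"] for face in group]
        for image_id, group in groupby(ordered, key=lambda d: d["ImageId"])
    }


print(group_by_image(sample_faces))
```

Without the sort, `groupby` would see `img-b` twice and the second group would overwrite the first.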

Next, implement getAdditionalInfo() so that it returns the Bucket and Key of the image that was used to index each face. If you have also mapped faces to an internal userId, you can optionally include it.

### 5. Modify getAdditionalInfo() to retrieve the original face images and userId

You will have to provide the bucket and key of the images that were used to index the original face collection. This data is needed to reindex the collection. 

Modify getAdditionalInfo() to return the following metadata for each indexed face:

- image_storage_bucket - bucket where face images are stored
- image_key - the object / file name of the image in the image_storage_bucket
- userID (optional - if you have a defined ID for each user)

In [None]:
def getAdditionalInfo(faceid):
    # Retrieve userID, image_storage_bucket and image_key from your internal mapping,
    # and assign them before the return statement below.
    # Feel free to modify the function to accept other inputs, such as ImageId or ExternalImageId, if useful.
    return userID, image_storage_bucket, image_key
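
As one possible shape for this lookup, the sketch below assumes your mapping is already available as an in-memory dictionary keyed by FaceId. The `face_index` contents and the function name `get_additional_info_from_mapping` are hypothetical; in practice the data might come from a database or a CSV export instead.

```python
# Hypothetical mapping from FaceId to the image that was originally indexed
face_index = {
    "face-id-1": {"UserId": "user-42", "Bucket": "photos-bucket-name", "Key": "people/alice.jpg"},
    "face-id-2": {"Bucket": "photos-bucket-name", "Key": "people/bob.jpg"},
}


def get_additional_info_from_mapping(faceid, mapping=face_index):
    """Return (userID, image_storage_bucket, image_key) for a face."""
    entry = mapping.get(faceid)
    if entry is None:
        # Fail loudly: a record without its source image cannot be reindexed
        raise KeyError("No source image recorded for face {}".format(faceid))
    # UserId is optional, so fall back to an empty string when it is missing
    return entry.get("UserId", ""), entry["Bucket"], entry["Key"]


print(get_additional_info_from_mapping("face-id-1"))
```

Raising on an unknown FaceId is a design choice of this sketch; you may prefer to log and skip such faces.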

In [None]:
def createDataset(sorted_faces):
    # Build one record per source image; sorted_faces must be sorted by ImageId
    solution_records = []
    previousImageId = None
    for face in sorted_faces:
        # Look up the metadata for every face so that each face keeps its own UserId
        userID, image_storage_bucket, image_key = getAdditionalInfo(face["FaceId"])
        #userID,image_storage_bucket,image_key = "","photos-bucket-name",face["ExternalImageId"] #Delete this if you can call getAdditionalInfo
        facerecord = {
            "UserId": userID,
            "FaceId": face["FaceId"],
            "ImageId": face["ImageId"],
            "BoundingBoxes": face["BoundingBox"]
        }
        if face["ImageId"] != previousImageId:
            # First face of a new image: start a new image record
            previousImageId = face["ImageId"]
            solution_records.append({
                "Bucket": image_storage_bucket,
                "Key": image_key,
                "ExternalImageId": face["ExternalImageId"],
                "CollectionId": new_CollectionId,
                "Faces": [facerecord]
            })
        else:
            # Another face from the same image: add it to the current record
            solution_records[-1]["Faces"].append(facerecord)
    return solution_records

In [None]:
solution_records = createDataset(sorted_faces)
records_filename = "solution_records.json"
# The with statement closes the file automatically
with open(records_filename, 'w') as f:
    json.dump(solution_records, f)
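
Before uploading, you may want to read the file back and confirm that it contains what you expect. The sketch below uses a hypothetical `summarize_records` helper and a small made-up file so that it runs on its own.

```python
import json
import os
import tempfile


def summarize_records(path):
    """Return (number of image records, total number of faces) in a records file."""
    with open(path) as f:
        records = json.load(f)
    return len(records), sum(len(record["Faces"]) for record in records)


# Made-up records written to a temporary file for illustration
sample = [
    {"Key": "a.jpg", "Faces": [{"FaceId": "f-1"}]},
    {"Key": "b.jpg", "Faces": [{"FaceId": "f-2"}, {"FaceId": "f-3"}]},
]
sample_path = os.path.join(tempfile.mkdtemp(), "sample_records.json")
with open(sample_path, "w") as f:
    json.dump(sample, f)

summary = summarize_records(sample_path)
print(summary)
```

In the notebook, `summarize_records(records_filename)` should report a face total equal to `len(sorted_faces)`; a mismatch suggests that some faces were dropped while building the records.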

### 6. Send records to Amazon S3

In [None]:
records_folder = "records"
# upload_data stores the file under s3://<bucket>/<key_prefix>/<filename> and returns that URI
dataset_s3_uri = sm_session.upload_data(records_filename, bucket=s3_bucket_name, key_prefix=records_folder)
print("Your data records are located in {}".format(dataset_s3_uri))

Congratulations, your data is now ready to be processed by the reindexing solution. Head over to the Reindex Solution and provide the following information:

In [None]:
print("old_CollectionId =\"{}\"".format(old_CollectionId))
print("new_CollectionId =\"{}\"".format(new_CollectionId))
print("s3_bucket_name =\"{}\"".format(s3_bucket_name))
print("records_folder =\"{}\"".format(records_folder))
print("records_filename =\"{}\"".format(records_filename))