# Create a 3D Point Cloud Labeling Job with Amazon SageMaker Ground Truth


This sample notebook takes you through an end-to-end workflow to demonstrate the functionality of SageMaker Ground Truth 3D point cloud built-in task types. 

### What is a Point Cloud

A point cloud frame is defined as a collection of 3D points describing a 3D scene. Each point is described using three coordinates, x, y, and z. To add color and/or variations in point intensity to the point cloud, points may have additional attributes, such as i for intensity or values for the red (r), green (g), and blue (b) color channels (8-bit). All of the positional coordinates (x, y, z) are in meters. Point clouds are most commonly created from data that was collected by scanning the real world through various scanning methods, such as laser scanning and photogrammetry. Ground Truth currently also supports sensor fusion with video camera data. 


### 3D Point Cloud Built-in Task Types

You can use Ground Truth 3D point cloud labeling built-in task types to annotate 3D point cloud data. The following list briefly describes each task type. See [3D Point Cloud Task types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-task-types.html) for more information.

* 3D point cloud object detection – Use this task type when you want workers to identify the location of and classify objects in a 3D point cloud by drawing 3D cuboids around objects. You can include one or more attributes for each class (label) you provide.


* 3D point cloud object tracking – Use this task type when you want workers to track the trajectory of an object across a sequence of 3D point cloud frames. For example, you can use this task type to ask workers to track the movement of vehicles across a sequence of point cloud frames. This task type can also be used for sensor fusion, that is, when you want workers to link 3D point cloud annotations with 2D image annotations and also link 2D image annotations among various cameras. Note that sensor fusion uses a different label category configuration file.


* 3D point cloud semantic segmentation – Use this task type when you want workers to create a point-level semantic segmentation mask by painting objects in a 3D point cloud using different colors where each color is assigned to one of the classes you specify.


You can use the Adjustment task types to verify and adjust annotations created for the task types above.

In [None]:
!pip install boto3==1.14.8
!pip install -U botocore

In [None]:
# cell 1
import boto3
import botocore
import time
import pprint
import json
import sagemaker
from sagemaker import get_execution_role
from datetime import datetime, timezone

pp = pprint.PrettyPrinter(indent=4)

sess = sagemaker.session.Session()
role = sagemaker.get_execution_role()
region = boto3.session.Session().region_name

sagemaker_client = boto3.client("sagemaker")
s3 = boto3.client("s3")
iam = boto3.client("iam")

In [None]:
# cell 2
session = sagemaker.Session()
default_bucket = session.default_bucket()
BUCKET = default_bucket
EXP_NAME = "3d-point-cloud" # Any valid S3 prefix, leave it empty unless there is a subfolder for labeling artifacts.

In [None]:
# cell 3

# Make sure the bucket is in the same region as this notebook.
bucket_region = s3.head_bucket(Bucket=BUCKET)["ResponseMetadata"]["HTTPHeaders"][
 "x-amz-bucket-region"
]
assert (
 bucket_region == region
), "Your S3 bucket {} and this notebook need to be in the same region.".format(BUCKET)

## Copy and modify files from the sample bucket

The sample files for this demo are in a public bucket to provide you with the inputs to try this demo. For this demo to work, you need to copy these files from the local notebook environment to the default S3 bucket so that they are in a place where you have read/write access.

In [None]:
%%bash -s "$BUCKET"

find ./sample_files/ -type f -name "*.json" -print0 | xargs -0 sed -i -e "s/\$BUCKET/$1/g"

aws s3 cp ./sample_files/ s3://$1/artifacts/gt-point-cloud-demos/ --quiet --recursive
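
The `sed` command above replaces the literal placeholder `$BUCKET` in each sample JSON file with your bucket name. A rough Python equivalent for a single string is sketched below. The helper name `fill_bucket_placeholder`, the bucket name, and the sample line are illustrative and not part of the sample files.

```python
def fill_bucket_placeholder(text: str, bucket: str) -> str:
    """Replace every literal "$BUCKET" in text with the given bucket name."""
    return text.replace("$BUCKET", bucket)


# Illustrative manifest-style line; the real sample files may look different.
sample_line = '{"source-ref": "s3://$BUCKET/artifacts/gt-point-cloud-demos/frames/0.txt"}'
print(fill_bucket_placeholder(sample_line, "my-example-bucket"))
```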

## The Dataset and Resources

The dataset and resources used in this notebook are located in the following Amazon S3 bucket. The bucket contains the data to be labeled, the configuration files that configure labeling tasks, the input manifest files that Ground Truth uses to read the data files, and the output manifest files, which contain the results of the labeling job. All of the datasets used here come from https://github.com/aws/amazon-sagemaker-examples/tree/main/ground_truth_labeling_jobs/3d_point_cloud_demo

In [None]:
# cell 5
!aws s3 ls s3://$BUCKET/artifacts/gt-point-cloud-demos/

### Input Data and Input Manifest File

The following task types (and associated adjustment labeling jobs) require the following types of input manifest files. 

* 3D point cloud object detection – frame input manifest
* 3D point cloud semantic segmentation – frame input manifest
* 3D point cloud object tracking – sequence frame input manifest 
* 3D-2D point cloud object tracking – sequence frame input manifest 
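
The list above can be captured as a small lookup. This is a minimal sketch; the names `MANIFEST_KIND` and `manifest_kind` are hypothetical, and the labels `"frame"` and `"sequence"` are just shorthand for the two manifest types.

```python
# Manifest kind per base task type, following the list above.
MANIFEST_KIND = {
    "3DPointCloudObjectDetection": "frame",
    "3DPointCloudSemanticSegmentation": "frame",
    "3DPointCloudObjectTracking": "sequence",
}


def manifest_kind(task_type: str) -> str:
    """Return "frame" or "sequence" for a task type or its adjustment variant."""
    prefix = "Adjustment"
    # Adjustment jobs use the same manifest kind as the task type they adjust.
    base = task_type[len(prefix):] if task_type.startswith(prefix) else task_type
    return MANIFEST_KIND[base]


print(manifest_kind("Adjustment3DPointCloudObjectTracking"))
```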

In [None]:
# cell 6
## Set up manifest_s3_uri_map, to be used to set up Input ManifestS3Uri

manifest_s3_uri_map = {
 "3DPointCloudObjectDetection": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/manifests/SingleFrame-manifest.json",
 "3DPointCloudObjectTracking": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/manifests/OT-manifest-10-frame.json",
 "3DPointCloudSemanticSegmentation": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/manifests/SS-manifest.json",
 "Adjustment3DPointCloudObjectDetection": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/manifests/OD-adjustment-manifest.json",
 "Adjustment3DPointCloudObjectTracking": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/manifests/OT-adjustment-manifest.json",
 "Adjustment3DPointCloudSemanticSegmentation": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/manifests/SS-audit-manifest-5-17.json",
}

### Label Category Configuration File

Your label category configuration file is used to specify labels, or classes, for your labeling job.

When you use the object detection or object tracking task types, you can also include [label category attributes](http://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-general-information.html#sms-point-cloud-worker-task-ui) in your label category configuration file. Workers can assign one or more attributes you provide to annotations to give more information about that object. For example, you may want to use the attribute *occluded* to have workers identify when an object is partially obstructed. 

To learn more about the label category configuration file, see [Create a Label Category Configuration File](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-label-category-config.html).

Run the following cell to identify the labeling category configuration file.

In [None]:
# cell 7
label_category_file_s3_uri_map = {
 "3DPointCloudObjectDetection": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/label-category-config/label-category.json",
 "3DPointCloudObjectTracking": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/label-category-config/label-category.json",
 "3DPointCloudSemanticSegmentation": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/label-category-config/label-category.json",
 "Adjustment3DPointCloudObjectDetection": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/label-category-config/od-adjustment-label-categories-file.json",
 "Adjustment3DPointCloudObjectTracking": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/label-category-config/ot-adjustment-label-categories-file.json",
 "Adjustment3DPointCloudSemanticSegmentation": f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/label-category-config/SS-audit-5-17-updated-manually-created-label-categories-file.json",
}

In [None]:
# cell 8
# You can use this to identify your labeling job by appending these abbreviations to your labeling job name.
name_abbreviation_map = {
 "3DPointCloudObjectDetection": "OD",
 "3DPointCloudObjectTracking": "OT",
 "3DPointCloudSemanticSegmentation": "SS",
 "Adjustment3DPointCloudObjectDetection": "OD-ADJ",
 "Adjustment3DPointCloudObjectTracking": "OT-ADJ",
 "Adjustment3DPointCloudSemanticSegmentation": "SS-ADJ",
}

### Identify Resources for Labeling Job

The following will be used to select the HumanTaskUiArn. When you create a 3D point cloud labeling job, Ground Truth provides the worker task UI. The following cell identifies the correct HumanTaskUiArn to use a worker UI that is specific to your task type. You can see examples of the worker UIs on the [3D Point Cloud Task Type](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-task-types.html) pages. 

In [None]:
# cell 9
## Set up human_task_ui_arn map

human_task_ui_arn_map = {
 "3DPointCloudObjectDetection": f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/PointCloudObjectDetection",
 "3DPointCloudObjectTracking": f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/PointCloudObjectTracking",
 "3DPointCloudSemanticSegmentation": f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/PointCloudSemanticSegmentation",
 "Adjustment3DPointCloudObjectDetection": f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/PointCloudObjectDetection",
 "Adjustment3DPointCloudObjectTracking": f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/PointCloudObjectTracking",
 "Adjustment3DPointCloudSemanticSegmentation": f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/PointCloudSemanticSegmentation",
}

ac_arn_map = {
 "us-west-2": "081040173940",
 "us-east-1": "432418664414",
 "us-east-2": "266458841044",
 "eu-west-1": "568282634449",
 "ap-northeast-1": "477331159723",
}
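
The account IDs in `ac_arn_map` are used later in the notebook to build the pre-annotation (`PRE-`) and annotation consolidation (`ACS-`) Lambda ARNs. Because the map lists only five regions, a plain dictionary lookup raises a `KeyError` anywhere else. The sketch below wraps the lookup with a clearer error. The helper name `lambda_arn` is hypothetical, and the map is copied from the cell above so that the block is self-contained.

```python
# Account IDs copied from ac_arn_map in the cell above.
AC_ARN_MAP = {
    "us-west-2": "081040173940",
    "us-east-1": "432418664414",
    "us-east-2": "266458841044",
    "eu-west-1": "568282634449",
    "ap-northeast-1": "477331159723",
}


def lambda_arn(kind: str, region: str, task_type: str) -> str:
    """Build a PRE- or ACS- Lambda ARN in the format this notebook uses."""
    if kind not in ("PRE", "ACS"):
        raise ValueError(f"kind must be 'PRE' or 'ACS', got {kind!r}")
    if region not in AC_ARN_MAP:
        raise ValueError(f"Region {region!r} is not listed in the account ID map")
    return f"arn:aws:lambda:{region}:{AC_ARN_MAP[region]}:function:{kind}-{task_type}"


print(lambda_arn("PRE", "us-west-2", "3DPointCloudObjectDetection"))
```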

## Select a 3D Point Cloud Labeling Job Task Type

In the following cell, select a [3D Point Cloud Task Type](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-task-types.html) by specifying a value for `task_type`. The supported task types include: "3DPointCloudObjectDetection", "3DPointCloudObjectTracking", "3DPointCloudSemanticSegmentation", "Adjustment3DPointCloudObjectDetection", "Adjustment3DPointCloudObjectTracking", "Adjustment3DPointCloudSemanticSegmentation"
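
Since `task_type` is used as a key into several maps in this notebook, a typo surfaces only later as a `KeyError`. An optional early check could look like the sketch below; the names `SUPPORTED_TASK_TYPES` and `check_task_type` are hypothetical.

```python
# The six task type strings listed above.
SUPPORTED_TASK_TYPES = (
    "3DPointCloudObjectDetection",
    "3DPointCloudObjectTracking",
    "3DPointCloudSemanticSegmentation",
    "Adjustment3DPointCloudObjectDetection",
    "Adjustment3DPointCloudObjectTracking",
    "Adjustment3DPointCloudSemanticSegmentation",
)


def check_task_type(task_type: str) -> str:
    """Return task_type unchanged, or raise ValueError if it is not supported."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(
            f"Unsupported task_type {task_type!r}; expected one of {SUPPORTED_TASK_TYPES}"
        )
    return task_type


print(check_task_type("3DPointCloudObjectDetection"))
```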

### 3D Point Cloud Object Detection

In [None]:
# cell 10
task_type = "3DPointCloudObjectDetection"

For this task type, you will use a **manifest with a single frame per task**. To learn more about the types of 3D Point Cloud input manifest files, see [3D Point Cloud Input Data](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-input-data.html).

#### Input Manifest File With Single Frame Per Task

When you use a frame input manifest for 3D point cloud object detection and semantic segmentation task types, each line in the input manifest will identify the location of a single point cloud file in Amazon S3. When a task is created, workers will be asked to classify or add a segmentation mask to objects in that frame (depending on the task type). 

Let's look at the single-frame input manifest. You'll see that this manifest file contains the location of a point cloud file in `source-ref`, as well as the pose of the vehicle used to collect the data (ego-vehicle), image pose information, and other image data used for sensor fusion. See [Create a Point Cloud Frame Input Manifest File](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-single-frame-input-data.html) to learn more about these parameters. 

In [None]:
# cell 11
print("\nThe single-frame input manifest file:")
with open("./sample_files/manifests/SingleFrame-manifest.json", "r") as j:
 json_data = json.load(j)
 print("\n", json.dumps(json_data, indent=4, sort_keys=True))
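
`json.load` parses the whole file as one JSON document, which fits a manifest that holds a single JSON object. Because each line of an input manifest describes one data object, a manifest with several lines needs to be parsed line by line instead. The following is a minimal sketch of that case; the helper `read_manifest_lines` and the example URIs are illustrative.

```python
import io
import json


def read_manifest_lines(fileobj):
    """Parse one JSON object per non-empty line of a manifest file."""
    return [json.loads(line) for line in fileobj if line.strip()]


# In-memory stand-in for a two-line manifest; the URIs are made up.
example = io.StringIO(
    '{"source-ref": "s3://my-example-bucket/frames/0.txt"}\n'
    '{"source-ref": "s3://my-example-bucket/frames/1.txt"}\n'
)
entries = read_manifest_lines(example)
print(len(entries), entries[0]["source-ref"])
```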

The point cloud data in the file, `0.txt`, identified in the manifest above is in ASCII format. Each line in the point cloud file contains information about a single point. The first three values are x, y, and z location coordinates, and the last element is the pixel intensity. To learn more about this raw data format, see [ASCII Format](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-raw-data-types.html#sms-point-cloud-raw-data-ascii-format).

In [None]:
# cell 12
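# Use a context manager so the file handle is closed after reading.
with open("./sample_files/frames/0.txt") as frame:
    first_line = frame.readline()
print("\nA single line from the point cloud file with x, y, z and pixel intensity values: \n")
print(first_line)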
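
To work with a point numerically, the line can be split into floats. This is a minimal sketch that assumes values are separated by whitespace (commas are also tolerated as a precaution) and that intensity, when present, is the fourth value. The helper `parse_point` and the sample line are illustrative.

```python
def parse_point(line: str):
    """Parse one ASCII point line into (x, y, z, intensity).

    intensity is None when the line has only three values.
    """
    values = [float(v) for v in line.replace(",", " ").split()]
    if len(values) < 3:
        raise ValueError(f"Expected at least x, y, z but got: {line!r}")
    x, y, z = values[:3]
    intensity = values[3] if len(values) > 3 else None
    return x, y, z, intensity


# Made-up sample line, not taken from 0.txt.
print(parse_point("1.5 -2.0 0.25 0.7"))
```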

#### Set up Human Task Configuration

`HumanTaskConfig` is used to specify your work team, and configure your labeling job tasks. 

If you want to preview the worker task UI, [create a private work team](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-create-private-console.html) and add yourself as a worker. 

If you have already created a private workforce, follow the instructions in [Add or Remove Workers](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management-private-console.html#add-remove-workers-sm) to add yourself to the work team you use to create a labeling job. 

To find `workteam_arn`, go to the SageMaker console -> Ground Truth -> Labeling workforces -> Private, and copy the ARN of the corresponding work team.

In [None]:
# cell 13
## Set up Human Task Config
workteam_arn = "Your-WorkTeam_ARN"
## Modify the following
task_description = "Object Detection in 3D point cloud"
# example keywords
task_keywords = ["lidar", "pointcloud"]
# add a task title
task_title = "Bounding Cars"
# add a job name to identify your labeling job
job_name = "SF-area1-car-detection-0505-fix"

prehuman_arn = "arn:aws:lambda:{}:{}:function:PRE-{}".format(region, ac_arn_map[region], task_type)
acs_arn = "arn:aws:lambda:{}:{}:function:ACS-{}".format(region, ac_arn_map[region], task_type)

human_task_config = {
 "AnnotationConsolidationConfig": {
 "AnnotationConsolidationLambdaArn": acs_arn,
 },
 "WorkteamArn": workteam_arn,
 "PreHumanTaskLambdaArn": prehuman_arn,
 "MaxConcurrentTaskCount": 200, # 200 data objects (frames for OD and SS or sequences for OT) will be sent at a time to the workteam.
 "NumberOfHumanWorkersPerDataObject": 1, # One worker will work on each task
 "TaskAvailabilityLifetimeInSeconds": 18000, # Your workteam has 5 hours to complete all pending tasks.
 "TaskDescription": task_description,
 "TaskKeywords": task_keywords,
 "TaskTimeLimitInSeconds": 3600, # Each seq/frame must be labeled within 1 hour.
 "TaskTitle": task_title,
}


human_task_config["UiConfig"] = {"HumanTaskUiArn": "{}".format(human_task_ui_arn_map[task_type])}
# print(json.dumps(human_task_config, indent=4, sort_keys=True))
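
The same `HumanTaskConfig` structure is needed for every task type in this notebook, with only a few values changing. One possible way to avoid repeating it is a small builder function, sketched below. The function name `build_human_task_config`, its parameter names, and the placeholder ARNs in the example call are hypothetical; the dictionary keys and default values are the ones used in the cell above.

```python
def build_human_task_config(
    workteam_arn,
    prehuman_arn,
    acs_arn,
    ui_arn,
    title,
    description,
    keywords,
    max_concurrent_tasks=200,
    workers_per_object=1,
    availability_seconds=18000,
    time_limit_seconds=3600,
):
    """Assemble a HumanTaskConfig dict with the keys used in this notebook."""
    return {
        "AnnotationConsolidationConfig": {"AnnotationConsolidationLambdaArn": acs_arn},
        "WorkteamArn": workteam_arn,
        "PreHumanTaskLambdaArn": prehuman_arn,
        "MaxConcurrentTaskCount": max_concurrent_tasks,
        "NumberOfHumanWorkersPerDataObject": workers_per_object,
        "TaskAvailabilityLifetimeInSeconds": availability_seconds,
        "TaskDescription": description,
        "TaskKeywords": keywords,
        "TaskTimeLimitInSeconds": time_limit_seconds,
        "TaskTitle": title,
        "UiConfig": {"HumanTaskUiArn": ui_arn},
    }


# Placeholder values only; substitute the ARNs built earlier in the notebook.
example_config = build_human_task_config(
    workteam_arn="Your-WorkTeam_ARN",
    prehuman_arn="Your-PreHuman-Lambda-ARN",
    acs_arn="Your-Consolidation-Lambda-ARN",
    ui_arn="Your-HumanTaskUi-ARN",
    title="Bounding Cars",
    description="Object Detection in 3D point cloud",
    keywords=["lidar", "pointcloud"],
)
print(sorted(example_config))
```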

#### Set up Create Labeling Request

The following formats your labeling job request. For 3D point cloud object tracking, 2D-3D point cloud object tracking, and semantic segmentation task types, the `LabelAttributeName` must end in `-ref`. For other task types, the label attribute name must not end in `-ref`. 

In [None]:
# cell 14
## Set up Create Labeling Request

labelAttributeName = job_name + "-ref"

if (
 task_type == "3DPointCloudObjectDetection"
 or task_type == "Adjustment3DPointCloudObjectDetection"
):
 labelAttributeName = job_name


ground_truth_request = {
 "InputConfig": {
 "DataSource": {
 "S3DataSource": {
 "ManifestS3Uri": "{}".format(manifest_s3_uri_map[task_type]),
 }
 },
 "DataAttributes": {
 "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
 },
 },
 "OutputConfig": {
 "S3OutputPath": f"s3://{BUCKET}/{EXP_NAME}/output/",
 },
 "HumanTaskConfig": human_task_config,
 "LabelingJobName": job_name,
 "RoleArn": role,
 "LabelAttributeName": labelAttributeName,
 "LabelCategoryConfigS3Uri": label_category_file_s3_uri_map[task_type],
}

# print(json.dumps(ground_truth_request, indent=4, sort_keys=True))

#### Call CreateLabelingJob to Create 3D Point Cloud Object Detection Job

In [None]:
# cell 15
sagemaker_client.create_labeling_job(**ground_truth_request)
print(f"Labeling Job Name: {job_name}")

In [None]:
describeLabelingJob = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
print(describeLabelingJob)

### 3D(-2D) Point Cloud Object Tracking

In [None]:
# cell 16
task_type = "3DPointCloudObjectTracking"

#### Input Manifest File With Multi-Frame Sequence Per Task

When you choose a sequence input manifest file, each line in the input manifest will point to a *sequence file* in Amazon S3. A sequence specifies a temporal series of point cloud frames. When a task is created using a sequence file, all point cloud frames in the sequence are sent to a worker to label. Workers can navigate back and forth between frames and annotate them (with 3D cuboids) to track the trajectory of objects across frames. 

Let's look at the sequence input manifest file. You'll see that this input manifest contains the location of a single sequence file. 

In [None]:
# cell 17
print("\nThe multi-frame input manifest file:")
with open("./sample_files/manifests/OT-manifest-10-frame.json", "r") as j:
 json_data = json.load(j)
 print("\n", json.dumps(json_data, indent=4, sort_keys=True))

Let's look at the sequence file, seq1.json. You will see that this single sequence file contains the location of 10 frames, as well as pose information on the vehicle (ego-vehicle) and camera. See [Create a Point Cloud Frame Sequence Input Manifest](http://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-multi-frame-input-data.html) to learn more about these parameters.

In [None]:
# cell 18
with open("./sample_files/sequences/seq1.json", "r") as j:
 json_data = json.load(j)
 print("\nA single sequence file: \n\n", json.dumps(json_data, indent=4, sort_keys=True))
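
To list the individual frame files a sequence refers to, the entries of the sequence file can be combined into full S3 URIs. This sketch assumes the sequence file has a top-level `prefix` string and a `frames` list whose items carry a `frame` file name, as in the documented sequence file format. The helper `frame_uris` and the example values are illustrative.

```python
def frame_uris(sequence: dict) -> list:
    """Return the full S3 URI of every frame in a parsed sequence file."""
    prefix = sequence["prefix"]
    return [prefix + frame["frame"] for frame in sequence["frames"]]


# Made-up two-frame sequence, reduced to the keys this helper reads.
example_sequence = {
    "seq-no": 1,
    "prefix": "s3://my-example-bucket/frames/",
    "number-of-frames": 2,
    "frames": [
        {"frame-no": 0, "frame": "0.txt"},
        {"frame-no": 1, "frame": "1.txt"},
    ],
}
print(frame_uris(example_sequence))
```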

#### Set up Human Task Configuration

`HumanTaskConfig` is used to specify your work team, and configure your labeling job tasks. 

If you want to preview the worker task UI, create a private work team and add yourself as a worker. 

If you have already created a private workforce, follow the instructions in [Add or Remove Workers](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management-private-console.html#add-remove-workers-sm) to add yourself to the work team you use to create a labeling job. 

In [None]:
# cell 19
## Set up Human Task Config

## Modify the following
task_description = "Object Tracking in 3D point cloud"
# example keywords
task_keywords = ["lidar", "pointcloud"]
# add a task title
task_title = "Tracking Cars"
# add a job name to identify your labeling job
job_name = ""  # Set a non-empty job name that is unique in your account and region before running this cell.

prehuman_arn = "arn:aws:lambda:{}:{}:function:PRE-{}".format(region, ac_arn_map[region], task_type)
acs_arn = "arn:aws:lambda:{}:{}:function:ACS-{}".format(region, ac_arn_map[region], task_type)

human_task_config = {
 "AnnotationConsolidationConfig": {
 "AnnotationConsolidationLambdaArn": acs_arn,
 },
 "WorkteamArn": workteam_arn,
 "PreHumanTaskLambdaArn": prehuman_arn,
 "MaxConcurrentTaskCount": 200, # 200 data objects (frames for OD and SS or sequences for OT) will be sent at a time to the workteam.
 "NumberOfHumanWorkersPerDataObject": 1, # One worker will work on each task
 "TaskAvailabilityLifetimeInSeconds": 18000, # Your workteam has 5 hours to complete all pending tasks.
 "TaskDescription": task_description,
 "TaskKeywords": task_keywords,
 "TaskTimeLimitInSeconds": 3600, # Each seq/frame must be labeled within 1 hour.
 "TaskTitle": task_title,
}


human_task_config["UiConfig"] = {"HumanTaskUiArn": "{}".format(human_task_ui_arn_map[task_type])}

#### Set up Create Labeling Request

The following formats your labeling job request. For 3D point cloud object tracking, 2D-3D point cloud object tracking, and semantic segmentation task types, the `LabelAttributeName` must end in `-ref`. For other task types, the label attribute name must not end in `-ref`. 

In [None]:
# cell 20
## Set up Create Labeling Request

labelAttributeName = job_name + "-ref"

if (
 task_type == "3DPointCloudObjectDetection"
 or task_type == "Adjustment3DPointCloudObjectDetection"
):
 labelAttributeName = job_name


ground_truth_request = {
 "InputConfig": {
 "DataSource": {
 "S3DataSource": {
 "ManifestS3Uri": "{}".format(manifest_s3_uri_map[task_type]),
 }
 },
 "DataAttributes": {
 "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
 },
 },
 "OutputConfig": {
 "S3OutputPath": f"s3://{BUCKET}/{EXP_NAME}/output/",
 },
 "HumanTaskConfig": human_task_config,
 "LabelingJobName": job_name,
 "RoleArn": role,
 "LabelAttributeName": labelAttributeName,
 # Note that sensor fusion job uses a different label category configuration file
 # IF it's a 3D object tracking task, keep as it is
 # IF it's a sensor fusion task, replace label_category_file_s3_uri_map[task_type] 
 # with
 # f"s3://{BUCKET}/artifacts/gt-point-cloud-demos/label-category-config/linking-lcc.json"
 "LabelCategoryConfigS3Uri": label_category_file_s3_uri_map[task_type],
}

print(json.dumps(ground_truth_request, indent=4, sort_keys=True))

#### Call CreateLabelingJob to Create 3D Point Cloud Object Tracking Job

In [None]:
# cell 21
sagemaker_client.create_labeling_job(**ground_truth_request)
print(f"Labeling Job Name: {job_name}")

## Check Status of Labeling Job

In [None]:
# cell 22
## call describeLabelingJob
describeLabelingJob = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
print(describeLabelingJob)
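
A single `describe_labeling_job` call gives only a snapshot, and jobs that rely on human workers can run for a long time. If you prefer to wait in the notebook, a simple polling loop is one option. In the sketch below, the helper `wait_for_labeling_job`, its parameters, and the choice of `Completed`, `Failed`, and `Stopped` as terminal values of `LabelingJobStatus` are assumptions for illustration. Pass in the `sagemaker_client` created earlier.

```python
import time

# Statuses treated as final in this sketch.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_labeling_job(client, job_name, poll_seconds=60, max_polls=120):
    """Poll describe_labeling_job until the job reaches a terminal status.

    client is any object exposing describe_labeling_job(LabelingJobName=...),
    such as a boto3 SageMaker client. Raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        response = client.describe_labeling_job(LabelingJobName=job_name)
        status = response["LabelingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Labeling job {job_name!r} did not finish after {max_polls} polls")


# Example usage (uncomment in the notebook):
# final_status = wait_for_labeling_job(sagemaker_client, job_name)
# print(final_status)
```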

### Start Working on tasks

When you add yourself to a private work team, you receive an email invitation to access the worker portal. Use this invitation to sign in to the portal and view your 3D point cloud annotation tasks. Tasks may take up to 10 minutes to show up in the worker portal. 

Once you are done working on the tasks, click **Submit**. 

### View Output Data

Once you have completed all of the tasks, you can view your output data in the S3 location you specified in `OutputConfig`. 

To read more about Ground Truth output data format for your task type, see [Output Data](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html).
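
The response from `describe_labeling_job` can also tell you where the output manifest was written. The sketch below assumes the response contains `LabelingJobOutput` with an `OutputDatasetS3Uri` field once output exists, and returns `None` otherwise. The helper name `output_manifest_uri` and the example response are illustrative.

```python
def output_manifest_uri(describe_response: dict):
    """Return the output manifest S3 URI from a describe_labeling_job response.

    Returns None when the response does not yet contain output information.
    """
    return describe_response.get("LabelingJobOutput", {}).get("OutputDatasetS3Uri")


# Made-up response fragment for illustration only.
example_response = {
    "LabelingJobStatus": "Completed",
    "LabelingJobOutput": {
        "OutputDatasetS3Uri": "s3://my-example-bucket/3d-point-cloud/output/my-job/manifests/output/output.manifest"
    },
}
print(output_manifest_uri(example_response))
```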