# LiDAR 3D point cloud labeling with a Velodyne LiDAR sensor in SageMaker Ground Truth

This notebook demonstrates how to pre-process LiDAR sensor data to create an object tracking labeling job with sensor fusion in SageMaker Ground Truth.

For object tracking, you will track the movement of an object (e.g., a car or a pedestrian) while your point of reference (in this case, the car carrying the sensor) is moving. You will convert your 3D point cloud data from the sensor's local coordinate system to the world coordinate system so that every frame shares the same fixed frame of reference.

We will also include camera images, using the sensor fusion feature in SageMaker Ground Truth, to give labeling workers more visual information about the scene they are labeling. With sensor fusion, workers can adjust labels in both the 3D scene and the 2D images, and an adjustment in one view is mirrored in the other.

The dataset used is provided to us by Velodyne. We will go over the dataset content in detail in later sections.

## Prerequisites

- An S3 bucket you can write to. The bucket must be in the same Region as this SageMaker notebook instance. You can also define a valid S3 prefix; all the files related to this experiment will be stored under that prefix in your bucket. **Important: you must attach the CORS policy to this bucket.** To learn how to add a CORS policy to your S3 bucket, follow the instructions in [How do I add cross-domain resource sharing with CORS?](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html). Paste the following policy in the CORS configuration editor:

```
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>HEAD</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <ExposeHeader>Access-Control-Allow-Origin</ExposeHeader>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
</CORSRule>
</CORSConfiguration>
```

- Download `pcd.py` into the same directory as this notebook so that `import pcd` works. `pcd.py` is the PCD reader module from pyntcloud, a Python 3 library for working with 3D point clouds; it lets us read the .pcd files generated by the LiDAR sensor.
- Familiarity with the [Ground Truth 3D Point Cloud Labeling Job](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud.html).
- Familiarity with Python and numpy.
- Basic understanding of [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
- Basic familiarity with the [AWS Command Line Interface (CLI)](https://aws.amazon.com/cli/).
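If you prefer to attach the CORS rules from code instead of the console, the sketch below expresses the XML policy above as the dictionary that boto3's `put_bucket_cors` call accepts (recent versions of the S3 console editor also expect the rules as JSON rather than XML). The helper names `build_cors_configuration` and `apply_cors_configuration` are hypothetical, and the caller needs the `s3:PutBucketCORS` permission on the bucket.

```python
def build_cors_configuration():
    """Return the CORS rules from the XML policy above as a boto3 CORSConfiguration dict."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": ["*"],
                "AllowedMethods": ["GET", "HEAD", "PUT"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["Access-Control-Allow-Origin"],
                "MaxAgeSeconds": 3000,
            },
            {
                "AllowedOrigins": ["*"],
                "AllowedMethods": ["GET"],
            },
        ]
    }


def apply_cors_configuration(bucket_name):
    """Attach the CORS rules to the bucket (replaces any existing CORS configuration)."""
    import boto3  # imported here so the builder above can be used without AWS access

    client = boto3.client("s3")
    client.put_bucket_cors(Bucket=bucket_name, CORSConfiguration=build_cors_configuration())
```

For example, `apply_cors_configuration("<Your Bucket Name>")` would apply both rules in one call. Note that `put_bucket_cors` overwrites the bucket's whole CORS configuration, so merge in any rules you already rely on.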

This notebook has only been tested on a SageMaker notebook instance. We used an ml.t3.medium instance in our tests.

In [None]:
# update awscli in case the version installed is out of date
!pip install awscli --upgrade

## Import modules and initialize parameters for this notebook

In [None]:
import pcd
import json
import yaml
import boto3
import numpy as np
import time
import sagemaker
from urllib.parse import urlparse
from scipy.spatial.transform import Rotation
from IPython.display import clear_output, display

sagemaker_client = boto3.client('sagemaker')

In [None]:
BUCKET = 'velodyne-blog' #<Your Bucket Name>
PREFIX = 'lidar_point_cloud_data' #<Any Valid S3 Prefix>

In [None]:
# Please make sure your bucket is in the same region as this notebook.
role = sagemaker.get_execution_role()
region = boto3.session.Session().region_name


s3_client = boto3.client('s3')
s3 = boto3.resource('s3')

bucket_region = s3_client.head_bucket(Bucket=BUCKET)["ResponseMetadata"]["HTTPHeaders"][
    "x-amz-bucket-region"
]
assert (
    bucket_region == region
), "Your S3 bucket {} and this notebook need to be in the same region.".format(BUCKET)

## Data loading functions
Define some helper functions to efficiently download, parse, and upload data from and to S3.

In [None]:
def load_s3_as_stream(s3_uri):
    s3_info = urlparse(s3_uri, allow_fragments=False)
    response = s3_client.get_object(Bucket=s3_info.netloc, Key=s3_info.path[1:])
    file_stream = response['Body']
    return file_stream

def load_s3_as_buffer(s3_uri):
    s3_info = urlparse(s3_uri, allow_fragments=False)
    content_object = s3.Object(s3_info.netloc, s3_info.path[1:])
    file_buffer = content_object.get()['Body'].read()
    return file_buffer


def load_s3_as_string(s3_uri):
    file_content = load_s3_as_buffer(s3_uri).decode('utf-8')
    return file_content


def json_print(json_obj):
    print(json.dumps(json_obj, sort_keys=True, default=str, indent=2))


def load_json_from_s3(s3_uri):
    s3_content = load_s3_as_string(s3_uri)
    json_content = json.loads(s3_content)
    return json_content

# Note: doesn't work with YAML files generated by OpenCV
def load_yaml_from_s3(s3_uri):
    stream = load_s3_as_stream(s3_uri)
    return yaml.safe_load(stream)

def write_json_to_s3(content, bucket, key):
    s3_object = s3.Object(bucket, key)
    str_content = json.dumps(content, sort_keys=True, default=str, indent=2)
    s3_object.put(Body=str_content.encode('utf-8'))


def write_txt_to_s3(txt, bucket, key):
    s3_object = s3.Object(bucket, key)
    s3_object.put(Body=txt.encode('utf-8'))
    return f"s3://{bucket}/{key}"
    
def list_s3_objects(bucket, prefix, suffix):
    # Use a paginator so prefixes with more than 1,000 objects are listed completely.
    paginator = s3_client.get_paginator("list_objects_v2")

    all_key = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for content in page.get("Contents", []):
            if content["Key"].endswith(suffix):
                all_key.append(content["Key"])
    all_key.sort()

    return all_key

def write_manifest_to_s3(manifest_lines, bucket, key):
    s3_object = s3.Object(bucket, key)
    str_content = ""
    for line_json in manifest_lines:
        str_content += (json.dumps(line_json, separators=(',', ':'))+"\n")
    s3_object.put(Body=str_content.encode('utf-8'))
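The loaders above repeat the `urlparse(...).path[1:]` pattern to turn an S3 URI into a bucket and key. A small helper could centralize that and fail loudly when a URI is missing the `s3://` scheme; `split_s3_uri` below is a hypothetical sketch, not part of boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError for non-S3 URIs."""
    parsed = urlparse(s3_uri, allow_fragments=False)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    # The parsed path keeps its leading "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/lidar_point_cloud_data/poses/frame_0.json")
print(bucket, key)  # my-bucket lidar_point_cloud_data/poses/frame_0.json
```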

## Describe the LiDAR dataset

Velodyne provided the LiDAR sensor dataset for this example. This data is under [Creative Commons Attribution-NonCommercial-ShareAlike 3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/) license, and it is hosted at https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/ML-8424/lidar_point_cloud_data.zip

This dataset contains one continuous scene from an autonomous vehicle experiment, driving on a highway between Oakland and San Francisco. The entire scene contains 60 frames. In addition to the point cloud data (.pcd), the dataset includes camera images, sensor poses, and calibration data. The dataset contents are:

- lidar_cam_calib_vlp32_06_10_2021.yaml (camera calibration info, 1 camera only)
- images/ (camera footage for each frame)
- poses/ (one pose JSON file per frame containing the LiDAR extrinsic matrix)
- rectified_scans_local/ (.pcd files in the LiDAR sensor's local coordinate system)

Run the cell below to download the dataset locally and then upload it to the S3 bucket and prefix you defined above.

If you are interested, you can also run the optional section to see the experiment details from the sensor setting files.

In [None]:
!wget https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/ML-8424/lidar_point_cloud_data.zip
!unzip lidar_point_cloud_data.zip

target_s3 = f's3://{BUCKET}/{PREFIX}'
!aws s3 cp ./lidar_point_cloud_data $target_s3 --recursive

!rm lidar_point_cloud_data.zip

In [None]:
all_pcd_keys = list_s3_objects(BUCKET, f"{PREFIX}/rectified_scans_local", "pcd")
print(f'Total number of frames in this scene is {len(all_pcd_keys)} =================\n')

## Preview the Data
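Each `.pcd` file is one frame of the entire scene. We use the `read_pcd` function from the pcd.py module (from pyntcloud) to load the frame data and take a quick look at its structure; in the next section this data is transformed into a format that Ground Truth supports. Note that `read_pcd` opens a local file path: passing the S3 key works here only because, with `PREFIX = 'lidar_point_cloud_data'`, each key matches the path of the unzipped local copy.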

In [None]:
for pcd_key in all_pcd_keys[0:1]:
    print(pcd_key)
    point_data = pcd.read_pcd(pcd_key)
    
point_data['points'].head()

## Create a Labeling Job

Object tracking is our task type. Read more about other [3D Point Cloud Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-task-types.html).

To create an Object Tracking Point Cloud Labeling Job, you need to feed the following resources as the labeling job inputs:

1) **Create Point Cloud Sequence Input Manifest:** This is a json file defining the point cloud frame sequence and associated sensor fusion data. For more information, see [Create a Point Cloud Sequence Input Manifest](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-multi-frame-input-data.html).

2) **Create an input manifest file:** This is the input file for the labeling job. Each line of the manifest file contains a link to a sequence file defined in step 1.

3) **Create a Label Category Configuration file:** This file is used to specify your labels, label categories, frame attributes, and worker instructions. For more information, see [Create a Labeling Category Configuration File](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html).

4) **Provide Pre-defined AWS Resources**
  - **Pre-annotation Lambda ARN:** Please refer to [PreHumanTaskLambdaArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-PreHumanTaskLambdaArn). 
  
  - **Annotation Consolidation ARN:** This Lambda function is used to consolidate labels from different workers. Please refer to [AnnotationConsolidationLambdaArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AnnotationConsolidationConfig.html#sagemaker-Type-AnnotationConsolidationConfig-AnnotationConsolidationLambdaArn).
  
  - **Workforce ARN:** Defines which workforce type you would like to use. Please refer to [Create and Manage Workforces](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management.html) for more details.
  
  - **HumanTaskUiArn:** You must enter a HumanTaskUiArn to define the worker UI template for the labeling job. It should look something like `arn:aws:sagemaker:<region>:394669845002:human-task-ui/PointCloudObjectTracking`. Replace `<region>` with your Region.


**[Be AWARE]**

- There should not be an entry for the UiTemplateS3Uri parameter.

- Your LabelAttributeName must end in -ref. For example, ot-labels-ref.

- The number of workers specified in NumberOfHumanWorkersPerDataObject should be 1.

- <span style="color:red">**3D point cloud labeling does not support active learning**</span>, so do not specify values for parameters in LabelingJobAlgorithmsConfig.

- 3D point cloud object tracking labeling jobs can take multiple hours to complete. Specify a longer time limit for these labeling jobs in TaskTimeLimitInSeconds (up to 7 days, or 604,800 seconds).
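The constraints above are easy to miss when editing the request by hand. The hypothetical `check_point_cloud_job_request` below is a local pre-flight sketch that checks only the rules listed in this section against a `CreateLabelingJob` request dictionary; it is not a SageMaker API and does not replace the service's own validation.

```python
MAX_TASK_TIME_LIMIT_SECONDS = 7 * 24 * 60 * 60  # 604,800 seconds (7 days)


def check_point_cloud_job_request(request):
    """Return a list of problems found in a CreateLabelingJob request dict (empty if none).

    Hypothetical helper: it only checks the 3D point cloud constraints listed above.
    """
    problems = []
    task_config = request.get("HumanTaskConfig", {})
    ui_config = task_config.get("UiConfig", {})

    if "UiTemplateS3Uri" in ui_config:
        problems.append("UiConfig must not contain UiTemplateS3Uri")
    if not request.get("LabelAttributeName", "").endswith("-ref"):
        problems.append("LabelAttributeName must end with -ref")
    if task_config.get("NumberOfHumanWorkersPerDataObject") != 1:
        problems.append("NumberOfHumanWorkersPerDataObject must be 1")
    if "LabelingJobAlgorithmsConfig" in request:
        problems.append("LabelingJobAlgorithmsConfig is not supported (no active learning)")
    if task_config.get("TaskTimeLimitInSeconds", 0) > MAX_TASK_TIME_LIMIT_SECONDS:
        problems.append("TaskTimeLimitInSeconds must be at most 604,800 (7 days)")
    return problems
```

Once `ground_truth_request` is built later in this notebook, `check_point_cloud_job_request(ground_truth_request)` should return an empty list before you call `create_labeling_job`.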

In [None]:
#object tracking as our 3D Point Cloud Task Type. 
task_type = "3DPointCloudObjectTracking"

## Point Cloud Sequence Input Manifest File

Two of the most important steps in generating a sequence input manifest file are 1) converting the 3D points to the **world coordinate system** and 2) generating the sensor extrinsic matrix to enable the **sensor fusion** feature in SageMaker Ground Truth.

**World Coordinate System:** The LiDAR sensor is mounted on a moving vehicle (the ego vehicle) and captures data in its own frame of reference. To perform object tracking, we need to convert this data to a global frame of reference that accounts for the movement of the ego vehicle itself.

**Sensor Fusion** is a feature in SageMaker Ground Truth that synchronizes the 3D point cloud frame side by side with the video frame. This gives human labelers visual context and also allows them to adjust annotations in the 3D scene and the 2D images synchronously.

For the step-by-step instruction on how the matrix transformation is done, please check out this [AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/labeling-data-for-3d-object-tracking-and-sensor-fusion-in-amazon-sagemaker-ground-truth/)

The **`generate_transformed_pcd_from_point_cloud`** function performs the coordinate translation and then generates the 3D point data file that Ground Truth can consume.

**Coordinate Translation**: To translate the data from the local (sensor) coordinate system to the global coordinate system, we multiply each point in a 3D frame, written in homogeneous coordinates as [x, y, z, 1], by the 4x4 extrinsic matrix of the LiDAR sensor.

**Raw 3D Data File format**: Ground Truth renders 3D point cloud data in either compact binary pack (.bin) or ASCII (.txt) format. Files in these formats must contain the location (x, y, and z coordinates) of all points that make up the frame and, optionally, the intensity (i) of each point or its color (r, g, b) for colored point clouds. This notebook writes ASCII files in the `text/xyzi` format.

To read more about Ground Truth accepted raw 3d data formats, see [Raw 3D Data Formats](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-point-cloud-raw-data-types.html)

In [None]:
# This function generates the transformed point cloud file for Ground Truth
# as text: one "x y z i" line per point.
def generate_transformed_pcd_from_point_cloud(points, lidar_extrinsic_matrix, tranform=False):
    lines = []

    for point in points:
        # Skip points without an intensity value; the output format is text/xyzi.
        if len(point) <= 3 or point[3] is None:
            continue

        if tranform:
            # Transform from the local (sensor) to the global coordinate system
            # using homogeneous coordinates [x, y, z, 1].
            local_point = np.array([point[0], point[1], point[2], 1], dtype=np.float32).reshape(4, 1)
            x, y, z, _ = np.matmul(lidar_extrinsic_matrix, local_point).flatten().tolist()
        else:
            # Write the points unchanged [test purposes only].
            x, y, z = point[0], point[1], point[2]

        lines.append(f"{x} {y} {z} {point[3]}\n")

    lidar_txt = "".join(lines)
    return lidar_txt
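The function above calls `np.matmul` once per point. As a sketch of the same homogeneous-coordinate transform, the hypothetical `transform_points_to_world` and `to_xyzi_text` below transform all points of a frame in one matrix product. They assume an (N, 4) numeric array of x, y, z, intensity with no missing intensities (the function above skips points whose intensity is `None`), and they compute in float64 rather than float32.

```python
import numpy as np


def transform_points_to_world(points_xyzi, extrinsic):
    """Apply a 4x4 local-to-world extrinsic matrix to an (N, 4) array of x, y, z, intensity.

    Intensity is passed through unchanged; only x, y, z are transformed.
    """
    points_xyzi = np.asarray(points_xyzi, dtype=np.float64)
    xyz = points_xyzi[:, :3]
    # Homogeneous coordinates: append a column of ones -> shape (N, 4)
    homogeneous = np.hstack([xyz, np.ones((xyz.shape[0], 1))])
    # (4, 4) @ (4, N) -> (4, N); transpose back to (N, 4) and drop the w column
    world_xyz = (extrinsic @ homogeneous.T).T[:, :3]
    return np.hstack([world_xyz, points_xyzi[:, 3:4]])


def to_xyzi_text(points_xyzi):
    """Serialize an (N, 4) array as 'x y z i' lines (the text/xyzi layout)."""
    return "".join(f"{x} {y} {z} {i}\n" for x, y, z, i in np.asarray(points_xyzi).tolist())


# Example with a made-up pose: no rotation, ego vehicle 10 m along the world x axis
extrinsic = np.eye(4)
extrinsic[:3, 3] = [10.0, 0.0, 0.0]
world = transform_points_to_world([[1.0, 2.0, 3.0, 0.5]], extrinsic)
print(to_xyzi_text(world), end="")  # 11.0 2.0 3.0 0.5
```

The result matches the per-point loop up to floating-point precision; the vectorized form mainly pays off when frames contain many points.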

The functions below convert the LiDAR and camera extrinsic matrices into the pose format (a translation plus a rotation quaternion) used in the 3D point cloud sequence input manifest.

In [None]:
from scipy.spatial.transform import Rotation as R

# utility to convert an extrinsic matrix to a pose heading quaternion and position;
# scipy's as_quat() returns the quaternion in (x, y, z, w) order
def convert_extrinsic_matrix_to_trans_quaternion_mat(lidar_extrinsic_transform):
    position = lidar_extrinsic_transform[0:3, 3]
    rot = np.linalg.inv(lidar_extrinsic_transform[0:3, 0:3])
    quaternion= R.from_matrix(np.asarray(rot)).as_quat()
    trans_quaternions = {
        "translation": {
            "x": position[0],
            "y": position[1],
            "z": position[2]
        },
        "rotation": {
            "qx": quaternion[0],
            "qy": quaternion[1],
            "qz": quaternion[2],
            "qw": quaternion[3]
            }
    }
    return trans_quaternions

In [None]:
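# Same as above, but for the camera pose. Negating qw yields the inverse of the
# rotation encoded by (qx, qy, qz, qw).
def convert_camera_inv_extrinsic_matrix_to_trans_quaternion_mat(camera_extrinsic_transform):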
    position = camera_extrinsic_transform[0:3, 3]
    rot = np.linalg.inv(camera_extrinsic_transform[0:3, 0:3])
    quaternion= R.from_matrix(np.asarray(rot)).as_quat()
    trans_quaternions = {
        "translation": {
            "x": position[0],
            "y": position[1],
            "z": position[2]
        },
        "rotation": {
            "qx": quaternion[0],
            "qy": quaternion[1],
            "qz": quaternion[2],
            "qw": -quaternion[3]
            }
    }
    return trans_quaternions
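scipy's `Rotation.as_quat()` returns quaternions in scalar-last (x, y, z, w) order, which is why both helpers map `quaternion[3]` to `qw`. The camera helper additionally negates `qw`. For a unit quaternion q = (x, y, z, w), the vector (x, y, z, -w) is the negated conjugate -(-x, -y, -z, w), and since q and -q encode the same rotation, flipping only `qw` yields the inverse rotation. Combined with the `np.linalg.inv` call in that helper, the camera heading therefore encodes the rotation block of `camera_transform` itself. The numpy-only sketch below checks this on a 90-degree yaw; `quat_xyzw_to_matrix` is a hypothetical helper using the standard quaternion-to-matrix formula instead of scipy.

```python
import numpy as np


def quat_xyzw_to_matrix(q):
    """Convert a unit quaternion in (x, y, z, w) order to a 3x3 rotation matrix."""
    x, y, z, w = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


# 90-degree rotation about the z axis: q = (0, 0, sin 45 deg, cos 45 deg)
half_angle = np.pi / 4
q = np.array([0.0, 0.0, np.sin(half_angle), np.cos(half_angle)])
rot = quat_xyzw_to_matrix(q)

# Negate only qw, as convert_camera_inv_extrinsic_matrix_to_trans_quaternion_mat does
q_flipped = q * np.array([1.0, 1.0, 1.0, -1.0])
rot_flipped = quat_xyzw_to_matrix(q_flipped)

# The flipped quaternion is the inverse rotation, i.e. the transpose of a rotation matrix
print(np.allclose(rot_flipped, rot.T))         # True
print(np.allclose(rot.T, np.linalg.inv(rot)))  # True
```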

### Building the Sequence Input Manifest File

The code below performs the following steps to build the Point Cloud Sequence Input Manifest File:

1) Load data:
    - Point cloud data from the .pcd files
    - LiDAR extrinsic matrix from the pose files
    - Camera extrinsic, intrinsic, and distortion data from the camera calibration YAML file

2) For each frame, transform the raw point cloud to the global frame of reference, then generate an ASCII (.txt) file for the frame and store it in S3.

3) Extract the ego vehicle pose from the LiDAR extrinsic matrix.

4) Build the camera position in the global coordinate system by extracting the camera pose from the inverse camera extrinsic matrix.

5) Provide the camera calibration parameters (distortion, skew, etc.).

6) Build the array of data frames: reference the ASCII file location, define the vehicle position in the world coordinate system, etc.

7) Create the sequence manifest file: sequence.json.

8) Create the input manifest file. Each line identifies a single sequence file we just uploaded.
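Step 4 relies on `camera_transform = inv(camera_extrinsic_calibrations @ inv(lidar_extrinsic_matrix))` in the cell below. Assuming the calibration matrix maps LiDAR-frame points into the camera frame and the LiDAR extrinsic matrix maps LiDAR-frame points into the world frame, this is the camera-to-world pose, and by inv(AB) = inv(B) inv(A) it equals `lidar_extrinsic_matrix @ inv(camera_extrinsic_calibrations)`. The sketch below checks that identity with made-up numbers; `make_homogeneous` and `camera_pose_in_world` are hypothetical helpers.

```python
import numpy as np


def make_homogeneous(rotation, translation):
    """Build a 4x4 rigid transform from a 3x3 rotation and a length-3 translation."""
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = translation
    return transform


def camera_pose_in_world(camera_from_lidar, world_from_lidar):
    """Camera-to-world pose, written the same way as camera_transform in the notebook."""
    return np.linalg.inv(camera_from_lidar @ np.linalg.inv(world_from_lidar))


# Made-up numbers: the vehicle is yawed 90 degrees at (5, 2, 0); the camera sits 0.3 m above the LiDAR
yaw = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
world_from_lidar = make_homogeneous(yaw, [5.0, 2.0, 0.0])
camera_from_lidar = make_homogeneous(np.eye(3), [0.0, 0.0, -0.3])

pose = camera_pose_in_world(camera_from_lidar, world_from_lidar)
# Same result via inv(A @ B) = inv(B) @ inv(A)
print(np.allclose(pose, world_from_lidar @ np.linalg.inv(camera_from_lidar)))  # True
print(pose[:3, 3])  # camera position in world coordinates
```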

In [None]:
import pprint as pp

# load the LiDAR-camera calibration file
calib_yaml = load_yaml_from_s3(f's3://{BUCKET}/{PREFIX}/lidar_cam_calib_vlp32_06_10_2021.yaml')

print(calib_yaml["cam0"]["ext_R"])

cam_r = Rotation.from_euler('zyx', calib_yaml["cam0"]["ext_R"], degrees=True)
camera_extrinsic_calibrations = np.append(cam_r.as_matrix(), np.asarray(calib_yaml["cam0"]["ext_t"]).reshape(3,1), 1)
camera_extrinsic_calibrations = np.append(camera_extrinsic_calibrations, np.asarray([0, 0, 0, 1]).reshape(1,4), 0)

camera_intrinsics = calib_yaml["cam0"]["intrinsics"]
camera_distortion = calib_yaml["cam0"]["distortion_coeffs"]

# print(camera_extrinsic_calibrations)
# get file name of all the frames
frame_s3_keys = [content.split("/")[-1].split(".")[0] for content in all_pcd_keys 
               if content.split("/")[-1].endswith(".pcd")]
frame_s3_keys.sort()

seq_json = {}
seq_json["seq-no"] = 1
seq_json["prefix"] = f"s3://{BUCKET}/{PREFIX}/"
seq_json["number-of-frames"] = len(frame_s3_keys)
seq_json["frames"] = []

for idx, frame_key in enumerate(all_pcd_keys):
    
    if frame_key.split("/")[-1].endswith(".pcd"):
        frame_id = frame_key.split("/")[-1].split(".")[0]
                
        # load the data points from the pcd file
        points = pcd.read_pcd(frame_key)["points"].to_numpy(dtype=np.dtype(object))[:,[2,3,4,5]]

        # Each pose file is the pose of the LiDAR sensors. This is the LiDAR Extrinsic Matrix
        # used to rotate sensor data from local to global.
        lidar_pose = load_json_from_s3(f"s3://{BUCKET}/{PREFIX}/poses/{frame_id}.json")   
        # the next two steps build the 4x4 lidar_extrinsic_matrix
        lidar_extrinsic_matrix = np.append(lidar_pose["rotation"], np.asarray(lidar_pose["translation"]).reshape(3,1), 1)
        lidar_extrinsic_matrix = np.append(lidar_extrinsic_matrix, np.asarray([0, 0, 0, 1]).reshape(1,4), 0)


        trans_quaternions = convert_extrinsic_matrix_to_trans_quaternion_mat(lidar_extrinsic_matrix)
        
        # transform points from the LiDAR frame to the global frame using lidar_extrinsic_matrix
        transformed_pcl_txt = generate_transformed_pcd_from_point_cloud(points, lidar_extrinsic_matrix, tranform=True)
        
        lidar_txt_key = frame_key.replace(".pcd", ".txt").replace("rectified_scans_local", "rectified_txt_global")

        txt_s3_path = write_txt_to_s3(transformed_pcl_txt, BUCKET, lidar_txt_key)

        ego_vehicle_pose = {}
        ego_vehicle_pose['heading'] = trans_quaternions['rotation']
        ego_vehicle_pose['position'] = trans_quaternions['translation']

        frame = dict()
        frame["frame-no"] = idx
        frame["frame"] = f"rectified_txt_global/{frame_id}.txt"
        frame["format"] = "text/xyzi"
        frame["unix-timestamp"] = int(frame_id.split("_")[-1])/1000000000
        frame["ego-vehicle-pose"] = ego_vehicle_pose

        images = []

        # There is only one camera, so only one image_json per frame
        image_json = dict()

        # Camera extrinsic calibration comes from ext_R (rotation) and ext_t (translation) in the
        # LiDAR-camera calibration YAML file. Combining it with the LiDAR pose gives the
        # camera pose in the world coordinate system.
        camera_transform = np.linalg.inv(np.matmul(camera_extrinsic_calibrations, np.linalg.inv(lidar_extrinsic_matrix)))


        cam_trans_quaternions = convert_camera_inv_extrinsic_matrix_to_trans_quaternion_mat(camera_transform)
    
        image_json["image-path"] = f"images/{frame_id}.jpeg"
        image_json["unix-timestamp"] = int(frame_id.split("_")[-1])/1000000000
        image_json['heading'] = cam_trans_quaternions['rotation']
        image_json['position'] = cam_trans_quaternions['translation']
        image_json["camera_model"] = "pinhole"  # all images are already undistorted

        # Camera Intrinsic matrix from lidar cam calib yaml file
        image_json['fx'] = camera_intrinsics[0]
        image_json['fy'] = camera_intrinsics[1]
        image_json['cx'] = camera_intrinsics[2]
        image_json['cy'] = camera_intrinsics[3]
        # Camera distortion from lidar cam calib yaml file
        image_json['k1'] = camera_distortion[0]
        image_json['k2'] = camera_distortion[1]
        image_json['k3'] = camera_distortion[2]
        image_json['k4'] = camera_distortion[3]
        # no tangential distortion
        image_json['p1'] = 0
        image_json['p2'] = 0
        # no skew
        image_json['skew'] = 0

        images.append(image_json)

        frame["images"] = images

        seq_json['frames'].append(frame)


seq_key = f"{PREFIX}/manifests_categories/sequence.json"
print(f"Creating sequence file: s3://{BUCKET}/{seq_key}")
write_json_to_s3(seq_json, BUCKET, seq_key)

# Build the input manifest file that references the sequences.
# In this case we only reference one sequence.
manifest_line = [{
    "source-ref": f"s3://{BUCKET}/{seq_key}",
}]
manifest_key = f"{PREFIX}/manifests_categories/manifest.json"
write_manifest_to_s3(manifest_line, BUCKET, manifest_key)

manifest_uri = f"s3://{BUCKET}/{manifest_key}"

print(f"Creating manifest file: {manifest_uri}")
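Before creating the job, it can help to sanity-check `seq_json` locally. The hypothetical `check_sequence` helper below applies a few consistency checks chosen for this notebook (frame count, unique `frame-no` values, time-ordered timestamps, `.txt` frame files). They are assumptions for illustration, not Ground Truth's own validation rules.

```python
def check_sequence(seq):
    """Return a list of inconsistencies found in a sequence dict shaped like seq_json."""
    problems = []
    frames = seq.get("frames", [])

    if seq.get("number-of-frames") != len(frames):
        problems.append("number-of-frames does not match the length of frames")

    frame_numbers = [frame.get("frame-no") for frame in frames]
    if len(set(frame_numbers)) != len(frame_numbers):
        problems.append("frame-no values are not unique")

    timestamps = [frame.get("unix-timestamp") for frame in frames]
    # Check for missing values first so sorted() never compares None with a number
    if any(ts is None for ts in timestamps) or timestamps != sorted(timestamps):
        problems.append("frames are missing timestamps or are not in time order")

    for frame in frames:
        if not frame.get("frame", "").endswith(".txt"):
            problems.append(f"frame {frame.get('frame-no')} does not point at a .txt file")
    return problems
```

Running `check_sequence(seq_json)` after the cell above should return an empty list; the time-order check is useful because the frames are ordered by sorting S3 key strings, not numeric timestamps.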

## Label Category Configuration File

Your label category configuration file is used to specify labels, or classes, for your labeling job.

When you use the object detection or object tracking task types, you can also include label attributes in your [label category configuration file](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html). Workers can assign one or more of the attributes you provide to annotations to give more information about that object. For example, you may want to use an attribute such as `occluded` to have workers identify when an object is partially obstructed.

Let's look at an example of the label category configuration file for an object detection or object tracking labeling job.

In [None]:
label_category = {
  "categoryGlobalAttributes": [
    {
      "enum": [
        "75-100%",
        "25-75%",
        "0-25%"
      ],
      "name": "Visibility",
      "type": "string"
    }
  ],
  "documentVersion": "2020-03-01",
  "instructions": {
    "fullInstruction": "Draw a tight cuboid. You only need to annotate objects in the first frame. Make sure the direction of the cuboid accurately represents the direction of the vehicle it bounds.",
    "shortInstruction": "Draw a tight cuboid. You only need to annotate objects in the first frame."
  },
  "labels": [
    {
      "categoryAttributes": [],
      "label": "Car"
    },
    {
      "categoryAttributes": [],
      "label": "Truck"
    },
    {
      "categoryAttributes": [],
      "label": "Bus"
    },
    {
      "categoryAttributes": [],
      "label": "Pedestrian"
    },
    {
      "categoryAttributes": [],
      "label": "Cyclist"
    },
    {
      "categoryAttributes": [],
      "label": "Motorcyclist"
    },
  ]
}

category_key = f'{PREFIX}/manifests_categories/label_category.json'
write_json_to_s3(label_category, BUCKET, category_key)

label_category_file = f's3://{BUCKET}/{category_key}'
print(f"label category file uri: {label_category_file}")
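To follow the `occluded` example from the text, you could attach a per-label attribute to one label. The sketch below assumes that entries in `categoryAttributes` use the same `name`/`type`/`enum` shape as the `categoryGlobalAttributes` entry above; `add_category_attribute` and the `Occluded` values are hypothetical. After updating the dictionary, you would upload it again with `write_json_to_s3`.

```python
import copy


def add_category_attribute(label_category, label_name, attribute):
    """Return a copy of label_category with `attribute` added to one label's categoryAttributes."""
    updated = copy.deepcopy(label_category)  # leave the original dict untouched
    for label in updated["labels"]:
        if label["label"] == label_name:
            label["categoryAttributes"].append(attribute)
            return updated
    raise KeyError(f"label {label_name!r} not found")


# Hypothetical per-label attribute, shaped like the "Visibility" global attribute
occluded = {"name": "Occluded", "type": "string", "enum": ["Yes", "No"]}

example_config = {"labels": [{"label": "Car", "categoryAttributes": []}]}
print(add_category_attribute(example_config, "Car", occluded)["labels"][0])
```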

## Specify the job resources

### Human Task UI ARN

[HumanTaskUiArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UiConfig.html) is a resource that defines the worker task template used to render the worker UI and tools for the labeling job. This attribute is defined under `UiConfig`, and the resource name depends on the Region and the task type.

In [None]:
human_task_ui_arn = (
    f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/{task_type[2:]}"
)
human_task_ui_arn

### Define your workteam

In this example, we use a private workteam. Follow these instructions to create a [private workforce](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-create-private-cognito.html). Once you are done, put your workteam ARN in the parameter below.

In [None]:
workteam_arn = f"arn:aws:sagemaker:{region}:259883177473:workteam/private-crowd/test-team"  # <REPLACE W/ YOUR Private Team ARN>
workteam_arn

### Pre-annotation Lambda ARN and Post-annotation Lambda ARN

In [None]:
ac_arn_map = {
    "us-west-2": "081040173940",
    "us-east-1": "432418664414",
    "us-east-2": "266458841044",
    "eu-west-1": "568282634449",
    "ap-northeast-1": "477331159723",
}

prehuman_arn = "arn:aws:lambda:{}:{}:function:PRE-{}".format(region, ac_arn_map[region], task_type)
acs_arn = "arn:aws:lambda:{}:{}:function:ACS-{}".format(region, ac_arn_map[region], task_type)

print(acs_arn)
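The cell above raises a bare `KeyError` if the notebook runs in a Region missing from `ac_arn_map`. A hypothetical helper like the one below builds the same `PRE-` and `ACS-` ARNs but fails with an explicit message; the per-Region account IDs still have to come from the SageMaker API reference for `PreHumanTaskLambdaArn` and `AnnotationConsolidationLambdaArn`.

```python
def ground_truth_lambda_arns(region, task_type, account_by_region):
    """Return (pre-annotation Lambda ARN, consolidation Lambda ARN) for a Region and task type."""
    if region not in account_by_region:
        raise ValueError(
            f"No Ground Truth Lambda account ID known for Region {region!r}; "
            "look it up in the SageMaker API reference and add it to the map."
        )
    account = account_by_region[region]
    prefix = f"arn:aws:lambda:{region}:{account}:function"
    return f"{prefix}:PRE-{task_type}", f"{prefix}:ACS-{task_type}"
```

In the notebook this would be called as `prehuman_arn, acs_arn = ground_truth_lambda_arns(region, task_type, ac_arn_map)`.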

## Set Up HumanTaskConfig

This is used to specify your workteam and configure your labeling job task. Feel free to update the task description info below.

In [None]:
job_name = f"velodyne-blog-test-{str(time.time()).split('.')[0]}"

# Task description info =================
task_description = "Draw 3D boxes around required objects"
task_keywords = ['lidar', 'pointcloud']
task_title = job_name

human_task_config = {
    "AnnotationConsolidationConfig": {
        "AnnotationConsolidationLambdaArn": acs_arn,
    },
    "WorkteamArn": workteam_arn,
    "PreHumanTaskLambdaArn": prehuman_arn,
    "MaxConcurrentTaskCount": 200,
    "NumberOfHumanWorkersPerDataObject": 1,  # One worker will work on each task
    "TaskAvailabilityLifetimeInSeconds": 18000,  # Your workteam has 5 hours to complete all pending tasks.
    "TaskDescription": task_description,
    "TaskKeywords": task_keywords,
    "TaskTimeLimitInSeconds": 36000,  # Each sequence must be labeled within 10 hours.
    "TaskTitle": task_title,
    "UiConfig": {
        "HumanTaskUiArn": human_task_ui_arn,
    },
}

## Set Up the CreateLabelingJob Request

In [None]:
labelAttributeName = f"{job_name}-ref" #must end with -ref

output_path = f"s3://{BUCKET}/{PREFIX}/output"

ground_truth_request = {
    "InputConfig" : {
      "DataSource": {
        "S3DataSource": {
          "ManifestS3Uri": manifest_uri,
        }
      },
      "DataAttributes": {
        "ContentClassifiers": [
          "FreeOfPersonallyIdentifiableInformation",
          "FreeOfAdultContent"
        ]
      },  
    },
    "OutputConfig" : {
      "S3OutputPath": output_path,
    },
    "HumanTaskConfig" : human_task_config,
    "LabelingJobName": job_name,
    "RoleArn": role, 
    "LabelAttributeName": labelAttributeName,
    "LabelCategoryConfigS3Uri": label_category_file,
    "Tags": [],
}

## Call CreateLabelingJob

In [None]:
sagemaker_client.create_labeling_job(**ground_truth_request)
print(f'Labeling Job created: {job_name}')

## Check Status of Labeling Job

In [None]:
## call describeLabelingJob
describeLabelingJob = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
print(describeLabelingJob['LabelingJobStatus'])
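Labeling jobs move through the statuses `Initializing` and `InProgress` and end in `Completed`, `Failed`, or `Stopped` (passing through `Stopping` when a job is stopped). Instead of re-running the cell above, you could poll with a small loop like the hypothetical `wait_for_labeling_job` below. It takes the describe function as a parameter, so in the notebook you would pass `sagemaker_client.describe_labeling_job`. Because the job only completes after the tasks are labeled, a long `poll_seconds` or a `max_polls` cap is sensible.

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_labeling_job(describe, job_name, poll_seconds=60, max_polls=None):
    """Poll until the job reaches a terminal status (or max_polls); return the last response."""
    polls = 0
    while True:
        response = describe(LabelingJobName=job_name)
        status = response["LabelingJobStatus"]
        print(f"{job_name}: {status}")
        if status in TERMINAL_STATUSES:
            return response
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return response
        time.sleep(poll_seconds)
```

For example: `final = wait_for_labeling_job(sagemaker_client.describe_labeling_job, job_name, poll_seconds=300)`.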

## Start Working on tasks

When your labeling job is ready, add yourself to your private workteam and try out the worker portal. You should receive an email with the portal link, your username, and a temporary password. When you log in, select the labeling job from the list, and you should see the worker portal as shown below. (Note: it may take a few minutes for a new labeling job to show up in the portal.)

![Labeling View 1](statics/behind_low.gif)
![Labeling View 2](statics/side_low.gif)

## View Output Data

Once you finish labeling, choose **Submit**. You can then view the output data in the S3 output location you specified above.
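After the job has completed, `describe_labeling_job` reports the location of the output manifest in `LabelingJobOutput['OutputDatasetS3Uri']`. The output manifest is a JSON Lines file, so a sketch like the one below can parse it. The helper names are hypothetical, and we assume, rather than confirm here, that each record stores a reference to the sequence's annotation file under your `LabelAttributeName` (the name ending in `-ref`).

```python
import json


def parse_output_manifest(manifest_text):
    """Parse a JSON Lines output manifest into a list of dicts, one per data object."""
    return [json.loads(line) for line in manifest_text.splitlines() if line.strip()]


def label_refs(records, label_attribute_name):
    """Collect the value stored under the label attribute name in each record that has it."""
    return [record[label_attribute_name] for record in records if label_attribute_name in record]


# In the notebook, this might be used as:
# output_uri = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)["LabelingJobOutput"]["OutputDatasetS3Uri"]
# records = parse_output_manifest(load_s3_as_string(output_uri))
# print(label_refs(records, labelAttributeName))
```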

## Acknowledgments

Special thanks to the Velodyne team for letting us use this dataset to demonstrate 3D point cloud labeling with SageMaker Ground Truth.