# How to use Vector Enrichment Jobs for Reverse Geocoding

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

---

This notebook demonstrates how to use Amazon SageMaker geospatial capabilities to perform reverse geocoding and visualize the results.

Reverse geocoding converts geographic coordinates (latitude, longitude) into human-readable addresses. In Amazon SageMaker geospatial capabilities it is powered by Amazon Location Service, and you can run it in batch mode with a Vector Enrichment Job (VEJ). The input for this kind of VEJ is a CSV file containing longitude and latitude coordinates, and the VEJ enriches the CSV with the address number, country, label, municipality, neighborhood, postal code, and region of each location.

The workflow is as follows:

- Step 1: [Import SageMaker geospatial capabilities SDK](#Import-SageMaker-geospatial-capabilities-SDK)
- Step 2: [Inspect input data and upload to S3](#Inspect-input-data-and-upload-to-S3)
- Step 3: [Create a Vector Enrichment Job (VEJ)](#Create-an-Vector-Enrichtment-Job)
- Step 4: [Export VEJ output to S3](#Export-VEJ-output-to-S3)
- Step 5: [Visualize enriched data set in Amazon SageMaker geospatial Map SDK](#Visualize-enriched-data-set-in-Amazon-SageMaker-geospatial-Map-SDK)

## Prerequisites

This notebook runs with the Geospatial 1.0 kernel. The following policies must be attached to the execution role that you use to run this notebook:

- AmazonSageMakerFullAccess
- AmazonSageMakerGeospatialFullAccess

You can see the policies attached to the role in the IAM console under the 'Permissions' tab. If required, add the policies using the 'Add Permissions' button.

In addition to these policies, ensure that the execution role's trust policy allows the SageMaker geospatial service (`sagemaker-geospatial.amazonaws.com`) to assume the role. You can do this by adding the following trust policy on the 'Trust relationships' tab:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "sagemaker-geospatial.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
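
If you want to double-check a trust policy before running the notebook, the document can be inspected programmatically. The following is a minimal sketch, not part of the SageMaker SDK: the helper name `missing_trusted_services` is hypothetical, and it assumes the trust policy is already available as a Python dict (for example, copied from the IAM console).

```python
REQUIRED_SERVICES = {"sagemaker.amazonaws.com", "sagemaker-geospatial.amazonaws.com"}


def missing_trusted_services(trust_policy, required=REQUIRED_SERVICES):
    """Return the required service principals that are not allowed to assume the role."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given without a list
        statements = [statements]

    trusted = set()
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if statement.get("Effect") != "Allow" or "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):  # e.g. a wildcard principal given as a string
            continue
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        trusted.update(services)
    return sorted(set(required) - trusted)


# Illustrative policy that trusts only the SageMaker service
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(missing_trusted_services(example_policy))
# ['sagemaker-geospatial.amazonaws.com']
```

An empty list means that both service principals are covered by an `Allow` statement for `sts:AssumeRole`.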

## Import SageMaker geospatial capabilities SDK

In [None]:
import boto3
import sagemaker
import sagemaker_geospatial_map

session = boto3.Session()
execution_role = sagemaker.get_execution_role()
geospatial_client = session.client(service_name="sagemaker-geospatial")

## Dataset: California Housing Data
The California Housing dataset contains information from the 1990 California census. We will use it to demonstrate how to resolve latitude and longitude into human-readable address information. The dataset contains the following columns:

- `MedInc` - median income
- `HouseAge` - median housing age
- `TotalRooms` - total rooms
- `TotalBedrms` - total bedrooms
- `Population` - population
- `Households` - number of households
- `Latitude` - latitude
- `Longitude` - longitude

The California Housing dataset was originally published in:

> Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297.

## Inspect input data and upload to S3

The following cells download the dataset, write it as a CSV file, and upload the CSV to an S3 bucket. The CSV file must contain a header line. The header names are used in the `ReverseGeocodingConfig` of the Vector Enrichment Job to map the CSV columns to the expected attributes.
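
One way to catch header or coordinate problems before starting a job is to check the CSV locally. The sketch below is an illustration only: the helper name `check_coordinate_csv` and the sample rows are made up for this example, and it uses only the Python standard library rather than any SageMaker API.

```python
import csv
import io


def check_coordinate_csv(csv_text, x_name="Longitude", y_name="Latitude"):
    """Return a list of problems found in the header line and the coordinate values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in (x_name, y_name) if name not in header]
    if problems:
        return problems

    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            lon, lat = float(row[x_name]), float(row[y_name])
        except (TypeError, ValueError):
            problems.append(f"line {line_number}: non-numeric coordinate")
            continue
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            problems.append(f"line {line_number}: coordinate out of range")
    return problems


# Illustrative rows: the second data row has longitude and latitude swapped
sample = "Longitude,Latitude,HouseAge\n-122.23,37.88,41.0\n37.88,-122.23,41.0\n"
print(check_coordinate_csv(sample))
# ['line 3: coordinate out of range']
```

The range check is a cheap way to spot swapped columns, since a longitude value such as -122.23 is not a valid latitude.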

In [None]:
s3 = boto3.client("s3")
s3.download_file(
    f"sagemaker-example-files-prod-{boto3.Session().region_name}",
    "datasets/tabular/california_housing/cal_housing.tgz",
    "cal_housing.tgz",
)

In [None]:
!tar -zxf cal_housing.tgz --no-same-owner

In [None]:
import pandas as pd

# Column order of the raw data file, which has no header line
columns = [
    "Longitude",
    "Latitude",
    "HouseAge",
    "TotalRooms",
    "TotalBedrms",
    "Population",
    "Households",
    "MedInc",
    "Target",
]
df_housing_data = pd.read_csv("CaliforniaHousing/cal_housing.data", names=columns, header=None)
# Drop the target column and keep the first 15,000 rows
df_housing_data = df_housing_data.drop(columns=["Target"])[:15000]
# Write the CSV with a header line and without the index column
df_housing_data.to_csv("california_housing.csv", index=False)
df_housing_data.head(5)

In [None]:
import boto3

sagemaker_session = sagemaker.Session()
s3_bucket = sagemaker_session.default_bucket()  # Alternatively, use your own bucket here.
bucket_prefix = "vej_reverse_geocoding"
input_object_key = f"{bucket_prefix}/input/california_housing.csv"

# upload_file returns None, so there is no response to keep
s3_client = boto3.client("s3")
s3_client.upload_file("california_housing.csv", s3_bucket, input_object_key)

## Create an Vector Enrichtment Job

The following cell will define and start a Vector Enrichment Job for reverse geocoding. The longitude and latitude headers of the CSV file are mapped to be used as input for the reverse geocoding implementation.

In [None]:
# Map the CSV header names to the X (longitude) and Y (latitude) attributes
job_config = {
    "ReverseGeocodingConfig": {"XAttributeName": "Longitude", "YAttributeName": "Latitude"},
}

# Point the job at the CSV file uploaded in the previous step
input_config = {
    "DataSourceConfig": {"S3Data": {"S3Uri": f"s3://{s3_bucket}/{input_object_key}"}},
    "DocumentType": "CSV",
}

response = geospatial_client.start_vector_enrichment_job(
    Name="vej_example_reverse_geocoding",
    ExecutionRoleArn=execution_role,
    InputConfig=input_config,
    JobConfig=job_config,
)

vej_arn = response["Arn"]
vej_arn
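
If you start several jobs with different files or column names, it can help to assemble the request in one place. The following is a sketch under stated assumptions: `build_reverse_geocoding_request` is a hypothetical helper, and the role ARN and bucket name are placeholders. The dictionary keys mirror the ones used in the cell above.

```python
def build_reverse_geocoding_request(
    name, role_arn, bucket, key, x_column="Longitude", y_column="Latitude"
):
    """Assemble the keyword arguments for start_vector_enrichment_job."""
    return {
        "Name": name,
        "ExecutionRoleArn": role_arn,
        "InputConfig": {
            "DataSourceConfig": {"S3Data": {"S3Uri": f"s3://{bucket}/{key}"}},
            "DocumentType": "CSV",
        },
        "JobConfig": {
            "ReverseGeocodingConfig": {"XAttributeName": x_column, "YAttributeName": y_column}
        },
    }


request = build_reverse_geocoding_request(
    name="vej_example_reverse_geocoding",
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",  # placeholder ARN
    bucket="example-bucket",  # placeholder bucket name
    key="vej_reverse_geocoding/input/california_housing.csv",
)
print(request["InputConfig"]["DataSourceConfig"]["S3Data"]["S3Uri"])
# s3://example-bucket/vej_reverse_geocoding/input/california_housing.csv
```

In the notebook, the result could then be passed on as `geospatial_client.start_vector_enrichment_job(**request)`.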

In [None]:
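import time
import datetime

# Check the status of the Vector Enrichment Job and wait until it has completed
job_completed = False
while not job_completed:
    response = geospatial_client.get_vector_enrichment_job(Arn=vej_arn)
    print(
        "Job status: {} (Last update: {})".format(response["Status"], datetime.datetime.now()),
        end="\r",
    )
    job_completed = response["Status"] == "COMPLETED"
    # Stop polling if the job ended without completing, so the loop cannot run forever
    # (FAILED and STOPPED are the assumed terminal status names; see the API reference)
    if response["Status"] in ("FAILED", "STOPPED"):
        raise RuntimeError("Vector Enrichment Job ended with status " + response["Status"])
    if not job_completed:
        time.sleep(30)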
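
The polling pattern above can be factored into a small reusable helper that also guards against jobs that fail or never finish. This is a sketch rather than an SDK feature: `wait_for_status` is a hypothetical name, and every status name other than `COMPLETED` is an assumption that should be checked against the API reference. The `sleep` parameter exists only so that the example can run instantly with fake statuses.

```python
import time


def wait_for_status(
    fetch_status, success, failure, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep
):
    """Call fetch_status() repeatedly until it returns a success or failure status."""
    waited = 0
    while True:
        status = fetch_status()
        if status in success:
            return status
        if status in failure:
            raise RuntimeError(f"Job ended with status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Status is still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Dry run with fake statuses instead of a real job, and without actually sleeping
fake_statuses = iter(["INITIALIZING", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_status(
    lambda: next(fake_statuses),
    success={"COMPLETED"},
    failure={"FAILED", "STOPPED"},
    sleep=lambda seconds: None,
)
print(final_status)
# COMPLETED
```

In the notebook, `fetch_status` could be `lambda: geospatial_client.get_vector_enrichment_job(Arn=vej_arn)["Status"]`.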

## Export VEJ output to S3

Exporting a reverse geocoding VEJ produces a CSV file that contains all columns of the input CSV, extended with the following columns:
- reverse_geo.address_number
- reverse_geo.country
- reverse_geo.label
- reverse_geo.municipality
- reverse_geo.neighborhood
- reverse_geo.postal_code
- reverse_geo.region
- reverse_geo.status
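
Once the export is available, you can verify that the enriched file has the expected header. The sketch below assumes only what is listed above, namely that the output keeps the input columns and adds the `reverse_geo.*` columns. The helper name `missing_output_columns` and the sample header are hypothetical, and the check ignores column order because the order is not stated here.

```python
REVERSE_GEO_COLUMNS = [
    "reverse_geo.address_number",
    "reverse_geo.country",
    "reverse_geo.label",
    "reverse_geo.municipality",
    "reverse_geo.neighborhood",
    "reverse_geo.postal_code",
    "reverse_geo.region",
    "reverse_geo.status",
]


def missing_output_columns(output_columns, input_columns):
    """Return the expected columns that are absent from the exported CSV header."""
    present = set(output_columns)
    expected = list(input_columns) + REVERSE_GEO_COLUMNS
    return [name for name in expected if name not in present]


# Illustrative header in which one enrichment column is missing
input_header = ["Longitude", "Latitude", "HouseAge"]
output_header = input_header + [c for c in REVERSE_GEO_COLUMNS if c != "reverse_geo.region"]
print(missing_output_columns(output_header, input_header))
# ['reverse_geo.region']
```

With a pandas DataFrame, the first argument could be `df.columns` of the loaded export.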

The following cell exports the output of the VEJ to an S3 bucket.

In [None]:
bucket_output_prefix = f"{bucket_prefix}/output/"

response = geospatial_client.export_vector_enrichment_job(
    Arn=vej_arn,
    ExecutionRoleArn=execution_role,
    OutputConfig={"S3Data": {"S3Uri": f"s3://{s3_bucket}/{bucket_output_prefix}"}},
)

# Wait until the VEJ output has been exported to S3
while response["ExportStatus"] != "SUCCEEDED":
    response = geospatial_client.get_vector_enrichment_job(Arn=vej_arn)
    print(
        "Export status: {} (Last update: {})".format(
            response["ExportStatus"], datetime.datetime.now()
        ),
        end="\r",
    )
    # Stop polling if the export failed, so the loop cannot run forever
    # (FAILED is the assumed failure status name; see the API reference)
    if response["ExportStatus"] == "FAILED":
        raise RuntimeError("Export of the Vector Enrichment Job failed")
    if response["ExportStatus"] != "SUCCEEDED":
        time.sleep(15)

## Visualize enriched data set in Amazon SageMaker geospatial Map SDK

The following cells will create an interactive map with the Amazon SageMaker geospatial Map SDK. The output data of the VEJ will be loaded from S3 into a pandas dataframe and then visualized in the embedded map.

### Load VEJ output into pandas dataframe

In [None]:
import boto3
import os
import pandas as pd

s3_client = boto3.client("s3")
s3_bucket_resource = session.resource("s3").Bucket(s3_bucket)

# Read the exported CSV; if several CSV objects exist under the prefix, the last one listed is kept
for s3_object in s3_bucket_resource.objects.filter(Prefix=bucket_output_prefix).all():
    if s3_object.key.endswith(".csv"):
        response = s3_client.get_object(Bucket=s3_bucket, Key=s3_object.key)
        df_output = pd.read_csv(response.get("Body"))

# The output contains the original data, extended with reverse_geo.* columns
df_output.head(5)

### Render embedded map

In [None]:
Map = sagemaker_geospatial_map.create_map({"is_raster": True})
Map.set_sagemaker_geospatial_client(geospatial_client)

In [None]:
Map.render()

### Add output data to map visualization

The following cell adds the output data as points to the map. When you hover over a point, the tooltip displays `reverse_geo.label`, which contains the address of that point.

In [None]:
# Move the reverse_geo.label column forward so that it is automatically included in the hover tooltip
column_to_move = df_output.pop("reverse_geo.label")
df_output.insert(2, "reverse_geo.label", column_to_move)

dataset_links_drive_01 = Map.add_dataset(
    {"data": df_output, "label": "vej_output"}, auto_create_layers=True
)

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-geospatial|vector-enrichment-reverse-geocoding|vector-enrichment-reverse-geocoding.ipynb)
