{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Digital Farming with Amazon SageMaker Geospatial Capabilities - Part II\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "\n", "In this notebook, we continue exploring some of the most common tasks for processing geospatial data in the Digital Farming domain, by working with Amazon SageMaker geospatial capabilities.\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Environment Set-Up\n", "\n", "We will start by making sure the \"sagemaker\" SDK is updated, and importing a few required libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# Upgrade the SageMaker SDK and install rasterio\n", "!pip install sagemaker --upgrade\n", "!pip install rasterio" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "\n", "import json\n", "from datetime import datetime\n", "import rasterio\n", "from rasterio.plot import show\n", "from matplotlib import pyplot as plt\n", "import time" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "sagemaker_session = sagemaker.Session()\n", "bucket = sagemaker_session.default_bucket() ### Replace with your own bucket if needed\n", "role = sagemaker.get_execution_role(sagemaker_session)\n", "sess = boto3.Session()\n", "prefix = \"sm-geospatial-e2e\" ### Replace with the S3 prefix desired\n", "print(f\"S3 bucket: {bucket}\")\n", "print(f\"Role: {role}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure you have the proper 
policy and trust relationship added to your role for \"sagemaker-geospatial\", as specified in the [Get Started with Amazon SageMaker Geospatial Capabilities](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-getting-started.html) documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "gsClient = boto3.client(\"sagemaker-geospatial\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "## Other common geospatial processing tasks for Digital Farming\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will save the ARN of our collection of interest. In our example we will work with satellite imagery data from the [Sentinel-2-L2A](https://registry.opendata.aws/sentinel-2-l2a-cogs/) collection..." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "data_collection_arn = \"arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8\"\n", "### Replace with the ARN of the collection of your choice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we will define the input configuration with the polygon of coordinates for our area of interest and the time range we are interested in." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Replace with the coordinates for the polygon of your area of interest...\n", "coordinates = [\n", "    [9.742977, 53.615875],\n", "    [9.742977, 53.597119],\n", "    [9.773620, 53.597119],\n", "    [9.773620, 53.615875],\n", "    [9.742977, 53.615875],\n", "]\n", "### Replace with the time-range of interest...\n", "time_start = \"2022-03-01T12:00:00Z\"\n", "time_end = \"2022-03-31T12:00:00Z\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Typically, we are interested in working with images that have little cloud cover over our area of interest. 
To explore this in our notebook, we will define some additional parameters, such as the range of cloud cover we want to consider (less than 2% in our example)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eoj_input_config = {\n", "    \"RasterDataCollectionQuery\": {\n", "        \"AreaOfInterest\": {\n", "            \"AreaOfInterestGeometry\": {\"PolygonGeometry\": {\"Coordinates\": [coordinates]}}\n", "        },\n", "        \"TimeRangeFilter\": {\"StartTime\": time_start, \"EndTime\": time_end},\n", "        \"PropertyFilters\": {\n", "            \"Properties\": [{\"Property\": {\"EoCloudCover\": {\"LowerBound\": 0, \"UpperBound\": 2}}}]\n", "        },\n", "    }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eoj_input_config[\"RasterDataCollectionQuery\"][\"RasterDataCollectionArn\"] = data_collection_arn\n", "\n", "\n", "def start_earth_observation_job(eoj_name, role, eoj_input_config, eoj_config):\n", "    # Start EOJ...\n", "    response = gsClient.start_earth_observation_job(\n", "        Name=eoj_name,\n", "        ExecutionRoleArn=role,\n", "        InputConfig=eoj_input_config,\n", "        JobConfig=eoj_config,\n", "    )\n", "    eoj_arn = response[\"Arn\"]\n", "    print(f\"{datetime.now()} - Started EOJ: {eoj_arn}\")\n", "\n", "    # Wait for EOJ to complete... 
check status every minute\n", "    gs_get_eoj_resp = {\"Status\": \"IN_PROGRESS\"}\n", "    while gs_get_eoj_resp[\"Status\"] in (\"INITIALIZING\", \"IN_PROGRESS\"):\n", "        time.sleep(60)\n", "        gs_get_eoj_resp = gsClient.get_earth_observation_job(Arn=eoj_arn)\n", "        print(f'{datetime.now()} - Current EOJ status: {gs_get_eoj_resp[\"Status\"]}')\n", "    return eoj_arn" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "----\n", "\n", "### Temporal Statistics - Earth Observation Job\n", "\n", "Following our example, we will now compute Temporal Statistics through another EOJ. This allows us to consolidate the imagery of the area of interest over a given time period.\n", "\n", "For our example, let us consider the yearly mean, and explore the Near Infrared (NIR) band in particular." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "eoj_config = {\n", "    \"TemporalStatisticsConfig\": {\n", "        \"GroupBy\": \"YEARLY\",\n", "        \"Statistics\": [\"MEAN\"],\n", "        \"TargetBands\": [\"nir\"],\n", "    }\n", "}\n", "\n", "ts_eoj_arn = start_earth_observation_job(\n", "    f'tempstatsjob-{datetime.now().strftime(\"%Y-%m-%d-%H-%M\")}', role, eoj_input_config, eoj_config\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note that the EOJ processing takes a few minutes.** We can check the status programmatically by getting the EOJ with the SageMaker Geospatial client, or graphically by using the Geospatial extension for SageMaker Studio." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stacking - Earth Observation Job\n", "\n", "Following our example, we will now perform band stacking through another EOJ. 
This allows us to combine bands to obtain different types of observations.\n", "\n", "In our case, we will generate the composite image of the Red, Green, and Blue (RGB) bands to obtain the natural (true color) image of the area of interest.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "eoj_config = {\n", "    \"StackConfig\": {\n", "        \"OutputResolution\": {\"Predefined\": \"HIGHEST\"},\n", "        \"TargetBands\": [\"red\", \"green\", \"blue\"],\n", "    }\n", "}\n", "\n", "s_eoj_arn = start_earth_observation_job(\n", "    f'stackingjob-{datetime.now().strftime(\"%Y-%m-%d-%H-%M\")}', role, eoj_input_config, eoj_config\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note that the EOJ processing takes a few minutes.** We can check the status programmatically by getting the EOJ with the SageMaker Geospatial client, or graphically by using the Geospatial extension for SageMaker Studio." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Semantic Segmentation for Land Cover Classification - Earth Observation Job\n", "\n", "We will now explore the use of a built-in model in SageMaker Geospatial for detecting and classifying the different types of land found in the area of interest, through the Semantic Segmentation Land Cover model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can run an EOJ to perform the land cover classification on our area of interest. This uses the built-in model to run the segmentation inference on our input data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "eoj_config = {\"LandCoverSegmentationConfig\": {}}\n", "\n", "lc_eoj_arn = start_earth_observation_job(\n", "    f'landcovermodeljob-{datetime.now().strftime(\"%Y-%m-%d-%H-%M\")}',\n", "    role,\n", "    eoj_input_config,\n", "    eoj_config,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note that the EOJ processing takes a few minutes.** We can check the status programmatically by getting the EOJ with the SageMaker Geospatial client, or graphically by using the Geospatial extension for SageMaker Studio." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-----\n", "\n", "## Exporting the Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned before, the results of our EOJs are stored in the service and are available for chaining as input for another EOJ, but we can also export them to Amazon S3 to visualize the imagery directly.\n", "\n", "We will define a function for exporting the results of our EOJs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def export_earth_observation_job(eoj_arn, role, bucket, prefix, task_suffix):\n", "    # Export EOJ results to S3...\n", "    response = gsClient.export_earth_observation_job(\n", "        Arn=eoj_arn,\n", "        ExecutionRoleArn=role,\n", "        OutputConfig={\n", "            \"S3Data\": {\"S3Uri\": f\"s3://{bucket}/{prefix}/{task_suffix}/\", \"KmsKeyId\": \"\"}\n", "        },\n", "    )\n", "    export_arn = response[\"Arn\"]\n", "    print(f\"{datetime.now()} - Exporting with ARN: {export_arn}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's go through the EOJs created before, checking their status and exporting them accordingly. Keep in mind that each EOJ takes a few minutes to complete, so we will check the status every 30 seconds..." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Check status of EOJs...\n", "EOJs = [ts_eoj_arn, s_eoj_arn, lc_eoj_arn]\n", "eoj_suffixes = [\"temp_stat\", \"stacking\", \"land_cover\"]\n", "\n", "eoj_status = [\"\"] * len(EOJs)\n", "while not all(i == \"Exported\" for i in eoj_status):\n", "    # Wait for EOJs to complete and export... check status every 30 seconds\n", "    for j, eoj in enumerate(EOJs):\n", "        gs_get_eoj_resp = gsClient.get_earth_observation_job(Arn=eoj)\n", "        if gs_get_eoj_resp[\"Status\"] == \"COMPLETED\":\n", "            # EOJ completed, exporting...\n", "            if \"ExportStatus\" not in gs_get_eoj_resp:\n", "                export_earth_observation_job(eoj, role, bucket, prefix, eoj_suffixes[j])\n", "            elif gs_get_eoj_resp[\"ExportStatus\"] == \"IN_PROGRESS\":\n", "                eoj_status[j] = \"Exporting\"\n", "            elif gs_get_eoj_resp[\"ExportStatus\"] == \"SUCCEEDED\":\n", "                eoj_status[j] = \"Exported\"\n", "            else:\n", "                raise Exception(\"Error exporting\")\n", "        elif gs_get_eoj_resp[\"Status\"] in (\"INITIALIZING\", \"IN_PROGRESS\"):\n", "            # EOJ still initializing or in progress, keep waiting...\n", "            eoj_status[j] = \"In progress\"\n", "        else:\n", "            raise Exception(\"Error with the EOJ\")\n", "        print(f\"{datetime.now()} - EOJ: {eoj} Status: {eoj_status[j]}\")\n", "    if all(i == \"Exported\" for i in eoj_status):\n", "        break\n", "    time.sleep(30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have all our EOJs exported, let's visualize a few of the images obtained in S3.\n", "\n", "For this we will use the open-source library \"rasterio\"." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "s3 = boto3.resource(\"s3\")\n", "my_bucket = s3.Bucket(bucket)\n", "\n", "\n", "def visualize_cogt(task, eoj_arn, band, number):\n", "    gs_get_eoj_resp = gsClient.get_earth_observation_job(Arn=eoj_arn)\n", "    if gs_get_eoj_resp[\"ExportStatus\"] == \"SUCCEEDED\":\n", "        i = 0\n", "        for index, image in enumerate(\n", "            my_bucket.objects.filter(\n", "                Prefix=f'{prefix}/{task}/{eoj_arn.split(\"/\",1)[1]}/output/consolidated/'\n", "            )\n", "        ):\n", "            if f\"{band}.tif\" in image.key:\n", "                i = i + 1\n", "                tif = f\"s3://{bucket}/{image.key}\"\n", "                with rasterio.open(tif) as src:\n", "                    arr = src.read(out_shape=(src.height // 20, src.width // 20))\n", "                    if band != \"visual\":\n", "                        # Sentinel-2 images are stored as uint16 for optimizing storage\n", "                        # but these need to be rescaled (by dividing each pixel value by 10000)\n", "                        # to get the true reflectance values. This is a common \u201ccompression\u201d\n", "                        # technique when storing satellite images...\n", "                        arr = arr / 10000\n", "                        # As a result of the transformation, there might be some pixel values\n", "                        # over 1 in the RGB, so we need to replace those by 1...\n", "                        arr[arr > 1] = 1\n", "                show(arr)\n", "                print(tif)\n", "                if i == number:\n", "                    break\n", "    else:\n", "        print(\n", "            f'Export of job with ARN:\\n{eoj_arn}\\nis in ExportStatus: {gs_get_eoj_resp[\"ExportStatus\"]}'\n", "        )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the Temporal Statistics, we can check, for example, some of the images obtained for the mean of the NIR band." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "visualize_cogt(\"temp_stat\", ts_eoj_arn, \"nir_mean\", 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the Stacking, let's visualize some of the stacked natural color images." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "visualize_cogt(\"stacking\", s_eoj_arn, \"stacked\", 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the Land Cover classification, let's visualize a few of the output images obtained after the built-in segmentation inference." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "visualize_cogt(\"land_cover\", lc_eoj_arn, \"L2A\", 30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, take into account the segmentation legend below.\n", "\n", "