{ "cells": [ { "cell_type": "markdown", "id": "b0cbd05d-fc56-4d34-869c-176716cefe76", "metadata": {}, "source": [ "\n", "## Explore Sentinel-1 RTC AWS Public Dataset\n", "\n", "This notebook explores Pangeo tooling to efficiently work with a collection of Cloud-Optimized Geotiffs\n", "\n", "https://registry.opendata.aws/sentinel-1-rtc-indigo/\n", "\n", "The Sentinel-1 mission is a constellation of C-band Synthetic Aperature Radar (SAR) satellites from the European Space Agency launched since 2014. These satellites collect observations of radar backscatter intensity day or night, regardless of the weather conditions, making them enormously valuable for environmental monitoring. These radar data have been processed from original Ground Range Detected (GRD) scenes into a Radiometrically Terrain Corrected, tiled product suitable for analysis. This product is available over the Contiguous United States (CONUS) since 2017 when Sentinel-1 data became globally available.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "19850f09-79fe-4b43-a097-df6ab55556a6", "metadata": {}, "outputs": [], "source": [ "#pip install geopandas geoviews" ] }, { "cell_type": "code", "execution_count": null, "id": "366b0b1e-bd2c-49aa-b64a-dc5eeb011a31", "metadata": {}, "outputs": [], "source": [ "# We'll make use of the following great Python libraries\n", "import geopandas as gpd\n", "import geoviews as gv\n", "import holoviews as hv\n", "import hvplot.pandas\n", "import hvplot.xarray\n", "import panel as pn\n", "import intake\n", "import numpy as np\n", "import os\n", "import pandas as pd\n", "import rasterio\n", "import rioxarray\n", "import s3fs \n", "import xarray as xr\n", "hv.extension('bokeh')" ] }, { "cell_type": "markdown", "id": "5cedbe48", "metadata": {}, "source": [ "## Metadata and data search" ] }, { "cell_type": "code", "execution_count": null, "id": "3d468dba-87d1-4458-9556-8807885fe5d9", "metadata": {}, "outputs": [], "source": [ "os.environ['GDAL_DISABLE_READDIR_ON_OPEN']='EMPTY_DIR' #This is KEY! otherwise we send a bunch of HTTP GET requests to test for common sidecar metadata\n", "os.environ['AWS_NO_SIGN_REQUEST']='YES' #Since this is a public bucket, we don't need authentication\n", "os.environ['GDAL_MAX_RAW_BLOCK_CACHE_SIZE']='200000000' #200MB: Default is 10 MB limit in the GeoTIFF driver for range request merging." ] }, { "cell_type": "code", "execution_count": null, "id": "87fa75f1-2c97-449a-83a1-a62db11ce979", "metadata": {}, "outputs": [], "source": [ "# This data is stored as tiles on the military grid system\n", "# This visualization helps you pick a tile of interest\n", "grid = 'https://sentinel-s1-rtc-indigo-docs.s3-us-west-2.amazonaws.com/_downloads/a5f63bfdd3b923cde925b95de3bd1dbe/SENTINEL1_RTC_CONUS_GRID.geojson'\n", "gf = gpd.read_file(grid)\n", "gf.rename(columns=dict(id='tile'), inplace=True) #\"id\" reserved as a method name\n", "tiles = gv.tile_sources.OSM()\n", "polygons = gf.hvplot.polygons(geo=True, alpha=0.2, hover_cols=['tile'], legend=False)\n", "tiles * polygons" ] }, { "cell_type": "code", "execution_count": null, "id": "4fd912d6-f1e5-4c92-bad9-ef4a90f39f60", "metadata": {}, "outputs": [], "source": [ "# Path layout is as follows:\n", "# s3://sentinel-s1-rtc-indigo/tiles/RTC/1/[MODE]/[MGRS UTM zone]/[MGRS latitude label]/[MGRS Grid Square ID]/[YEAR]/[SATELLITE]_[DATE]_[TILE ID]_[ORBIT DIRECTION]/[ASSET]\n", "zone = 14\n", "latLabel = 'R'\n", "square = 'PP'\n", "year = '*' #all years\n", "date = '*' #all acquisitions\n", "polarization = 'VV'\n", "s3Path = f's3://sentinel-s1-rtc-indigo/tiles/RTC/1/IW/{zone}/{latLabel}/{square}/{year}/{date}/Gamma0_{polarization}.tif'\n", "print(s3Path[5:])" ] }, { "cell_type": "code", "execution_count": null, "id": "60cf21d2-d220-43d2-b405-1bd1a7b12ac5", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "# Find imagery according to S3 path pattern\n", "s3 = s3fs.S3FileSystem(anon=True)\n", "keys = s3.glob(s3Path[5:]) #strip s3://\n", "print(f'Located {len(keys)} images matching {s3Path}:')\n", "keys[:5]" ] }, { "cell_type": "code", "execution_count": null, "id": "65ffdbbc-ea78-4aef-8779-2ffae607b28c", "metadata": {}, "outputs": [], "source": [ "\n", "\n", "# want to ensure these are sorted in chronological order (which S1A and S1B can mess up if going just by an alphabetical sort)\n", "keys.sort(key=lambda x: x[-32:-24])\n", "\n", "files = [key[-36:] for key in keys]\n", "satellites = [x[:3] for x in files]\n", "dates = [x[4:12] for x in files]\n", "direction = [x[19:22] for x in files]" ] }, { "cell_type": "code", "execution_count": null, "id": "d469ef82-9100-4269-98f2-0b147acd31dc", "metadata": {}, "outputs": [], "source": [ "\n", "\n", "# construct some tidy metadata to facilitate future plotting\n", "df = pd.DataFrame(dict(satellite=satellites,\n", " date=dates, #string\n", " datetime=pd.to_datetime(dates), #pandas timestamp\n", " direction=direction,\n", " file=files))\n", "df['dt'] = df.datetime.diff() # NOTE: images do not cover exactly the same area b/c this is swath data\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "f2edc248-8d20-4a57-bb81-6230d2b8f768", "metadata": {}, "outputs": [], "source": [ "# For example, we might want to quickly visualize the acquisition dates, satellite, and pass direction\n", "# This plot shows we have mostly Ascending pass observations from the Sentinel-1B satellite\n", "df.hvplot.scatter(x='datetime', y='satellite', by='direction', marker='dash', size=200, angle=90)" ] }, { "cell_type": "code", "execution_count": null, "id": "c0a370a8-3fcd-4106-aa99-d199c43aa74b", "metadata": {}, "outputs": [], "source": [ "# Get information on geospatial metadata and internal overviews\n", "s3Paths = ['s3://' + key for key in keys]\n", "url = s3Paths[0]\n", "print(url)\n", "with rasterio.open(url) as src:\n", " print(src.profile) \n", " overview_factors = [src.overviews(i) for i in src.indexes][0]\n", " overview_levels = list(range(len(overview_factors)))\n", " print('Overview levels: ', overview_levels)\n", " print('Overview factors: ', overview_factors) " ] }, { "cell_type": "code", "execution_count": null, "id": "fe17176a-4fcf-4cf5-819f-b84c48415548", "metadata": {}, "outputs": [], "source": [ "# creating s3paths.txt with /vsis/ prefix for gdalbuiltvrt, see: https://gdal.org/user/virtual_file_systems.html#vsis3-aws-s3-files\n", "with open('./Data-Files/3paths.txt', 'w') as f:\n", " for key in keys:\n", " f.write(\"/vsis3/%s\\n\" % key)" ] }, { "cell_type": "code", "execution_count": null, "id": "9dc87771-d942-4e1b-8fe7-1c48d4f26ca3", "metadata": {}, "outputs": [], "source": [ "#%%time\n", "# Create a VRT that points to all files in separate bands\n", "# Seems to take <30seconds for ~200 Tifs\n", "vrtName = f'./Data-Files/stack{zone}{latLabel}{square}.vrt'\n", "cmd = f'gdalbuildvrt -overwrite -separate -input_file_list ./Data-Files/3paths.txt {vrtName}'\n", "!{cmd}" ] }, { "cell_type": "code", "execution_count": null, "id": "01fa5a75-d2fa-456b-a3d3-66500cc95480", "metadata": {}, "outputs": [], "source": [ "# Customize the VRT adding date to metadata\n", "with rasterio.open(vrtName, 'r+') as src:\n", " print(f'adding filenames to {vrtName} band descriptions')\n", " src.descriptions = files\n", " # Or add as a metadata\n", " #src.meta['date'] = files # doesn't work\n", " #src.update_tags(date=files) # adds " ] }, { "cell_type": "markdown", "id": "93bf514a", "metadata": {}, "source": [ "## Working with low-resolution overviews" ] }, { "cell_type": "code", "execution_count": null, "id": "80327c17-8ba6-426b-b242-2ff1323e17b3", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# Use rioxarray to quickly load overviews for visualization, takes ~12s\n", "da3 = rioxarray.open_rasterio(vrtName, overview_level=3, chunks=dict(band=1)) \n", "da3" ] }, { "cell_type": "code", "execution_count": null, "id": "cc2187f0-eab6-4541-8998-4b8bd8584355", "metadata": {}, "outputs": [], "source": [ "print(f'Uncompressed dataset size= {da3.nbytes/1e6} Mb')" ] }, { "cell_type": "code", "execution_count": null, "id": "368620b1-e06d-4364-a72e-87297893b127", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# NOTE the overviews are only ~100MB, so pull it into memory before plotting for speed \n", "da3 = da3.compute()" ] }, { "cell_type": "code", "execution_count": null, "id": "ec19be30-ecdc-4996-86ec-a04ce459726d", "metadata": {}, "outputs": [], "source": [ "# Overview browser\n", "da3.hvplot.image(clim=(0,0.4), aspect='equal', frame_width=400, cmap='gray', widget_location='bottom')" ] }, { "cell_type": "code", "execution_count": null, "id": "b8b85442-1c01-4a43-91ba-a1d8a4bf883e", "metadata": {}, "outputs": [], "source": [ "# get filepths reading the vrt with rasterio\n", "with rasterio.open(vrtName) as src:\n", " filenames = src.descriptions\n", " datetimes = [pd.to_datetime(x[4:12]) for x in filenames]\n", " \n", "datetimes[:5]" ] }, { "cell_type": "code", "execution_count": null, "id": "6a50d198-fd8b-4fea-bc91-ef9af178280e", "metadata": {}, "outputs": [], "source": [ "# add new coordinate to existing dimension \n", "da = da3.assign_coords(time=('band', datetimes))\n", "# make 'time' active coordinate instead of integer band\n", "da = da.swap_dims({'band':'time'})\n", "# Name the dataset (helpful for hvplot calls later on)\n", "da.name = 'Gamma0VV'" ] }, { "cell_type": "code", "execution_count": null, "id": "741fc2c2-68a4-453f-909e-2ae94285de82", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# Now we can get montly mosaics pretty easily (mean, min, max, etc)\n", "mosaic_mean = da.where(da!=0).resample(time='1m').mean()" ] }, { "cell_type": "code", "execution_count": null, "id": "4f3b3f95-ce46-4f53-b1e9-bc56701a5981", "metadata": {}, "outputs": [], "source": [ "video = mosaic_mean.hvplot.image(x='x',y='y', clim=(0,0.4), aspect='equal', frame_width=600, cmap='gray', \n", " widget_type='scrubber', widget_location='bottom') \n", "video" ] }, { "cell_type": "code", "execution_count": null, "id": "10220425-ef02-4e1e-bce3-457d2dd7938b", "metadata": {}, "outputs": [], "source": [ "# Save our derived dataset to a local file for future use\n", "#mosaic_mean.to_netcdf('mosasic_mean.nc')\n", "#da = xr.open_dataset('mosaic_mean.nc')\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a9538553-4ecd-4db2-9425-15634598fa38", "metadata": {}, "outputs": [], "source": [ "# vector file defining area of interest in lat/lon from geojson.io\n", "#gf = gpd.read_file('grand-mesa.geojson')\n", "#tiles = gv.tile_sources.EsriUSATopo\n", "#bbox = gf.hvplot.polygons(alpha=0.2, geo=True)\n", "#tiles * bbox " ] }, { "cell_type": "code", "execution_count": null, "id": "f583f389-0838-4e37-b6b7-fe6265104015", "metadata": {}, "outputs": [], "source": [ "gf.bounds" ] }, { "cell_type": "code", "execution_count": null, "id": "2169344b-7409-44b4-b7e5-1fbd5431b332", "metadata": {}, "outputs": [], "source": [ "gfp = gf.to_crs(da.rio.crs)\n", "print(f' Area of bounding box (km^2): {gfp.area.values[0]*1e-6 :.2f}')\n", "gfp.bounds" ] }, { "cell_type": "code", "execution_count": null, "id": "a8b7c2e4-151b-4cfd-b9ac-fc93153fb472", "metadata": {}, "outputs": [], "source": [ "%%time \n", "# Time series plot (flat area of grand mesa)\n", "# https://hvplot.holoviz.org/user_guide/Timeseries_Data.html\n", "\n", "xmin, ymax, xmax, ymin = gfp.bounds.values[0]\n", "\n", "daT = da.sel(x=slice(xmin, xmax), \n", " y=slice(ymin, ymax))\n", "\n", "\n", "from bokeh.models import DatetimeTickFormatter\n", "formatter = DatetimeTickFormatter(months='%b %Y')\n", "\n", "all_points = daT.where(daT!=0).hvplot.scatter('time', groupby=[], dynspread=True, datashade=True) \n", "mean_trend = daT.where(daT!=0, drop=True).mean(dim=['x','y']).hvplot.line(title='North Grand Mesa', color='red')\n", "(all_points * mean_trend).opts(xformatter=formatter)" ] }, { "cell_type": "code", "execution_count": null, "id": "3f9e379e-2529-440a-8371-24329bd917b9", "metadata": {}, "outputs": [], "source": [ "# save timeseries shown in plot to csv\n", "# scatter() and line() plots have a 'data' attribute that is just a pandas dataframe!\n", "mean_trend.data.to_csv('./Data-Files/mean_trend.csv')" ] }, { "cell_type": "markdown", "id": "65f01190-af4b-4cee-adbf-d924cc8c5d6c", "metadata": {}, "source": [ "## Full Resolution data" ] }, { "cell_type": "code", "execution_count": null, "id": "2b7dd352-514c-4797-8176-64cc933767ce", "metadata": {}, "outputs": [], "source": [ "%%time\n", "#To work with the full_resolution dataset, simply do not ask for overview_level\n", "da = rioxarray.open_rasterio(vrtName, masked=True, chunks=dict(band=1))#, x=512, y=512)) \n", "da = da.assign_coords(time=('band', datetimes))\n", "da = da.swap_dims({'band':'time'})\n", "da.name = 'Gamma0VV'\n", "#da" ] }, { "cell_type": "code", "execution_count": null, "id": "934ec736-f302-44c3-b4e4-115b2076ec29", "metadata": {}, "outputs": [], "source": [ "%%time \n", "# Rasterize=True requires holoviews!=1.13.4\n", "# https://github.com/holoviz/holoviews/issues/4653\n", "file_names = da.attrs.pop('long_name') #holoviews doesn't like these names in attributes\n", "images = da.hvplot.image(x='x',y='y',rasterize=True, # rasterize=True is key for large images \n", " clim=(0,0.4), aspect='equal', \n", " frame_width=400, cmap='gray',\n", " #widgets={'band': pn.widgets.IntSlider}, #override defaul DiscreteSlider\n", " widgets={'time': pn.widgets.Select}, #override defaul DiscreteSlider\n", " widget_location='bottom')\n", "images" ] }, { "cell_type": "code", "execution_count": null, "id": "dfc03788-437b-420c-8187-cd8f464817c3", "metadata": {}, "outputs": [], "source": [ "# It's common to reduce the dimensions of the data you're analyzing in some way\n", "# For example, hone in on a subset and compute the monthly mean:\n", "subset = da.sel(x=slice(7.345e5, 7.614e5), \n", " y=slice(4.3336e6, 4.306e6)\n", " ).resample(time='1m').mean() # monthly mean of subset (need time index)\n", "\n", "print('dataset size (MB):', subset.nbytes/1e6)\n", "subset" ] }, { "cell_type": "code", "execution_count": null, "id": "d6337127-59b4-4507-a282-7b11a872727b", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# video playback will be smoother if pixel in local or distributed memory\n", "# Takes ~1min to complete\n", "# TODO: figure out dask settings to speed this up\n", "from dask.diagnostics import ProgressBar\n", "\n", "with ProgressBar():\n", " frames = subset.compute()" ] }, { "cell_type": "code", "execution_count": null, "id": "639cd202-7592-4fe3-8cae-50e0afd0db1d", "metadata": {}, "outputs": [], "source": [ "#video = frames.hvplot.image(x='x',y='y', clim=(0,0.4), aspect='equal', frame_width=600, cmap='gray', \n", "# widget_type='scrubber', widget_location='bottom') \n", "#video" ] }, { "cell_type": "code", "execution_count": null, "id": "2247e45d-e309-4481-99d3-71b2cc8d5200", "metadata": {}, "outputs": [], "source": [ "## Save our derived dataset to a local file for future use\n", "#frames.to_netcdf('full_res_mosaic_mean.nc')\n", "#da = xr.open_dataset('full_res_mosaic_mean.nc')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.13 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "vscode": { "interpreter": { "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" } } }, "nbformat": 4, "nbformat_minor": 5 }