{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train a Scikit-Learn model in SageMaker and track with MLFlow"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Intro\n",
    "\n",
    "The main objective of this notebook is to show how a user without write permissions to the MLflow server, is forbidden to create runs, experiments, register models, etc. Nontheless, with read permissions, the user can check details of what is going on.\n",
    "The SageMaker Studio user profile we well test is the `mlflow-reader`.\n",
    "This for example can be useful for auditing users on the MLflow server. \n",
    "\n",
    "## Pre-Requisites\n",
    "\n",
    "* Successfullyd deployed the CDK sample in [this repository](https://github.com/aws-samples/sagemaker-studio-mlflow-integration.git).\n",
    "* Access  to the `mlflow-reader` user profile in the created SageMaker Studio domain and use the `Base Python 2.0` image on a `Python 3` kernel.\n",
    "\n",
    "## Install required and/or update libraries\n",
    "\n",
    "At the time of writing, we have used the `sagemaker` SDK version 2. The MLFlow SDK library used is the one corresponding to our MLflow server version, i.e., `2.3.1`.\n",
    "We install the `mlflow[extras]==2.3.1` to ensure that all required dependencies are installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#This cell might take several minutes to execute\n",
    "\n",
    "!pip install -q --upgrade pip setuptools wheel\n",
    "!pip install sagemaker sagemaker-experiments scikit-learn==1.0.1 boto3 mlflow[extras]==2.3.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by specifying:\n",
    "\n",
    "- The S3 bucket and prefix that you want to use for training and model data.  This should be within the same region as the notebook instance, training, and hosting.\n",
    "- The IAM role arn used to give training and hosting access to your data. See the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/using-identity-based-policies.html) for more details on creating these.  Note, if a role not associated with the current notebook instance, or more than one role is required for training and/or hosting, please replace `sagemaker.get_execution_role()` with a the appropriate full IAM role arn string(s).\n",
    "- The tracking URI where the MLFlow server runs\n",
    "\n",
    "If you examine the SageMaker execution role of the `mlflow-reader`, you will note that it has a in-line policy attached called `restApiReader` grating read permissions on all resources and methods on the REST API Gateway shielding MLflow and it looks like the following:\n",
    "\n",
    "```json\n",
    "{\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "            \"Action\": \"execute-api:Invoke\",\n",
    "            \"Resource\": [\n",
    "                \"arn:aws:execute-api:<AWS_REGION>:<AWS_ACCOUNT>:<REST_API_GW_ID>/*/GET/*\",\n",
    "                \"arn:aws:execute-api:<AWS_REGION>:<AWS_ACCOUNT>:<REST_API_GW_ID>/*/POST/api/2.0/mlflow/runs/search\",\n",
    "                \"arn:aws:execute-api:<AWS_REGION>:<AWS_ACCOUNT>:<REST_API_GW_ID>/*/POST/api/2.0/mlflow/experiments/search\"\n",
    "            ],\n",
    "            \"Effect\": \"Allow\"\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "```"
   ]
  },
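  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the effect of this policy more concrete, the following is a minimal sketch that approximates how the three resource patterns match an incoming request. It is not how IAM evaluates policies internally: the helper `is_allowed` is hypothetical, the region, account and API identifiers are left out, and Python's `fnmatch` is only used as a stand-in for IAM wildcard matching.\n",
    "\n",
    "```python\n",
    "from fnmatch import fnmatchcase\n",
    "\n",
    "# Resource patterns from the policy above, reduced to stage/METHOD/path.\n",
    "# The ARN prefix (region, account, API id) is omitted for brevity.\n",
    "ALLOWED_PATTERNS = [\n",
    "    '*/GET/*',\n",
    "    '*/POST/api/2.0/mlflow/runs/search',\n",
    "    '*/POST/api/2.0/mlflow/experiments/search',\n",
    "]\n",
    "\n",
    "\n",
    "def is_allowed(stage, method, path):\n",
    "    # Hypothetical helper: True if any pattern matches stage/METHOD/path.\n",
    "    resource = f'{stage}/{method}/{path}'\n",
    "    return any(fnmatchcase(resource, pattern) for pattern in ALLOWED_PATTERNS)\n",
    "\n",
    "\n",
    "# Reading an experiment is a GET request, so it matches the first pattern.\n",
    "print(is_allowed('prod', 'GET', 'api/2.0/mlflow/experiments/get'))\n",
    "# Creating a run is a POST request that matches none of the patterns.\n",
    "print(is_allowed('prod', 'POST', 'api/2.0/mlflow/runs/create'))\n",
    "```\n",
    "\n",
    "Under these assumptions, the first call is allowed and the second is not, which mirrors the behaviour this notebook demonstrates for the `mlflow-reader` profile."
   ]
  },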
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "import json\n",
    "import random\n",
    "import boto3\n",
    "\n",
    "## SageMaker and SKlearn libraries\n",
    "import sagemaker\n",
    "from sagemaker.sklearn.estimator import SKLearn\n",
    "from sagemaker.tuner import IntegerParameter, HyperparameterTuner\n",
    "\n",
    "## SKLearn libraries\n",
    "from sklearn.datasets import fetch_california_housing\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "## MLFlow libraries\n",
    "import mlflow\n",
    "from mlflow.tracking.client import MlflowClient\n",
    "import mlflow.sagemaker\n",
    "\n",
    "ssm = boto3.client('ssm')\n",
    "\n",
    "sess = sagemaker.Session()\n",
    "role = sagemaker.get_execution_role()\n",
    "bucket = sess.default_bucket()\n",
    "region = sess.boto_region_name\n",
    "account = role.split(\"::\")[1].split(\":\")[0]\n",
    "tracking_uri = ssm.get_parameter(Name=\"mlflow-restApiUrl\")['Parameter']['Value']\n",
    "mlflow_amplify_ui = ssm.get_parameter(Name=\"mlflow-uiUrl\")['Parameter']['Value']\n",
    "api_gw_id = tracking_uri.split('//')[1].split('.')[0]\n",
    "experiment_name = 'DEMO-sigv4'\n",
    "model_name = 'california-housing-model'\n",
    "\n",
    "NOTEBOOK_METADATA_FILE = \"/opt/ml/metadata/resource-metadata.json\"\n",
    "\n",
    "if os.path.exists(NOTEBOOK_METADATA_FILE):\n",
    "    with open(NOTEBOOK_METADATA_FILE, \"rb\") as f:\n",
    "        user = json.loads(f.read())['UserProfileName']\n",
    "        if user != 'mlflow-reader':\n",
    "            raise ValueError(\"Sorry, you should use the 'mlflow-reader' user profile to run this sample.\")\n",
    "\n",
    "print(\"Tracking URI: {}\".format(tracking_uri))\n",
    "print(\"MLFlow UI (on Amplify): {}\".format(mlflow_amplify_ui))\n",
    "print('SageMaker role: {}'.format(role.split(\"/\")[-1]))\n",
    "print('bucket: {}'.format(bucket))\n",
    "print('Account: {}'.format(account))\n",
    "print(\"Using AWS Region: {}\".format(region))\n",
    "print(\"MLflow server URI: {}\".format(tracking_uri))\n",
    "print(\"user profile: {}\".format(user))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### With env variable set: should succeed is the sagemaker execution role has permission to call the MLFlow endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "os.environ['MLFLOW_TRACKING_AWS_SIGV4'] = \"True\"\n",
    "mlflow.set_tracking_uri(tracking_uri)\n",
    "mlflow.set_experiment(experiment_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python3 -m requests_auth_aws_sigv4 https://{api_gw_id}.execute-api.{region}.amazonaws.com/prod/api/2.0/mlflow/experiments/get?experiment_id=0 -v"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Preparation\n",
    "We load the dataset from sklearn, then split the data in training and testing datasets, where we allocate 75% of the data to the training dataset, and the remaining 25% to the traning dataset.\n",
    "\n",
    "The variable `target` is what we intend to estimate, which represents the value of a house, expressed in hundreds of thousands of dollars ($100,000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# we use the California housing dataset \n",
    "data = fetch_california_housing()\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=42)\n",
    "\n",
    "trainX = pd.DataFrame(X_train, columns=data.feature_names)\n",
    "trainX['target'] = y_train\n",
    "\n",
    "testX = pd.DataFrame(X_test, columns=data.feature_names)\n",
    "testX['target'] = y_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we save a copy of the data locally, as well as in S3. The data stored in S3 will be used SageMaker to train and test the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# save the data locally\n",
    "trainX.to_csv('california_train.csv', index=False)\n",
    "testX.to_csv('california_test.csv', index=False)\n",
    "\n",
    "import random\n",
    "import string\n",
    "\n",
    "prefix = f\"mlflow-sample/{random.choices(string.ascii_lowercase, k=8)}/sklearncontainer\"\n",
    "# save the data to S3.\n",
    "train_path = sess.upload_data(path='california_train.csv', bucket=bucket, key_prefix=prefix)\n",
    "test_path = sess.upload_data(path='california_test.csv', bucket=bucket, key_prefix=prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup SageMaker Experiments\n",
    "\n",
    "SageMaker Experiments is an AWS service for tracking machine learning Experiments. The SageMaker Experiments Python SDK is a high-level interface to this service that helps you track Experiment information using Python.\n",
    "\n",
    "Conceptually, these are the following entities within `SageMaker Experiments`:\n",
    "\n",
    "* Experiment: A collection of related Trials. Add Trials to an Experiment that you wish to compare together.\n",
    "* Trial: A description of a multi-step machine learning workflow. Each step in the workflow is described by a TrialComponent.\n",
    "* TrialComponent: A description of a single step in a machine learning workflow.\n",
    "* Tracker: A Python context-manager for logging information about a single TrialComponent.\n",
    "\n",
    "When running jobs (both training and processing ones) in the SageMaker managed infrastructure, SageMaker creates automatically a <i>TrialComponent</i>. <i>TrialComponents</i> includes by default jobs metadata and lineage information about the input and output data, models artifacts and metrics (for training jobs), and within your training script these data can be further enriched.\n",
    "\n",
    "We want to show how you can easily enable a two-way interaction between MLflow and SageMaker Experiments.\n",
    "\n",
    "Let us first create an `Experiment` and a `Trial`. These two entities are used to keep your experimentation organized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from smexperiments.experiment import Experiment\n",
    "from smexperiments.trial import Trial\n",
    "from smexperiments.trial_component import TrialComponent\n",
    "from smexperiments.tracker import Tracker\n",
    "\n",
    "import time\n",
    "\n",
    "try:\n",
    "    my_experiment = Experiment.load(experiment_name=experiment_name)\n",
    "    print(\"existing experiment loaded\")\n",
    "except Exception as ex:\n",
    "    if \"ResourceNotFound\" in str(ex):\n",
    "        my_experiment = Experiment.create(\n",
    "            experiment_name = experiment_name,\n",
    "            description = \"MLFlow and SageMaker integration\"\n",
    "        )\n",
    "        print(\"new experiment created\")\n",
    "    else:\n",
    "        print(f\"Unexpected {ex}=, {type(ex)}\")\n",
    "        print(\"Dont go forward!\")\n",
    "        raise\n",
    "\n",
    "trial_name = \"trial-v1\"\n",
    "\n",
    "try:\n",
    "    my_first_trial = Trial.load(trial_name=trial_name)\n",
    "    print(\"existing trial loaded\")\n",
    "except Exception as ex:\n",
    "    if \"ResourceNotFound\" in str(ex):\n",
    "        my_first_trial = Trial.create(\n",
    "            experiment_name=experiment_name,\n",
    "            trial_name=trial_name,\n",
    "        )\n",
    "        print(\"new trial created\")\n",
    "    else:\n",
    "        print(f\"Unexpected {ex}=, {type(ex)}\")\n",
    "        print(\"Dont go forward!\")\n",
    "        raise\n",
    "\n",
    "create_date = time.strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
    "\n",
    "experiment_config = {\n",
    "    \"ExperimentName\": experiment_name,\n",
    "    \"TrialName\": trial_name,\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "For this example, we use the `SKlearn` framework in script mode with SageMaker. Let us explore in more details the different components we need to define.\n",
    "\n",
    "### Traning script and SageMaker environment\n",
    "\n",
    "The `./source_dir/train_env_variables.py` script provides all the code we need for training a SageMaker model. The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, such as:\n",
    "\n",
    "* `SM_MODEL_DIR`: A string representing the path to the directory to write model artifacts to. These artifacts are uploaded to S3 for model hosting.\n",
    "* `SM_CHANNEL_TRAIN`: A string representing the path to the directory containing data in the 'training' channel.\n",
    "* `SM_CHANNEL_TEST`: A string representing the path to the directory containing data in the 'testing' channel.\n",
    "\n",
    "\n",
    "For more information about training environment variables, please visit \n",
    "[SageMaker Training Toolkit](https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md).\n",
    "\n",
    "We want to highlight in particular `SM_TRAINING_ENV` since it provides all the training information as a JSON-encoded dictionary (see [here](https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md#sm_training_env) for more details).\n",
    "\n",
    "#### Hyperparmeters\n",
    "\n",
    "We are using the `RandomForestRegressor` algorithm from the SKlearn framework. For the purpose of this exercise, we are only using a subset of hyperparameters supported by this algorithm, i.e. `n-estimators` and `min-samples-leaf`\n",
    "\n",
    "If you would like to know more the different hyperparmeters for this algorithm, please refer to the [`RandomForestRegressor` official documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html).\n",
    "\n",
    "Furthermore, it is important to note that for the purpose of this excercise, we are essentially omitting completely the feature engineering step, which is an essential step in any machine learning problem.\n",
    "\n",
    "#### MLFlow interaction\n",
    "\n",
    "To interact with the MLFlow server, we use the mlflow SDK, which allows us to set the tracking URI and the experiment name. One this initial setup is completed, we can store the parameters used (`mlflow.log_params(params)`), the model that is generated (`mlflow.sklearn.log_model(model, \"model\")`) with its associated metrics (`mlflow.log_metric(f'AE-at-{str(q)}th-percentile', np.percentile(a=abs_err, q=q))`).\n",
    "\n",
    "TODO: explain the `mlflow.autolog()` and the <i>System Tags</i> (add link) and how to overwrite them to have the right reference in SageMaker\n",
    "\n",
    "#### SageMaker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pygmentize ./source_dir/train.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SKlearn container\n",
    "\n",
    "For this example, we use the `SKlearn` framework in script mode with SageMaker. For more information please refere to [the official documentation](https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/using_sklearn.html)\n",
    "\n",
    "Our training script makes use of other 3rd party libraries, i.e. `mlflow`, which are not installed by default in the `Sklearn` container SageMaker provides. However, this can be easily overcome by supplying a `requirement.txt` file in the `source_dir` folder, which then SageMaker will `pip`-install before executing the training script.\n",
    "\n",
    "### Metric definition\n",
    "\n",
    "SageMaker emits every log to CLoudWatch. Since we are using scripting mode, we need to specify a metric definition object to define the format of the metric we are interested in via regex, so that SageMaker knows how to extract this metric from the CloudWatch logs of the training job.\n",
    "\n",
    "In our case our custom metric is as follow\n",
    "\n",
    "```python\n",
    "metric_definitions = [{'Name': 'median-AE', 'Regex': \"AE-at-50th-percentile: ([0-9.]+).*$\"}]\n",
    "```"
   ]
  },
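  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To sanity-check the regular expression before launching a job, it can be applied to a sample log line. The sketch below assumes the training script prints the metric in the form `AE-at-50th-percentile: <value>`; the sample line and its value are made up for illustration.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Same pattern as in the metric definition above.\n",
    "regex = 'AE-at-50th-percentile: ([0-9.]+).*$'\n",
    "\n",
    "# Hypothetical log line, as it might appear in the training job logs.\n",
    "sample_log_line = 'AE-at-50th-percentile: 0.2154'\n",
    "\n",
    "match = re.search(regex, sample_log_line)\n",
    "# The first capture group holds the numeric value of the metric.\n",
    "median_ae = float(match.group(1)) if match else None\n",
    "print(median_ae)\n",
    "```\n",
    "\n",
    "If `median_ae` is `None`, the pattern does not match the log format and the metric would not be picked up."
   ]
  },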
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "metric_definitions = [{'Name': 'median-AE', 'Regex': \"AE-at-50th-percentile: ([0-9.]+).*$\"}]\n",
    "\n",
    "hyperparameters = {\n",
    "    'n-estimators': 100,\n",
    "    'min-samples-leaf': 3,\n",
    "    'features': 'MedInc HouseAge AveRooms AveBedrms Population AveOccup',\n",
    "    'target': 'target'\n",
    "}\n",
    "\n",
    "environment={\n",
    "        \"AWS_DEFAULT_REGION\": region,\n",
    "        \"MLFLOW_EXPERIMENT_NAME\": experiment_name,\n",
    "        \"MLFLOW_TRACKING_URI\": tracking_uri,\n",
    "        \"MLFLOW_AMPLIFY_UI_URI\": mlflow_amplify_ui,\n",
    "        \"MLFLOW_TRACKING_AWS_SIGV4\": \"true\",\n",
    "        \"MLFLOW_USER\": user\n",
    "    }\n",
    "\n",
    "estimator = SKLearn(\n",
    "    entry_point='train.py',\n",
    "    source_dir='source_dir',\n",
    "    role=role,\n",
    "    metric_definitions=metric_definitions,\n",
    "    hyperparameters=hyperparameters,\n",
    "    instance_count=1,\n",
    "    instance_type='ml.m5.large',  # to run SageMaker in a managed infrastructure\n",
    "    framework_version='1.0-1',\n",
    "    base_job_name='mlflow',\n",
    "    environment=environment\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are ready to execute the training as a SageMaker Training job on the SageMaker managed infrastructure. However, differently from the `mlflow-admin` user, the SageMaker execution role of `mlflow-reader` cannot create new runs, thus the SageMaker Training job will fail.\n",
    "The error message returned explains the reasons and it looks like the following:\n",
    "\n",
    "```json\n",
    "{\n",
    "    'Message': 'User: arn:aws:sts::<AWS_ACCOUNT>:assumed-role/SageMakerStudioUserStack-sagemakermlflowreaderrole-1NX32OI2LUKEN/SageMaker is not authorized to perform: execute-api:Invoke on resource: arn:aws:execute-api:<AWS_REGION>:********2473:<REST_API_GW_ID>/prod/POST/api/2.0/mlflow/runs/create'\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "estimator.fit({'train':train_path, 'test': test_path}, experiment_config=experiment_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read details about executed runs and model registered\n",
    "\n",
    "Nonetheless, this user is capable of reading details about specific runs, registered models, etc. For example, in this case we want to see the best run for our experiment by looking at the `metrics.accuracy` value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from mlflow.entities import ViewType\n",
    "\n",
    "experiment = mlflow.set_experiment(experiment_name)\n",
    "\n",
    "client = MlflowClient()\n",
    "\n",
    "run =client.search_runs(\n",
    "    experiment_ids=experiment.experiment_id,\n",
    "    filter_string=\"\",\n",
    "    run_view_type=ViewType.ACTIVE_ONLY,\n",
    "    max_results=1,\n",
    "    order_by=[\"metrics.accuracy DESC\"],\n",
    ")[0]\n",
    "\n",
    "print(run)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create registered models and model versions\n",
    "\n",
    "As expected, it is not possible to create either a new registered model, nor a new model version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    client.create_registered_model(model_name)\n",
    "except Exception as e:\n",
    "    print(f\"Exception: {str(e)}\")\n",
    "\n",
    "try:\n",
    "    model_version = client.create_model_version(\n",
    "        name=model_name,\n",
    "        source=\"{}/model\".format(run.info.artifact_uri),\n",
    "        run_id=run.info.run_uuid\n",
    "    )\n",
    "    print(\"model_version: {}\".format(model_version))\n",
    "except Exception as e:\n",
    "    print(f\"Exception: {str(e)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Listing and searching MLflow Models\n",
    "\n",
    "Nonetheless, it is possible to access the existing registered models and all model versions.\n",
    "See [official docs](https://mlflow.org/docs/latest/model-registry.html#listing-and-searching-mlflow-models) for more info.\n",
    "\n",
    "### Search for registered models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from pprint import pprint\n",
    "\n",
    "client = MlflowClient()\n",
    "for rm in client.search_registered_models():\n",
    "    pprint(dict(rm), indent=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Search for model versions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "client = MlflowClient()\n",
    "for mv in client.search_model_versions(f\"name='{model_name}'\"):\n",
    "    pprint(dict(mv), indent=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "### Get the model URI from the MLflow model registry"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "registered_model = client.search_registered_models(f\"name='{model_name}'\")[0]\n",
    "model_version = registered_model.latest_versions[0]\n",
    "\n",
    "model_uri = model_version.source\n",
    "print(\"Model URI: {}\".format(model_uri))\n",
    "\n",
    "# Load model as a Sklearn model.\n",
    "loaded_model = mlflow.sklearn.load_model(model_uri)\n",
    "\n",
    "# get a random index to test the prediction from the test data\n",
    "index = random.randrange(0, len(testX))\n",
    "print(\"Random index value: {}\".format(index))\n",
    "\n",
    "# Prepare data on a Pandas DataFrame to make a prediction.\n",
    "data = testX.drop(['Latitude','Longitude','target'], axis=1).iloc[[index]]\n",
    "\n",
    "print(\"#######\\nData for prediction \\n{}\".format(data))\n",
    "\n",
    "y_hat = loaded_model.predict(data)[0]\n",
    "y = y_test[index]\n",
    "\n",
    "print(\"Predicted value: {}\".format(y_hat))\n",
    "print(\"Actual value: {}\".format(y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Star Github Repository\n",
    "\n",
    "If you have found this sample useful, do not hesitate to star the GitHub repository"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%html\n",
    "\n",
    "<a class=\"github-button\" href=\"https://github.com/aws-samples/sagemaker-studio-mlflow-integration\" data-color-scheme=\"no-preference: light; light: light; dark: dark;\" data-icon=\"octicon-star\" data-size=\"large\" data-show-count=\"true\" aria-label=\"Star Amazon SageMaker secure MLOps on GitHub\">Star</a>\n",
    "<script async defer src=\"https://buttons.github.io/buttons.js\"></script>"
   ]
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   }
  ],
  "instance_type": "ml.t3.medium",
  "interpreter": {
   "hash": "04ffa0b675ec4736afd1210dd81a6f70b0b4fa83298b056bd6b4e16ede0b389c"
  },
  "kernelspec": {
   "display_name": "Python 3 (Base Python 2.0)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/sagemaker-base-python-38"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}