` run and choose **Charts** in the **Overview** section on the left pane. You see the three added charts in the run:\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "6949ec51-5f38-4432-9a5d-ed4e5350644f",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "9d8a7389-9af4-4a0b-8ea7-376db0c2ed02",
"metadata": {},
"source": [
"## Optional: Hyperparameter optimization (HPO)\n",
"It takes about 20 minutes to run this section. The section is optional and you don't need to run it to continue with other notebooks. You can navigate directly to step 3 [notebook](03-sagemaker-pipeline.ipynb). If you would like to perform a model A/B test in **Additional topics** sections, you can execute this part to produce an alternative model.\n",
"\n",
"[Amazon SageMaker automatic model tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html), also called hyperparameter optimization (HPO), finds the best performing model against a defined objective metric by running many training jobs on the dataset using the algorithm and ranges of hyperparameters that you specify. SageMaker HPT supports random search, bayesian optimization, and [hyperband](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html) as tuning strategies."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5230ec51-bab2-40ca-a5ce-3a31590d47a5",
"metadata": {},
"outputs": [],
"source": [
"# import required HPO objects\n",
"from sagemaker.tuner import (\n",
" CategoricalParameter,\n",
" ContinuousParameter,\n",
" HyperparameterTuner,\n",
" IntegerParameter,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a31feb51-0ca9-4517-8c1a-02415970a46b",
"metadata": {},
"outputs": [],
"source": [
"# set up hyperparameter ranges\n",
"hp_ranges = {\n",
" \"min_child_weight\": ContinuousParameter(1, 10),\n",
" \"max_depth\": IntegerParameter(1, 10),\n",
" \"alpha\": ContinuousParameter(0, 5),\n",
" \"eta\": ContinuousParameter(0, 1),\n",
" \"colsample_bytree\": ContinuousParameter(0, 1),\n",
" \"gamma\": ContinuousParameter(0, 10)\n",
" \n",
"}\n",
"\n",
"# set up the objective metric\n",
"objective = \"validation:auc\"\n",
"\n",
"# instantiate a HPO object\n",
"tuner = HyperparameterTuner(\n",
" estimator=estimator, # the SageMaker estimator object\n",
" hyperparameter_ranges=hp_ranges, # the range of hyperparameters\n",
" max_jobs=30, # total number of HPO jobs\n",
" max_parallel_jobs=3, # how many HPO jobs can run in parallel\n",
" strategy=\"Bayesian\", # the internal optimization strategy of HPO\n",
" objective_metric_name=objective, # the objective metric to be used for HPO\n",
" objective_type=\"Maximize\", # maximize or minimize the objective metric\n",
" base_tuning_job_name=\"from-idea-to-prod-hpo\",\n",
" early_stopping_type=\"Auto\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7aab138e-5629-4a9d-830a-efb425a375bb",
"metadata": {},
"source": [
"Now run the HPO job. It takes about 10 minutes to complete. \n",
"\n",
" 💡 Note, that the HPO job creates its own experiment to track each training job with a specific set of hyperparameters as a separate run.\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "612a815f-c9f9-4e8e-9afb-ffd3b98a0f4b",
"metadata": {},
"outputs": [],
"source": [
"tuner.fit(\n",
" {\"train\": s3_input_train, \"validation\": s3_input_validation},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d60e7673-1d4f-4632-b5cc-c0745cb83260",
"metadata": {},
"outputs": [],
"source": [
"print(f\"HPO job status: {sm.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuner.latest_tuning_job.job_name)['HyperParameterTuningJobStatus']}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc88733a-0711-4248-9acd-c7d78e4f1830",
"metadata": {},
"outputs": [],
"source": [
"hpo_predictor = tuner.deploy(\n",
" initial_instance_count=1,\n",
" instance_type=\"ml.m5.large\",\n",
" serializer=sagemaker.serializers.CSVSerializer(),\n",
" deserializer=sagemaker.deserializers.CSVDeserializer(),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3337f1c6-f211-4054-907f-ef0ee17683f2",
"metadata": {},
"outputs": [],
"source": [
"hpo_predictions = np.array(hpo_predictor.predict(test_x.values), dtype=float).squeeze()\n",
"print(hpo_predictions)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d77cc440-3828-4677-b19b-ede1edcb3903",
"metadata": {},
"outputs": [],
"source": [
"pd.crosstab(\n",
" index=test_y['y'].values,\n",
" columns=np.round(hpo_predictions), \n",
" rownames=['actuals'], \n",
" colnames=['predictions']\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5c06cc7b-81f9-42d6-9445-c35b0409f486",
"metadata": {},
"source": [
"There is no any material improvements for the model metrics. It can indicate, that the XGBoost model is already at it's limit. You might want to explore other model types to improve the prediction accuracy for this use case."
]
},
{
"cell_type": "markdown",
"id": "64fd8435-9c7e-4b4f-87b2-13ee830d19e3",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "f5493780-c392-4432-a34b-75f27ce984a5",
"metadata": {},
"source": [
"## Clean-up\n",
"To avoid charges, remove the hosted endpoint you created."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2de13080-5527-4de4-aacb-3eb50af0f92f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"predictor.delete_endpoint(delete_endpoint_config=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "040e95d7-b906-4a7c-90cb-4d73cefbf617",
"metadata": {},
"outputs": [],
"source": [
"# run if you created a tuned predictor after HPO\n",
"hpo_predictor.delete_endpoint(delete_endpoint_config=True)"
]
},
{
"cell_type": "markdown",
"id": "34f720e0-e8d5-4cb5-93a6-6a4f2dafd523",
"metadata": {},
"source": [
"## Continue with the step 3\n",
"open the step 3 [notebook](03-sagemaker-pipeline.ipynb)."
]
},
{
"cell_type": "markdown",
"id": "62b623d7-4b16-4c6f-96a8-ad92a77b1601",
"metadata": {},
"source": [
"## Further development ideas for your real-world projects\n",
"- Track, organize, and compare all your model training runs using [SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html)\n",
"- Use [Amazon SageMaker Data Wrangler](https://aws.amazon.com/sagemaker/data-wrangler/) for creating a no-code or low-code visual data processing and feature engineering flow. Refer to this hands-on tutorial: [Prepare Training Data for Machine Learning with Minimal Code](https://aws.amazon.com/getting-started/hands-on/machine-learning-tutorial-prepare-data-with-minimal-code/)\n",
"- Try no-code [SageMaker Canvas](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas.html) on your data to perform analysis and use automated ML to build models and generate predictions"
]
},
{
"cell_type": "markdown",
"id": "6ba1fc9d-7a52-4d2f-b51e-0f9422119389",
"metadata": {},
"source": [
"## Additional resources\n",
"- [Using Docker containers with SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html)\n",
"- [How to create and use a custom SageMaker container: SageMaker hands-on workshop](https://sagemaker-workshop.com/custom/containers.html)\n",
"- [Amazon SageMaker Immersion Day](https://catalog.us-east-1.prod.workshops.aws/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US)\n",
"- [Targeting Direct Marketing with Amazon SageMaker XGBoost](https://github.com/aws-samples/amazon-sagemaker-immersion-day/blob/master/processing_xgboost.ipynb)\n",
"- [Train a Machine Learning Model](https://aws.amazon.com/getting-started/hands-on/machine-learning-tutorial-train-a-model/)\n",
"- [Deploy a Machine Learning Model to a Real-Time Inference Endpoint](https://aws.amazon.com/getting-started/hands-on/machine-learning-tutorial-deploy-model-to-real-time-inference-endpoint/)\n",
"- [Amazon SageMaker 101 Workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/0c6b8a23-b837-4e0f-b2e2-4a3ffd7d645b/en-US)\n",
"- [Amazon SageMaker 101 Workshop code repository](https://github.com/aws-samples/sagemaker-101-workshop)\n",
"- [Amazon SageMaker with XGBoost and Hyperparameter Tuning for Direct Marketing predictions](https://github.com/aws-samples/sagemaker-101-workshop/blob/main/builtin_algorithm_hpo_tabular/SageMaker%20XGBoost%20HPO.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "2d7b222f-ee83-4c87-9019-cebcc7e56627",
"metadata": {},
"source": [
"# Shutdown kernel"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae0b1f02-44b8-4749-a5e2-ce415fff7f78",
"metadata": {},
"outputs": [],
"source": [
"%%html\n",
"\n",
"Shutting down your kernel for this notebook to release resources.
\n",
"\n",
" \n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "725f9d18-a550-46f6-9de8-9aeeed8171fd",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"availableInstances": [
{
"_defaultOrder": 0,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 4,
"name": "ml.t3.medium",
"vcpuNum": 2
},
{
"_defaultOrder": 1,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 8,
"name": "ml.t3.large",
"vcpuNum": 2
},
{
"_defaultOrder": 2,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 16,
"name": "ml.t3.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 3,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 32,
"name": "ml.t3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 4,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 8,
"name": "ml.m5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 5,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 16,
"name": "ml.m5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 6,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 32,
"name": "ml.m5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 7,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 64,
"name": "ml.m5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 8,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 128,
"name": "ml.m5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 9,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 192,
"name": "ml.m5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 10,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 256,
"name": "ml.m5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 11,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 384,
"name": "ml.m5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 12,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 8,
"name": "ml.m5d.large",
"vcpuNum": 2
},
{
"_defaultOrder": 13,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 16,
"name": "ml.m5d.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 14,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 32,
"name": "ml.m5d.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 15,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 64,
"name": "ml.m5d.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 16,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 128,
"name": "ml.m5d.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 17,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 192,
"name": "ml.m5d.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 18,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 256,
"name": "ml.m5d.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 19,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"memoryGiB": 384,
"name": "ml.m5d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 20,
"_isFastLaunch": true,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 4,
"name": "ml.c5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 21,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 8,
"name": "ml.c5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 22,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 16,
"name": "ml.c5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 23,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 32,
"name": "ml.c5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 24,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 72,
"name": "ml.c5.9xlarge",
"vcpuNum": 36
},
{
"_defaultOrder": 25,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 96,
"name": "ml.c5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 26,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 144,
"name": "ml.c5.18xlarge",
"vcpuNum": 72
},
{
"_defaultOrder": 27,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"memoryGiB": 192,
"name": "ml.c5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 28,
"_isFastLaunch": true,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 16,
"name": "ml.g4dn.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 29,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 32,
"name": "ml.g4dn.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 30,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 64,
"name": "ml.g4dn.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 31,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 128,
"name": "ml.g4dn.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 32,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"memoryGiB": 192,
"name": "ml.g4dn.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 33,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 256,
"name": "ml.g4dn.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 34,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 61,
"name": "ml.p3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 35,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"memoryGiB": 244,
"name": "ml.p3.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 36,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"memoryGiB": 488,
"name": "ml.p3.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 37,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"memoryGiB": 768,
"name": "ml.p3dn.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 38,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 16,
"name": "ml.r5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 39,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 32,
"name": "ml.r5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 40,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 64,
"name": "ml.r5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 41,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 128,
"name": "ml.r5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 42,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 256,
"name": "ml.r5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 43,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 384,
"name": "ml.r5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 44,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 512,
"name": "ml.r5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 45,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"memoryGiB": 768,
"name": "ml.r5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 46,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 16,
"name": "ml.g5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 47,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 32,
"name": "ml.g5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 48,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 64,
"name": "ml.g5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 49,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 128,
"name": "ml.g5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 50,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"memoryGiB": 256,
"name": "ml.g5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 51,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"memoryGiB": 192,
"name": "ml.g5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 52,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"memoryGiB": 384,
"name": "ml.g5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 53,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"memoryGiB": 768,
"name": "ml.g5.48xlarge",
"vcpuNum": 192
}
],
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}