{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Regression with Amazon SageMaker Autopilot (Parquet input)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "This is the accompanying notebook for the blog post [Run AutoML experiments with large parquet datasets using Amazon SageMaker Autopilot](https://aws.amazon.com/blogs/machine-learning/run-automl-experiments-with-large-parquet-datasets-using-amazon-sagemaker-autopilot/). The example here is almost the same as [Regression with Amazon SageMaker XGBoost algorithm (Parquet)](../introduction_to_amazon_algorithms/xgboost_abalone/xgboost_parquet_input_training.ipynb).\n", "\n", "This notebook tackles the exact same problem with the same solution, but has been modified for a Parquet input to be trained in SageMaker Autopilot. The original notebook provides details of dataset and the machine learning use-case.\n", "\n", "This notebook was tested in Amazon SageMaker Studio on a ml.t3.medium instance with Python 3 (Data Science) kernel." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip install --upgrade boto3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import boto3\n", "import re\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "\n", "role = get_execution_role()\n", "region = boto3.Session().region_name\n", "\n", "# S3 bucket for saving code and model artifacts.\n", "# Feel free to specify a different bucket here if you wish.\n", "bucket = sagemaker.Session().default_bucket()\n", "prefix = \"sagemaker/DEMO-automl-parquet\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We will use [PyArrow](https://arrow.apache.org/docs/python/) library to store the Abalone dataset in the Parquet format." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pyarrow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "s3 = boto3.client(\"s3\")\n", "# Download the dataset and load into a pandas dataframe\n", "FILE_NAME = \"abalone.csv\"\n", "s3.download_file(\n", " f\"sagemaker-example-files-prod-{region}\", f\"datasets/tabular/uci_abalone/abalone.csv\", FILE_NAME\n", ")\n", "\n", "feature_names = [\n", " \"Sex\",\n", " \"Length\",\n", " \"Diameter\",\n", " \"Height\",\n", " \"Whole weight\",\n", " \"Shucked weight\",\n", " \"Viscera weight\",\n", " \"Shell weight\",\n", " \"Rings\",\n", "]\n", "data = pd.read_csv(FILE_NAME, header=None, names=feature_names)\n", "\n", "data.to_parquet(\"abalone.parquet\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "sagemaker.Session().upload_data(\"abalone.parquet\", bucket=bucket, key_prefix=prefix)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "After setting the parameters, we kick off training, and poll for status until training is completed, which in this example, takes under 1 hour." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "import time\n", "from time import gmtime, strftime\n", "\n", "job_name = \"autopilot-parquet-\" + strftime(\"%m-%d-%H-%M\", gmtime())\n", "print(\"AutoML job:\", job_name)\n", "\n", "create_auto_ml_job_params = {\n", " \"AutoMLJobConfig\": {\n", " \"CompletionCriteria\": {\n", " \"MaxCandidates\": 50,\n", " }\n", " },\n", " \"AutoMLJobName\": job_name,\n", " \"InputDataConfig\": [\n", " {\n", " \"ContentType\": \"x-application/vnd.amazon+parquet\",\n", " \"CompressionType\": \"None\",\n", " \"DataSource\": {\n", " \"S3DataSource\": {\n", " \"S3DataType\": \"S3Prefix\",\n", " \"S3Uri\": f\"s3://{bucket}/{prefix}/abalone.parquet\",\n", " }\n", " },\n", " \"TargetAttributeName\": \"Rings\",\n", " }\n", " ],\n", " \"OutputDataConfig\": {\"S3OutputPath\": f\"s3://{bucket}/{prefix}/output\"},\n", " \"RoleArn\": role,\n", "}\n", "\n", "client = boto3.client(\"sagemaker\", region_name=region)\n", "client.create_auto_ml_job(**create_auto_ml_job_params)\n", "\n", "response = client.describe_auto_ml_job(AutoMLJobName=job_name)\n", "status = response[\"AutoMLJobStatus\"]\n", "secondary_status = response[\"AutoMLJobSecondaryStatus\"]\n", "print(f\"{status} - {secondary_status}\")\n", "\n", "while status != \"Completed\" and secondary_status != \"Failed\":\n", " time.sleep(60)\n", " response = client.describe_auto_ml_job(AutoMLJobName=job_name)\n", " status = response[\"AutoMLJobStatus\"]\n", " secondary_status = response[\"AutoMLJobSecondaryStatus\"]\n", " print(f\"{status} - {secondary_status}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Please refer to other Autopilot example notebooks such as [Direct Marketing with Amazon SageMaker Autopilot](sagemaker_autopilot_direct_marketing.ipynb) and [Customer Churn Prediction with Amazon SageMaker Autopilot](autopilot_customer_churn.ipynb) to see how to investigate details of each training, deploy the best candidate and run inference." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/autopilot|sagemaker_autopilot_abalone_parquet_input.ipynb)\n" ] } ], "metadata": { "anaconda-cloud": {}, "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }