{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Training & Deploying an XGBoost Model for Predicting Machine Failures (Predictive Maintenance)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook should be run after Data Pre-Processing.ipynb, which generates the curated train/test datasets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we train an ML model to predict whether a machine has failed based on its system readings. We will train an XGBoost model using Amazon SageMaker's built-in algorithm. XGBoost can deliver good results for many types of ML problems, including classification, even when training samples are limited." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "import numpy as np\n", "import pandas as pd\n", "import os\n", "import json\n", "import boto3\n", "import matplotlib.pyplot as plt\n", "\n", "# Create the SageMaker and boto3 sessions, and look up the region, account, and execution role\n", "sagemaker_session = sagemaker.Session()\n", "boto_session = boto3.session.Session()\n", "sm_client = boto_session.client(\"sagemaker\")\n", "sm_runtime = boto_session.client(\"sagemaker-runtime\")\n", "region = boto_session.region_name\n", "account = boto3.client('sts').get_caller_identity().get('Account')\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## IMPORTANT\n", "#### Replace <> below with the bucket name created by the CloudFormation template. 
\n", "#### The bucket name follows the format <stack name>-eventsbucket-<############>\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Find the events bucket created by the CloudFormation template\n", "s3_client = boto3.client('s3')\n", "response = s3_client.list_buckets()\n", "for bucketname in response['Buckets']:\n", "    if \"eventsbucket\" in bucketname[\"Name\"]:\n", "        print(bucketname[\"Name\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Replace <> with the bucket name printed above\n", "bucket = '<>'\n", "prefix = 'xgb-data'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up Paths and Directories" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# S3 path to upload the trained model to\n", "xgb_upload_location = 's3://{}/{}'.format(bucket, 'xgb-model')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Retrieve the XGBoost container image URI from Amazon ECR\n", "region = sagemaker_session.boto_region_name\n", "container = sagemaker.image_uris.retrieve('xgboost', region, '0.90-1')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Upload the training and test data to S3\n", "train_channel = prefix + '/train'\n", "validation_channel = prefix + '/validation'\n", "\n", "sagemaker_session.upload_data(path='training_data', bucket=bucket, key_prefix=train_channel)\n", "sagemaker_session.upload_data(path='test_data', bucket=bucket, key_prefix=validation_channel)\n", "\n", "# Wrap the S3 locations as training input channels for the estimator\n", "s3_train_channel = sagemaker.inputs.TrainingInput('s3://{}/{}'.format(bucket, train_channel), content_type='csv')\n", "s3_valid_channel = sagemaker.inputs.TrainingInput('s3://{}/{}'.format(bucket, validation_channel), content_type='csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "