{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Amazon Lookout for Vision Lab\n", "\n", "To help you learn about creating a model, Amazon Lookout for Vision provides example images of circuit boards (circuit_board) that you can use. These images are taken from https://docs.aws.amazon.com/lookout-for-vision/latest/developer-guide/su-prepare-example-images.html.\n", "\n", "### Environmental variables\n", "\n", "In a very first step we want to define the two global variables needed for this notebook:\n", "\n", "- bucket: the S3 bucket that you will create and then use as your source for Amazon Lookout for Vision\n", " - Note: Please read the comments carefully. Depending on your region you need to uncomment the correct command\n", "- project: the project name you want to use in Amazon Lookout for Vision" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import boto3\n", "\n", "bucket = \"lfv-s3-bucket--MMDDYY\"\n", "project = \"circuitproject\"\n", "os.environ[\"BUCKET\"] = bucket\n", "os.environ[\"REGION\"] = boto3.session.Session().region_name\n", "\n", "client = boto3.client('lookoutvision')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can check your region here with:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check your region:\n", "print(boto3.session.Session().region_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Depending on your region follow the instructions of the next cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "##If you are not using existing S3 bucket, create your S3 bucket:\n", "\n", "if boto3.session.Session().region_name=='us-east-1':\n", " !aws s3api create-bucket --bucket $BUCKET\n", "else:\n", " !aws s3api create-bucket --bucket $BUCKET --create-bucket-configuration LocationConstraint=$REGION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Image Preparation and EDA\n", "\n", "In Amazon Lookout for Vision - see also\n", "- https://aws.amazon.com/lookout-for-vision/ and\n", "- https://aws.amazon.com/blogs/aws/amazon-lookout-for-vision-new-machine-learning-service-that-simplifies-defect-detection-for-manufacturing/\n", "if you already have pre-labeled images available, as it is the case in this example, you can already establish a folder structure that lets you define training and validation. Further, images are labeled for Amazon Lookout via the corresponding folder (normal=good, anomaly=bad).\n", "\n", "We will import the sample images provided by AWS Lookout of Vision. If you're importing your own images, you will prepare them at this stage." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate the *manifest* files\n", "\n", "You might be familiar with the manifest files if you ever used Amazon SageMaker Ground Truth. If you are not don't worry about that section too much.\n", "\n", "If you are still interested in what's happening, you can continue reading:\n", "\n", "Each dataset training/ as well as validation/ needs a manifest file. This file is used by Amazon Lookout for Vision to determine where to look for the images. The manifest follows a fixed structure. Most importantly are the keys (it's JSON formatted) *source-ref* this is the location for each file, *auto-label* the value for each label (0=bad, 1=good), *folder* which indicates whether Amazon Lookout is using training or validation and *creation-date* as this let's you know when an image was put in place. All other fields are pre-set for you.\n", "\n", "Each manifest file itself contains N JSON objects, where N is the number of images that are used in this dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Datetime for datetime generation and json to dump the JSON object\n", "# to the corresponding files:\n", "from datetime import datetime\n", "import json\n", "\n", "# Current date and time in manifest file format:\n", "now = datetime.now()\n", "dttm = now.strftime(\"%Y-%m-%dT%H:%M:%S.%f\")\n", "\n", "# The two datasets used: train and test\n", "datasets = [\"train\", \"test\"]\n", "\n", "# For each dataset...\n", "for ds in datasets:\n", " # ...list the folder available (normal or anomaly).\n", " #print(ds)\n", " folders = os.listdir(\"./circuitboard/{}\".format(ds))\n", " # Then open the manifest file for this dataset...\n", " with open(\"{}.manifest\".format(ds), \"w\") as f:\n", " for folder in folders:\n", " filecount=0\n", " #print(folder)\n", " # ...and iterate through both folders by first listing\n", " # the corresponding files and setting the appropriate label\n", " # (as noted above: 1 = good, 0 = bad):\n", " files = os.listdir(\"./circuitboard/{}/{}\".format(ds, folder))\n", " label = 1\n", " if folder == \"anomaly\":\n", " label = 0\n", " # For each file in the folder...\n", " for file in files:\n", " filecount+=1\n", " #print(filecount)\n", " # Uncomment the following two lines to use the entire dataset\n", " if filecount>20:\n", " break\n", " # ...generate a manifest JSON object and save it to the manifest\n", " # file. Don't forget to add '/n' to generate a new line:\n", " manifest = {\n", " \"source-ref\": \"s3://{}/{}/{}/{}/{}\".format(bucket,project, ds, folder, file),\n", " \"auto-label\": label,\n", " \"auto-label-metadata\": {\n", " \"confidence\": 1,\n", " \"job-name\": \"labeling-job/auto-label\",\n", " \"class-name\": folder,\n", " \"human-annotated\": \"yes\",\n", " \"creation-date\": dttm,\n", " \"type\": \"groundtruth/image-classification\"\n", " }\n", " }\n", " f.write(json.dumps(manifest)+\"\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Upload manifest files and images to S3\n", "\n", "Now it's time to upload all the images and the manifest files:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Upload manifest files to S3 bucket:\n", "!aws s3 cp train.manifest s3://{bucket}/{project}/train.manifest\n", "!aws s3 cp test.manifest s3://{bucket}/{project}/test.manifest" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Upload images to S3 bucket:\n", "!aws s3 cp circuitboard/train/normal s3://{bucket}/{project}/train/normal --recursive\n", "!aws s3 cp circuitboard/train/anomaly s3://{bucket}/{project}/train/anomaly --recursive\n", "\n", "!aws s3 cp circuitboard/test/normal s3://{bucket}/{project}/test/normal --recursive\n", "!aws s3 cp circuitboard/test/anomaly s3://{bucket}/{project}/test/anomaly --recursive" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Amazon Lookout for Vision\n", "\n", "We are almost done. You have a couple of options on how to create your Amazon Lookout project (console, CLI or boto3). We chose boto3 SDK in this example. We highly recommend to check out the console, too. It's so simple to generate a project and let a model be trained. This is what we should show to our customers, too!\n", "\n", "The steps we take with SDK are:\n", "\n", "1. Create a project (the name as been set right at the beginning)\n", "2. Tell your project where to find your training dataset. This is done via the manifest file for training.\n", "3. Tell your project where to find your test dataset. This is done via the manifest file for test.\n", " - Note: This step is optional. In general all 'test' related code, etc. is optional. Amazon Lookout for Vision will also work with 'training' dataset only. We chose to use both as training and testing is a common (best) practice when training AI/ML models. And we should always let our customer know this to help them get to the next level.\n", "4. Create a model. This command will trigger the model training and validation.\n", "\n", "**Note**: Training a model can (will) take a few hours as it uses Deep Learning in the background. Once your model is trained, you can continue with this notebook to make predictions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating Project" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Creating project\n", "print('Creating project:' + project)\n", "response=client.create_project(ProjectName=project)\n", "print('project ARN: ' + response['ProjectMetadata']['ProjectArn'])\n", "print('Done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating Training Dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Creating training dataset\n", "dataset_type ='train'\n", "manifest_file = project+'/train.manifest'\n", "\n", "print('Creating dataset...')\n", "dataset=json.loads('{ \"GroundTruthManifest\": { \"S3Object\": { \"Bucket\": \"' + bucket + '\", \"Key\": \"'+ manifest_file + '\" } } }')\n", "\n", "response=client.create_dataset(ProjectName=project, DatasetType=dataset_type, DatasetSource=dataset)\n", "print('Dataset Status: ' + response['DatasetMetadata']['Status'])\n", "print('Dataset Status Message: ' + response['DatasetMetadata']['StatusMessage'])\n", "print('Dataset Type: ' + response['DatasetMetadata']['DatasetType'])\n", "print('Done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating Test Dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Creating test dataset\n", "dataset_type ='test'\n", "manifest_file = project+'/test.manifest'\n", "\n", "print('Creating dataset...')\n", "dataset=json.loads('{ \"GroundTruthManifest\": { \"S3Object\": { \"Bucket\": \"' + bucket + '\", \"Key\": \"'+ manifest_file + '\" } } }')\n", "\n", "response=client.create_dataset(ProjectName=project, DatasetType=dataset_type, DatasetSource=dataset)\n", "print('Dataset Status: ' + response['DatasetMetadata']['Status'])\n", "print('Dataset Status Message: ' + response['DatasetMetadata']['StatusMessage'])\n", "print('Dataset Type: ' + response['DatasetMetadata']['DatasetType'])\n", "print('Done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating/training Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Creating/training model\n", "output_bucket = bucket\n", "output_folder = project+'/model/'\n", "\n", " \n", "print('Creating model...')\n", "output_config=dataset=json.loads('{ \"S3Location\": { \"Bucket\": \"' + output_bucket + '\", \"Prefix\": \"'+ output_folder + '\" } } ')\n", "\n", "response=client.create_model(ProjectName=project, OutputConfig=output_config)\n", "print('ARN: ' + response['ModelMetadata']['ModelArn'])\n", "print('Version: ' + response['ModelMetadata']['ModelVersion'])\n", "print('Status: ' + response['ModelMetadata']['Status'])\n", "print('Message: ' + response['ModelMetadata']['StatusMessage'])\n", "print('Done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Deployment\n", "\n", "Getting the model in an operating stage is as easy as telling it to \"start\". This process also takes a few minutes. So, please be patient. You can again check in the console (or via CLI) the status of the model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Wait for the model training to complete" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "while client.describe_model(ProjectName=project,ModelVersion='1')['ModelDescription']['Status']!='TRAINED':\n", " print('.',end='');time.sleep(5);\n", "print('Done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Hosting the trained model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_version='1'\n", "min_inference_units=1 \n", " \n", "print('Starting model version ' + model_version + ' for project ' + project )\n", "response=client.start_model(ProjectName=project,\n", " ModelVersion=model_version,\n", " MinInferenceUnits=min_inference_units)\n", "print('Status: ' + response['Status'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Wait for model hosting to complete" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "while client.describe_model(ProjectName=project,ModelVersion='1')['ModelDescription']['Status']!='HOSTED':\n", " print('.',end='');time.sleep(5);\n", "print('Done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make Predictions\n", "\n", "\n", "Making predictions via boto3 SDK requires the project name, model version, content type and a sample images. We are using images locally from the SageMaker notebook instance:\n", "\n", "If you would like to use GUI based solution to make predictions, refer to this demo - https://github.com/aws-samples/amazon-lookout-for-vision-demo" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Picking an anamolous image from the extra images\n", "photo='circuitboard/extra_images/extra_images-anomaly_3.jpg'\n", "model_version='1'\n", " \n", "with open(photo, 'rb') as image:\n", " response = client.detect_anomalies(ProjectName=project, \n", " ContentType='image/jpeg',\n", " Body=image.read(),\n", " ModelVersion=model_version)\n", "print ('Anomalous?: ' + str(response['DetectAnomalyResult']['IsAnomalous']))\n", "print ('Confidence: ' + str(response['DetectAnomalyResult']['Confidence']))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Picking a normal image from the extra images\n", "photo='circuitboard/extra_images/extra_images-normal_1.jpg'\n", "model_version='1'\n", " \n", "with open(photo, 'rb') as image:\n", " response=client.detect_anomalies(ProjectName=project, \n", " ContentType='image/jpeg',\n", " Body=image.read(),\n", " ModelVersion=model_version)\n", "print ('Anomalous?: ' + str(response['DetectAnomalyResult']['IsAnomalous']))\n", "print ('Confidence: ' + str(response['DetectAnomalyResult']['Confidence']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# BE FRUGAL, stop the model\n", "\n", "If you don't need your model anymore please stop it to save costs!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#If you are not using the model, stop to save costs!\n", "model_version='1'\n", "\n", "print('Stopping model version ' + model_version + ' for project ' + project )\n", "response=client.stop_model(ProjectName=project,\n", " ModelVersion=model_version)\n", "print('Status: ' + response['Status'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }