# A custom solution for bringing legacy Machine Learning code into Amazon SageMaker
In this repository, we present a deployment-ready solution that uses [SageMaker Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) for data processing workloads and [AWS Step Functions](https://aws.amazon.com/step-functions) to orchestrate the workflow on [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
A complete description can be found in the corresponding [blog post](https://aws.amazon.com/blogs/machine-learning/bring-legacy-machine-learning-code-into-amazon-sagemaker-using-aws-step-functions/).
## Outline
- [Prerequisites](#prerequisites)
- [Notebook Walkthrough](#notebook-walkthrough-suggested)
- [Repository file structure](#repository-file-structure)
- [State Machines Input Documentation](#state-machines-input-documentation)
## Prerequisites
- A SageMaker Studio instance
- This GitHub repository cloned into your SageMaker Studio environment
## Notebook Walkthrough (SUGGESTED)
You can familiarize yourself with the resources by following the tutorial notebook `inferencecontainer/build_and_push.ipynb`.
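The notebook builds the custom inference container and pushes it to Amazon ECR. A minimal sketch of that flow in Python is below; the repository name, tag, and region are placeholder assumptions, and Docker plus AWS credentials must be available when the push actually runs:

```python
import base64
import subprocess


def image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """ECR image URI in the same form used by the state machine input."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def build_and_push(repo: str, tag: str, region: str,
                   context_dir: str = "inferencecontainer") -> str:
    """Build the Dockerfile in `context_dir` and push it to ECR.

    Requires Docker and AWS credentials; names here are illustrative.
    """
    import boto3  # imported lazily so image_uri() has no SDK dependency

    ecr = boto3.client("ecr", region_name=region)
    account_id = boto3.client("sts").get_caller_identity()["Account"]
    try:
        ecr.create_repository(repositoryName=repo)
    except ecr.exceptions.RepositoryAlreadyExistsException:
        pass  # reuse the existing repository

    # Log Docker in to the registry with a temporary ECR token.
    token = ecr.get_authorization_token()["authorizationData"][0]
    user, password = base64.b64decode(token["authorizationToken"]).decode().split(":")
    registry = token["proxyEndpoint"].replace("https://", "")
    subprocess.run(["docker", "login", "-u", user, "--password-stdin", registry],
                   input=password.encode(), check=True)

    uri = image_uri(account_id, region, repo, tag)
    subprocess.run(["docker", "build", "-t", uri, context_dir], check=True)
    subprocess.run(["docker", "push", uri], check=True)
    return uri
```

For example, `build_and_push("legacycode", "1.0", "us-east-1")` would produce an image URI of the shape shown in the state machine input below.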
## Repository file structure
The GitHub repo is organized into different folders that correspond to various stages in the machine learning lifecycle, facilitating easy navigation and management.
* legacycode_mlops
  - inferencecontainer
    - build_and_push.ipynb
    - Dockerfile
    - src
      - predict.py
      - requirements.txt
  - preprocessing
    - preprocess.py
    - requirements.txt
  - postprocessing
    - postprocess.py
    - requirements.txt
  - data
    - pre_processing_input.csv
  - stepfunctions_config.json
  - stepfunctions_input_parameters.json
  - stepfunctions_permission_policy.json
## State Machines Input Documentation
Action flows defined using AWS Step Functions are called state machines.
Each state machine accepts runtime (i.e., execution-specific) parameters, which are provided as an input JSON object. You can copy/paste the following example directly into the AWS Console.
__Request Syntax__
```json
{
"PreProcessing":
{
"ProcessingJobName": "sm-inf-job-0211-1156",
"InputUri":
{
"S3Uri": "s3:///legacycode/data/preproc/input/"
},
"OutputUri":
{
"S3Uri": "s3:///legacycode/data/predict/input/"
},
"CodeUri":
{
"S3Uri": "s3:///legacycode/scripts/"
},
"InstanceType": "ml.m5.xlarge",
"VolumeSizeInGB": 20,
"MaxRuntimeInSeconds": 3600,
"ImageUri": ".dkr.ecr..amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3",
"RoleArn": "arn:aws:iam:::role/service-role/"
},
"Inference":
{
"ProcessingJobName": "sm-inf-job-0211-1157",
"InputUri":
{
"S3Uri": "s3:///legacycode/data/predict/input/"
},
"OutputUri":
{
"S3Uri": "s3://legacycode/data/predict/output/"
},
"InstanceType": "ml.m5.xlarge",
"VolumeSizeInGB": 20,
"MaxRuntimeInSeconds": 3600,
"ImageUri": ".dkr.ecr..amazonaws.com/legacycode:1.0",
"RoleArn": "arn:aws:iam::"
},
"PostProcessing":
{
"ProcessingJobName": "sm-inf-job-0211-1158",
"InputUri":
{
"S3Uri": "s3:///legacycode/data/predict/output/"
},
"OutputUri":
{
"S3Uri": "s3:///legacycode/data/postproc/output/"
},
"CodeUri":
{
"S3Uri": "s3:///legacycode/scripts/"
},
"InstanceType": "ml.m5.xlarge",
"VolumeSizeInGB": 20,
"MaxRuntimeInSeconds": 3600,
"ImageUri": ".dkr.ecr..amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3",
"RoleArn": "arn:aws:iam:::role/service-role/"
}
}
```
__Parameters__
- __input_uri__ - The S3 URI for the input files.
- __output_uri__ - The S3 URI for the output files.
- __code_uri__ - The S3 URI for the script files.
- __custom_image_uri__ - The container URI for the custom container you have built.
- __scikit_image_uri__ - The container URI for the pre-built scikit-learn framework image.
- __role__ - The execution role used to run the job.
- __instance_type__ - The instance type used to run the container.
- __volume_size__ - The storage volume size required by the container.
- __max_runtime__ - The maximum runtime for the container; the default is 1 hour.
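Instead of pasting the JSON into the console, the same input can be assembled and submitted programmatically. The sketch below uses boto3 under stated assumptions: the bucket, state machine ARN, account, region, and role placeholders must be replaced with your own values, and the image URIs mirror the request syntax above:

```python
import json

# Placeholders -- substitute your own account, region, and role.
SCIKIT_IMAGE = "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3"
CUSTOM_IMAGE = "<account>.dkr.ecr.<region>.amazonaws.com/legacycode:1.0"
ROLE_ARN = "arn:aws:iam::<account>:role/service-role/<role-name>"


def step(job_name, bucket, in_prefix, out_prefix, image, code_prefix=None):
    """One state's parameter block, mirroring the request syntax above."""
    params = {
        "ProcessingJobName": job_name,
        "InputUri": {"S3Uri": f"s3://{bucket}/{in_prefix}"},
        "OutputUri": {"S3Uri": f"s3://{bucket}/{out_prefix}"},
        "InstanceType": "ml.m5.xlarge",
        "VolumeSizeInGB": 20,
        "MaxRuntimeInSeconds": 3600,
        "ImageUri": image,
        "RoleArn": ROLE_ARN,
    }
    if code_prefix:  # only the script-driven states carry a CodeUri
        params["CodeUri"] = {"S3Uri": f"s3://{bucket}/{code_prefix}"}
    return params


def build_input(bucket):
    """Full execution input: pre-processing, inference, post-processing."""
    return {
        "PreProcessing": step("sm-inf-job-0211-1156", bucket,
                              "legacycode/data/preproc/input/",
                              "legacycode/data/predict/input/",
                              SCIKIT_IMAGE, "legacycode/scripts/"),
        "Inference": step("sm-inf-job-0211-1157", bucket,
                          "legacycode/data/predict/input/",
                          "legacycode/data/predict/output/",
                          CUSTOM_IMAGE),
        "PostProcessing": step("sm-inf-job-0211-1158", bucket,
                               "legacycode/data/predict/output/",
                               "legacycode/data/postproc/output/",
                               SCIKIT_IMAGE, "legacycode/scripts/"),
    }


def start_execution(state_machine_arn, bucket):
    """Start the state machine with the assembled input (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above have no SDK dependency
    sfn = boto3.client("stepfunctions")
    return sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(build_input(bucket)),
    )
```

Job names and prefixes here are taken from the example above; in practice you would generate unique `ProcessingJobName` values per execution, since SageMaker Processing job names must be unique within the account and region.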
## Enjoy!