# Configuration Management

This notebook is designed to be run with the `Python 3 (Data Science)` kernel.

## Input & Output Configuration

In the following notebooks, we will use an S3 bucket to store raw data, processed data, features, and trained models. We therefore retrieve the bucket configuration that we created and stored earlier in the AWS Systems Manager Parameter Store.

Imports

In [None]:
import os
import boto3

session = boto3.Session()
ssm = session.client('ssm')

In [None]:
bucket = ssm.get_parameter(Name="/aik/data-bucket")["Parameter"]["Value"]
bucket

Now that we have our base bucket, we can prepare the various prefixes that will be used later for data processing and model training.
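Because every location below is the bucket plus a fixed prefix, the string concatenation can also be wrapped in a small helper. The sketch below is an illustration only; `s3_uri` is a hypothetical name, not part of boto3 or SageMaker. It strips stray slashes between parts, so a prefix that must end in `/` (such as `inference_data`) would need the trailing slash appended explicitly.

```python
def s3_uri(bucket: str, *parts: str) -> str:
    """Build 's3://bucket/part1/part2', removing leading/trailing slashes from each part."""
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "s3://" + "/".join([bucket.strip("/")] + cleaned)


# Example with a hypothetical bucket name
print(s3_uri("my-example-bucket", "processed/sample", "train"))
# s3://my-example-bucket/processed/sample/train
```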

In [None]:
# download url for the example data set
download_url = "https://www.kaggle.com/lastsummer/ipinyou/download"

# destination where we store the raw data
raw_data = "s3://" + bucket + "/raw/ipinyou-data"
# a subset of the raw data (a single day) to speed up processing and training during development
bid_source = "s3://" + bucket + "/raw/ipinyou-data/training1st/bid.20130311.txt.bz2"
imp_source = "s3://" + bucket + "/raw/ipinyou-data/training1st/imp.20130311.txt.bz2"

# output destinations for the data processing 
output_train = "s3://" + bucket + "/processed/sample/train"
output_test = "s3://" + bucket + "/processed/sample/test"
output_verify = "s3://" + bucket + "/processed/sample/valid"
output_transformed = "s3://" + bucket + "/processed/sample/transformed"
pipelineModelArtifactPath = "s3://" + bucket + "/pipeline-model/model.zip"
inference_data = "s3://" + bucket + "/pipeline-model/inference-data/"
inference_schema = "s3://" + bucket + "/pipeline-model/pipeline-schema.json"
binary_model = "s3://" + bucket + "/binary-model/xgboost.bin"
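These values are full `s3://` URIs, whereas boto3's S3 client methods such as `download_file(Bucket, Key, Filename)` take the bucket and the key as separate arguments. A minimal sketch of a splitter, assuming the URIs have the `s3://bucket/key` form used above; `parse_s3_uri` is a hypothetical helper name:

```python
from typing import Tuple


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # partition keeps everything after the first '/' as the key, including a trailing '/'
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {uri!r}")
    return bucket, key
```

For example, `parse_s3_uri(binary_model)` returns the bucket name together with `binary-model/xgboost.bin`, which could then be passed to `boto3.client("s3").download_file(bucket, key, "xgboost.bin")`.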

## Store Configuration Data for the Following Notebooks

## Store Parameters in the Systems Manager Parameter Store

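The AWS Systems Manager Parameter Store is a convenient place to keep these settings: each value is stored under a hierarchical name such as `/aik/raw_data`, so the following notebooks can read it back instead of rebuilding the paths.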
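The `put_parameter` calls in the next cell differ only in the parameter name and value. As a sketch, assuming `ssm` is the boto3 SSM client created at the top of the notebook, the same writes can be driven from a dictionary. `store_config` is a hypothetical helper, and it passes only the `put_parameter` arguments already used in this notebook.

```python
from typing import Dict


def store_config(ssm, config: Dict[str, str], prefix: str = "/aik") -> None:
    """Write each entry of config to Parameter Store as a String parameter under prefix."""
    for name, value in config.items():
        ssm.put_parameter(
            Name=f"{prefix}/{name}",
            Value=value,
            Type="String",
            Overwrite=True,  # replace values left over from a previous run
        )
```

Keys may contain further slashes: an entry such as `"xgboost/path": binary_model` is written to `/aik/xgboost/path`, the same name used in the next cell.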

In [None]:
ssm.put_parameter(Name="/aik/download_url", Value=download_url, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/raw_data", Value=raw_data, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/bid_source", Value=bid_source, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/imp_source", Value=imp_source, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/output_train", Value=output_train, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/output_test", Value=output_test, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/output_verify", Value=output_verify, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/output_transformed", Value=output_transformed, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/pipelineModelArtifactPath", Value=pipelineModelArtifactPath, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/inference_data", Value=inference_data, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/xgboost/path", Value=binary_model, Type="String", Overwrite=True)
ssm.put_parameter(Name="/aik/pipelineModelArtifactSchemaPath", Value=inference_schema, Type="String", Overwrite=True)

## Read Parameters from the Systems Manager Parameter Store

In [None]:
bucket = ssm.get_parameter(Name="/aik/data-bucket")["Parameter"]["Value"]
download_url = ssm.get_parameter(Name="/aik/download_url")["Parameter"]["Value"]
raw_data = ssm.get_parameter(Name="/aik/raw_data")["Parameter"]["Value"]
bid_source = ssm.get_parameter(Name="/aik/bid_source")["Parameter"]["Value"]
imp_source = ssm.get_parameter(Name="/aik/imp_source")["Parameter"]["Value"]
output_train = ssm.get_parameter(Name="/aik/output_train")["Parameter"]["Value"]
output_test = ssm.get_parameter(Name="/aik/output_test")["Parameter"]["Value"]
output_verify = ssm.get_parameter(Name="/aik/output_verify")["Parameter"]["Value"]
output_transformed = ssm.get_parameter(Name="/aik/output_transformed")["Parameter"]["Value"]
pipelineModelArtifactPath = ssm.get_parameter(Name="/aik/pipelineModelArtifactPath")["Parameter"]["Value"]
inference_data = ssm.get_parameter(Name="/aik/inference_data")["Parameter"]["Value"]
binary_model = ssm.get_parameter(Name="/aik/xgboost/path")["Parameter"]["Value"]
inference_schema = ssm.get_parameter(Name="/aik/pipelineModelArtifactSchemaPath")["Parameter"]["Value"]
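Reading each parameter with its own call is easy to get wrong when lines are copied: a copied line that still reads `/aik/bid_source` into `imp_source` would silently duplicate the bid path. A sketch of a batched alternative, assuming the boto3 SSM client's `get_parameters` call, which (per the AWS API reference at the time of writing) accepts at most 10 names per request and reports unknown names under `InvalidParameters`. `load_config` is a hypothetical helper name.

```python
from typing import Dict, Iterable


def load_config(ssm, names: Iterable[str]) -> Dict[str, str]:
    """Fetch String parameters by name, at most 10 names per GetParameters request."""
    names = list(names)
    values: Dict[str, str] = {}
    for start in range(0, len(names), 10):  # assumed service limit of 10 names per call
        batch = names[start:start + 10]
        response = ssm.get_parameters(Names=batch)
        missing = response.get("InvalidParameters", [])
        if missing:
            raise KeyError(f"parameters not found: {missing}")
        # each value is keyed by its own parameter name, so it cannot land in the wrong variable
        for param in response["Parameters"]:
            values[param["Name"]] = param["Value"]
    return values
```

For example, `load_config(ssm, ["/aik/bid_source", "/aik/imp_source"])["/aik/imp_source"]` returns the impression path, looked up by the parameter's own name.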

## Print current configuration

In [None]:
print(f'bucket={bucket}')
print(f'download_url={download_url}')
print(f'raw_data={raw_data}')
print(f'bid_source={bid_source}')
print(f'imp_source={imp_source}')
print(f'output_train={output_train}')
print(f'output_verify={output_verify}')
print(f'output_test={output_test}')
print(f'output_transformed={output_transformed}')
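print(f'pipelineModelArtifactPath={pipelineModelArtifactPath}')
print(f'inference_data={inference_data}')
print(f'binary_model={binary_model}')
print(f'inference_schema={inference_schema}')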
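Since every path was derived from the bucket stored under `/aik/data-bucket`, a quick consistency check can flag parameters left over from a run against a different bucket. A minimal sketch; `paths_outside_bucket` is a hypothetical helper, and the dictionary keys are just labels chosen by the caller:

```python
from typing import Dict, List


def paths_outside_bucket(config: Dict[str, str], bucket: str) -> List[str]:
    """Return the labels of S3 settings whose URI does not point into bucket."""
    expected = f"s3://{bucket}/"
    return [
        name
        for name, value in config.items()
        # non-S3 values such as download_url are ignored
        if value.startswith("s3://") and not value.startswith(expected)
    ]
```

Calling it with, e.g., `{"raw_data": raw_data, "output_train": output_train, "binary_model": binary_model}` and `bucket` should return an empty list when the configuration is consistent.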