# Data Scientist - Raw Data
***
*This notebook should work well with the Python 3 (Data Science) kernel in SageMaker Studio*
***

For the demonstration workflow, you'll download synthetically generated data and upload it to the SageMaker Studio default S3 bucket.

#### Environment setup
Import libraries, set up logging, and define a few variables.

In [None]:
import logging
import requests
import sagemaker

from pathlib import Path
from urllib import parse

Set up a logger

In [None]:
logger = logging.getLogger(__name__)  # use the module name, not the literal string "__name__"
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
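
Re-running the logger cell attaches another `StreamHandler` each time, so every message is then printed more than once. A minimal sketch of one way to avoid that, assuming a hypothetical helper `get_notebook_logger` and a hypothetical logger name `raw_data_demo` (neither is part of the original notebook):

```python
import logging


def get_notebook_logger(name, level=logging.INFO):
    """Return a logger that has at most one handler, even if the cell is re-run."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Only attach a handler the first time; later calls reuse the existing one.
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler())
    return logger


# "raw_data_demo" is a placeholder name chosen for this illustration.
logger = get_notebook_logger("raw_data_demo")
logger = get_notebook_logger("raw_data_demo")  # simulates re-running the cell
logger.info("Logger ready")
```

The guard checks only handlers attached directly to this logger, which is enough for the single-notebook case shown here.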

Define the SageMaker and Boto3 sessions and a few additional parameters.

In [None]:
sagemaker_session = sagemaker.Session()
sagemaker_client = sagemaker_session.sagemaker_client

boto_session = sagemaker_session.boto_session
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()

s3_uploader = sagemaker.s3.S3Uploader

bucket = sagemaker_session.default_bucket()
prefix = "mlops-demo"

## Data Download
The inputs for building our model and workflow are two tables of insurance data: a claims table and a customers table.

In [None]:
base_url = "https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/end_to_end/fraud_detection/data/"
file_list = ["claims.csv", "customers.csv"]
feature_eng_base_path = Path("feature_engineering")

In [None]:
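local_path = Path("data")
local_path.mkdir(exist_ok=True)
for file_name in file_list:
    file_url = base_url + file_name
    parsed_url = parse.urlparse(file_url)
    file_path = local_path / Path(parsed_url.path).name
    # Stream the response to disk in chunks instead of loading it into memory
    with requests.get(file_url, stream=True) as r:
        r.raise_for_status()  # stop on HTTP errors rather than saving an error page
        with file_path.open("wb") as f:
            # The default chunk size is 1 byte, which is very slow
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    logger.info(f"Retrieved {file_url}")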
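
Before uploading, it can help to confirm that each download is a well-formed CSV rather than an empty or truncated file. The following is a rough sketch using only the standard library; `summarize_csv` is a hypothetical helper, and it assumes each file starts with a header row and sits in the `data` directory used above.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_count, data_row_count) for a CSV file with a header row."""
    with Path(path).open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{path} is empty")
        # Count the remaining rows without loading the file into memory.
        row_count = sum(1 for _ in reader)
    return len(header), row_count


# Assumes the files were saved to "data"; does nothing if the directory is absent.
data_dir = Path("data")
if data_dir.is_dir():
    for csv_path in sorted(data_dir.glob("*.csv")):
        columns, rows = summarize_csv(csv_path)
        print(f"{csv_path.name}: {columns} columns, {rows} rows")
```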

## Data Upload to S3

In [None]:
data_uri_prefix = s3_uploader.upload(local_path.as_posix(), f"s3://{bucket}/{prefix}")

In [None]:
claims_uri = data_uri_prefix + "/claims.csv"
customers_uri = data_uri_prefix + "/customers.csv"
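
The URIs above are built by plain string concatenation, which depends on `data_uri_prefix` having no trailing slash. As an illustrative sketch, the hypothetical helpers `join_s3_uri` and `split_s3_uri` below normalize slashes and split a URI back into bucket and key; the bucket name `my-bucket` is a placeholder, not a value from this notebook.

```python
from urllib import parse


def join_s3_uri(prefix_uri, *parts):
    """Join path parts onto an S3 URI, avoiding doubled or missing slashes."""
    cleaned = [p.strip("/") for p in parts]
    return "/".join([prefix_uri.rstrip("/"), *cleaned])


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = parse.urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# "my-bucket" is a placeholder; in the notebook the prefix comes from the upload call.
example_prefix = "s3://my-bucket/mlops-demo/"
example_claims_uri = join_s3_uri(example_prefix, "claims.csv")
print(example_claims_uri)
print(split_s3_uri(example_claims_uri))
```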

In [None]:
%store claims_uri
%store customers_uri