# Using SageMaker Studio Lab with AWS Resources

[Open in SageMaker Studio Lab](https://studiolab.sagemaker.aws/import/github/aws/studio-lab-examples/blob/main/connect-to-aws/Access_AWS_from_Studio_Lab.ipynb)

This notebook follows the guidance in the boto3 quickstart:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html

### Step 0. Install the AWS CLI and boto3, and configure them with your AWS credentials
You will also create a SageMaker execution role and paste in its ARN.

In [1]:
%pip install boto3

In [2]:
%pip install awscli

In [3]:
!mkdir -p ~/.aws

---
# Exercise Caution When Using AWS Credentials
The next step should only be undertaken by professionals who are already comfortable using AWS access and secret keys. These credentials are similar to the keys to a car: if someone else gets hold of them, even inadvertently, they can steal your vehicle. While there are additional AWS permissions you can apply, the basic concept still stands. Under no circumstances should you share these credentials publicly.

To get started with your AWS credentials, see:
https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html 

That being said, if you are handling your keys carefully, you can in fact access your AWS account from Studio Lab. We'll walk through that here.

In [4]:
%%writefile ~/.aws/credentials

[default]
aws_access_key_id = < paste your access key here, run this cell, then delete the cell >
aws_secret_access_key = < paste your secret key here, run this cell, then delete the cell > 

In [13]:
%%writefile ~/.aws/config

[default]
region=us-east-1
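
Before moving on, it can help to confirm that the credentials file no longer contains placeholder text. The sketch below is an illustration and not part of the original notebook: `find_unfilled_keys` is a hypothetical helper that reads the INI-style credentials file with Python's `configparser` and reports which keys in the `default` profile are missing or still look like the `< paste ... >` placeholders. It only inspects the file; it does not contact AWS or prove that the keys are valid.

```python
import configparser
from pathlib import Path

# Keys the credentials file is expected to define for a profile.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def find_unfilled_keys(credentials_path, profile="default"):
    """Return the required keys that are missing or still hold placeholder text."""
    # interpolation=None so that a '%' in a value is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    unfilled = []
    for key in REQUIRED_KEYS:
        value = parser.get(profile, key, fallback="").strip()
        # Assumption: an unfilled value is empty or starts with '<', as in the template cell
        if not value or value.startswith("<"):
            unfilled.append(key)
    return unfilled


problems = find_unfilled_keys(Path.home() / ".aws" / "credentials")
print("Keys still to fill in:", problems or "none")
```

An empty result only means the template text has been replaced. Whether the keys actually work is decided by AWS when you make your first request.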

In [9]:
%pip install sagemaker

If you are already used to using SageMaker within your own AWS account, please copy and paste the ARN for your execution role below. If you are new to this, follow the steps to create one here.

https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role-sagemaker-notebook.html

Please note that to complete the rest of this notebook, you need to have already created this SageMaker IAM execution role.

In [10]:
import sagemaker

# create a SageMaker execution role via the AWS SageMaker console, then paste its ARN here
role = ' < paste your execution role here > '
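
A pasted role ARN is easy to get slightly wrong, for example by leaving the placeholder text in place or copying only the role name. As a rough sanity check, the sketch below tests whether a string has the general shape of an IAM role ARN (`arn:<partition>:iam::<12-digit account id>:role/<name>`). The helper `looks_like_role_arn` and the sample ARN are hypothetical, and the pattern is a simplification: it checks the shape only and cannot tell you whether the role exists or has the right permissions.

```python
import re

# Assumed shape: arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def looks_like_role_arn(value):
    """Return True if the string has the general shape of an IAM role ARN."""
    return bool(ROLE_ARN_PATTERN.match(value.strip()))


# A made-up ARN with the expected shape, and the unfilled placeholder from the cell above
print(looks_like_role_arn("arn:aws:iam::123456789012:role/service-role/MySageMakerRole"))
print(looks_like_role_arn(" < paste your execution role here > "))
```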

### Step 1. Copy your local data to your preferred S3 bucket, or vice versa 
This notebook will assume you already have access to a training dataset relevant for language translation. If you don't, please step through this notebook to create the relevant train files locally.
- https://github.com/aws/studio-lab-examples/blob/main/natural-language-processing/NLP_Disaster_Recovery_Translation.ipynb 

We'll demonstrate copying that data up to your AWS account via the AWS CLI here, but you can also upload it through the S3 console UI or with boto3.

In [12]:
bucket_name = '<paste your bucket name here >'
train_file_name = 'train.json'
s3_data_path = 's3://{}/data/{}'.format(bucket_name, train_file_name)

In [11]:
!aws s3 sync ./notebooks/data/ {s3_data_path}

upload: notebooks/data/train.json to s3://hf-translation-bucket/data/train.json/train.json
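
The boto3 route mentioned above can be sketched as follows. This is a minimal illustration and not part of the original notebook: `upload_training_file` is a hypothetical helper built on the S3 client's `upload_file(Filename, Bucket, Key)` call, and the key `data/train.json` in the commented example is an assumption. Note that it uploads a single object to exactly that key, whereas the `aws s3 sync` call above placed the file under a `train.json/` prefix.

```python
def upload_training_file(local_path, bucket, key, s3_client=None):
    """Upload one local file to s3://bucket/key and return that URI."""
    if s3_client is None:
        # Imported lazily so the helper can be exercised without boto3 or AWS access
        import boto3
        s3_client = boto3.client("s3")
    s3_client.upload_file(local_path, bucket, key)
    return "s3://{}/{}".format(bucket, key)


# Example call, commented out because it needs real credentials and a real bucket:
# upload_training_file("./notebooks/data/train.json", bucket_name, "data/train.json")
```

Passing the client in as an argument is a design choice for this sketch: it lets you reuse one configured client across uploads and substitute a stand-in object when trying the helper out offline.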


### Step 2. Point to the Hugging Face containers and train a model
We strongly recommend using the Hugging Face models webpage to generate the configuration code for your desired model or resource. You can do so here:
- https://huggingface.co/models

AWS has prebuilt deep learning containers for five software frameworks: TensorFlow, PyTorch, MXNet, Hugging Face, and AutoGluon. You can extend these base images, or simply use the script mode construct as below.
- https://github.com/aws/deep-learning-containers 

To learn more about script mode on SageMaker, check out our documentation here: 
- https://sagemaker.readthedocs.io/en/stable/frameworks/index.html

In [12]:
import sagemaker
from sagemaker.huggingface import HuggingFace

# hyperparameters passed to the fine-tuning script
hyperparameters = {
    'model_name_or_path': 't5-small',
    'output_dir': '/opt/ml/model',
    'train_file': '/opt/ml/input/data/train/{}'.format(train_file_name),
    'do_train': True,
    'source_lang': 'en',
    'target_lang': 'es',
    'source_prefix': 'translate English to Spanish: ',
    # add your remaining hyperparameters
    # more info here https://github.com/huggingface/transformers/tree/v4.6.1/examples/pytorch/seq2seq
}

# git configuration to download our fine-tuning script
git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch': 'v4.6.1'}

# creates Hugging Face estimator
huggingface_estimator = HuggingFace(
    entry_point='run_translation.py',
    source_dir='./examples/pytorch/translation',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    hyperparameters=hyperparameters
)

# start the training job and wait for it to finish
huggingface_estimator.fit({'train':s3_data_path}, wait=True)
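
To see why `train_file` points at `/opt/ml/input/data/train/`, it helps to know two script mode conventions: each input channel passed to `fit` is made available inside the training container under `/opt/ml/input/data/<channel name>`, and the hyperparameters are handed to the entry point script as command-line arguments. The sketch below is a simplified illustration of those conventions, not the SageMaker SDK's actual code. Both helper names are hypothetical, and the exact way values are serialized by the service is an assumption here.

```python
def hyperparameters_to_argv(hyperparameters):
    """Flatten a hyperparameter dict into a '--name value' argument list."""
    argv = []
    for name, value in hyperparameters.items():
        argv.extend(["--{}".format(name), str(value)])
    return argv


def channel_dir(channel_name):
    """Return the container directory where an input channel's data is placed."""
    return "/opt/ml/input/data/{}".format(channel_name)


# A shortened version of the hyperparameters used above
example = {"model_name_or_path": "t5-small", "do_train": True, "source_lang": "en"}
print(hyperparameters_to_argv(example))
print(channel_dir("train"))
```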

---
# Evaluate your job in the cloud and clean up
That's a wrap! Before you share this notebook with anyone, please make sure you remove your access and secret keys from the cell above. You can also delete the credentials files themselves, but doing so will prevent you from accessing your AWS account from this environment until you recreate them.
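
If you do decide to delete the files, a minimal sketch of that cleanup follows. `remove_aws_files` is a hypothetical helper, not part of the original notebook. It assumes the two files written earlier (`credentials` and `config`) live in the directory you pass in, and it deletes only those files. The call against your real `~/.aws` directory is left commented out so that nothing is removed by accident.

```python
from pathlib import Path


def remove_aws_files(aws_dir, names=("credentials", "config")):
    """Delete the named files from aws_dir if they exist; return the names removed."""
    removed = []
    for name in names:
        path = Path(aws_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


# Uncomment to delete the files written earlier in this notebook:
# print(remove_aws_files(Path.home() / ".aws"))
```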