{ "cells": [ { "cell_type": "markdown", "id": "fa8a43b3-d3ff-46df-8bb5-463c0357f29b", "metadata": {}, "source": [ "# Using SageMaker Studio Lab with AWS Resources\n", "\n", "[](https://studiolab.sagemaker.aws/import/github/aws/studio-lab-examples/blob/main/connect-to-aws/Access_AWS_from_Studio_Lab.ipynb)\n", "\n", "This notebook follows the boto3 quickstart guidance here:\n", "https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html" ] }, { "cell_type": "markdown", "id": "75c10a5f-5cf1-4217-aff4-a520e7f10b5c", "metadata": {}, "source": [ "### Step 0. Install the AWS CLI and boto3, and configure them with your AWS credentials. \n Also create and paste in your SageMaker execution role. " ] }, { "cell_type": "code", "execution_count": 1, "id": "e5164c77-3696-48ef-9c91-72918b44b29e", "metadata": { "tags": [] }, "outputs": [], "source": [ "%pip install boto3" ] }, { "cell_type": "code", "execution_count": 2, "id": "e63e9d47-b46c-4669-98c9-425e65b4b785", "metadata": { "tags": [] }, "outputs": [], "source": [ "%pip install awscli" ] }, { "cell_type": "code", "execution_count": 3, "id": "da0ef255-285a-4637-b2ea-99f4da3319a3", "metadata": {}, "outputs": [], "source": [ "!mkdir -p ~/.aws" ] }, { "cell_type": "markdown", "id": "9a57d662-5705-4c2b-a8ca-9cd7a4494fc9", "metadata": {}, "source": [ "---\n", "# Exercise Caution When Using AWS Credentials\n", "The next step should only be undertaken by professionals who are already comfortable using AWS access and secret keys. These credentials are similar to the keys to a car: if someone gets hold of them, even inadvertently, they can steal your vehicle. While there are additional AWS permissions you can apply, the basic concept still stands. Under no circumstances should you share these credentials publicly. 
\n", "\n", "Refer to the following page to get started with your AWS credentials:\n", "https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html \n", "\n", "That being said, if you are handling your keys carefully, you can in fact access your AWS account from Studio Lab. We'll walk through that here." ] }, { "cell_type": "code", "execution_count": 4, "id": "9e9b9094-090e-4465-9ef5-6b70a0bb03c5", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile ~/.aws/credentials\n", "\n", "[default]\n", "aws_access_key_id = < paste your access key here, run this cell, then delete the cell >\n", "aws_secret_access_key = < paste your secret key here, run this cell, then delete the cell > " ] }, { "cell_type": "code", "execution_count": 13, "id": "08a81657-8b85-4d46-9f92-9956079595eb", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile ~/.aws/config\n", "\n", "[default]\n", "region=us-east-1" ] }, { "cell_type": "code", "execution_count": 9, "id": "4c2adb43-37a0-4de9-a226-53f845aa566e", "metadata": { "tags": [] }, "outputs": [], "source": [ "%pip install sagemaker" ] }, { "cell_type": "markdown", "id": "4370662b-6253-4aab-90b7-8ec66f3b88a8", "metadata": {}, "source": [ "If you are already used to using SageMaker within your own AWS account, please copy and paste the ARN for your execution role below. If you are new to this, follow the steps to create one here.\n", "\n", "https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role-sagemaker-notebook.html\n", "\n", "Please note, to complete this notebook you will need to have already created this SageMaker IAM execution role."
] }, { "cell_type": "code", "execution_count": 10, "id": "ab6826b5-9291-4eb3-911a-891af1964411", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "# create a SageMaker execution role via the AWS SageMaker console, then paste in the ARN here\n", "role = ' < paste your execution role here > '" ] }, { "cell_type": "markdown", "id": "1843a796-5bcf-4f49-be8f-42a28f614599", "metadata": {}, "source": [ "### Step 1. Copy your local data to your preferred S3 bucket, or vice versa \n", "This notebook assumes you already have access to a training dataset relevant for language translation. If you don't, please step through this notebook to create the relevant train files locally.\n", "- https://github.com/aws/studio-lab-examples/blob/main/natural-language-processing/NLP_Disaster_Recovery_Translation.ipynb \n", "\n", "We'll demonstrate copying that data up to your AWS account via the AWS CLI here, but you can also upload through the UI, or use boto3. Many good options here." ] }, { "cell_type": "code", "execution_count": 12, "id": "cb8b9500-b7e3-4d6e-b4f1-f5351fa203d3", "metadata": {}, "outputs": [], "source": [ "bucket_name = '<paste your bucket name here >'\n", "train_file_name = 'train.json'\n", "s3_data_path = 's3://{}/data/{}'.format(bucket_name, train_file_name)" ] }, { "cell_type": "code", "execution_count": 11, "id": "525767bf-49ed-4cfc-8afc-e1c5fff8d7ea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "upload: notebooks/data/train.json to s3://hf-translation-bucket/data/train.json/train.json\n" ] } ], "source": [ "!aws s3 sync ./notebooks/data/ {s3_data_path}" ] }, { "cell_type": "markdown", "id": "3f9a523e-e4c8-45fe-b4f3-7d24e267a138", "metadata": {}, "source": [ "### Step 2. Point to the Hugging Face containers and train a model\n", "We strongly recommend using the Hugging Face models webpage to generate the configuration code for your desired model or resource. 
You can do so here:\n", "- https://huggingface.co/models\n", "\n", "AWS has prebuilt deep learning containers for several software frameworks, including TensorFlow, PyTorch, MXNet, Hugging Face, and AutoGluon. You can extend these base images, or simply use the script mode construct as below.\n", "- https://github.com/aws/deep-learning-containers \n", "\n", "To learn more about script mode on SageMaker, check out our documentation here: \n", "- https://sagemaker.readthedocs.io/en/stable/frameworks/index.html" ] }, { "cell_type": "code", "execution_count": 12, "id": "7a0261e8-4e7d-46e8-9608-533d3a8ce6b5", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker.huggingface import HuggingFace\n", "\n", "# hyperparameters passed to the fine-tuning script\n", "hyperparameters = {\n", "    'model_name_or_path': 't5-small',\n", "    'output_dir': '/opt/ml/model',\n", "    'train_file': '/opt/ml/input/data/train/{}'.format(train_file_name),\n", "    'do_train': True,\n", "    'source_lang': 'en',\n", "    'target_lang': 'es',\n", "    'source_prefix': 'translate English to Spanish: ',\n", "    # add your remaining hyperparameters\n", "    # more info here https://github.com/huggingface/transformers/tree/v4.6.1/examples/pytorch/seq2seq\n", "}\n", "\n", "# git configuration to download our fine-tuning script\n", "git_config = {'repo': 'https://github.com/huggingface/transformers.git', 'branch': 'v4.6.1'}\n", "\n", "# creates the Hugging Face estimator, using the execution role defined earlier\n", "huggingface_estimator = HuggingFace(\n", "    entry_point='run_translation.py',\n", "    source_dir='./examples/pytorch/translation',\n", "    instance_type='ml.p3.2xlarge',\n", "    instance_count=1,\n", "    role=role,\n", "    git_config=git_config,\n", "    transformers_version='4.6.1',\n", "    pytorch_version='1.7.1',\n", "    py_version='py36',\n", "    hyperparameters=hyperparameters\n", ")\n", "\n", "# start the training job, reading the 'train' channel from S3\n", "huggingface_estimator.fit({'train': s3_data_path}, wait=True)" ] }, { "cell_type": "markdown", "id": 
"5844163e-40a3-4a79-a206-954cb9cf76b6", "metadata": {}, "source": [ "---\n", "# Evaluate your job in the cloud and clean up\n", "That's a wrap! Before you share this notebook with anyone, make sure you remove your access and secret keys from the credentials cell above. You can also delete the credential files themselves, but that will prevent you from accessing your AWS account from this environment going forward." ] }, { "cell_type": "code", "execution_count": null, "id": "da423f04-548d-4d88-b2c3-e1192f0c454c", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "default:Python", "language": "python", "name": "conda-env-default-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }