# Using the [@remote](https://sagemaker.readthedocs.io/en/stable/remote_function/sagemaker.remote_function.html#remote-decorator) Decorator from the Amazon SageMaker Python SDK with a Private PyPI Repository

This notebook shows how to use the @remote decorator from the Amazon SageMaker Python SDK with a private package repository hosted on [AWS CodeArtifact](https://docs.aws.amazon.com/codeartifact/latest/ug/welcome.html). The training job runs in a VPC with no internet access.

## Fetch VPC Endpoint URL for CodeArtifact API

First, we need to fetch the VPC endpoint URL for the CodeArtifact API. This URL is needed to establish a connection with the private PyPI repository.

The `describe_vpc_endpoints` API returns two VPC endpoint URLs, and we can use either of them to connect to the CodeArtifact repository.

In [None]:
import boto3

ec2 = boto3.client('ec2')
boto3_session = boto3.session.Session()
region = boto3_session.region_name
sts = boto3.client('sts', endpoint_url=f"https://sts.{region}.amazonaws.com/")
account_id = sts.get_caller_identity()["Account"]

account_id

In [None]:

# Look up the interface VPC endpoint for the CodeArtifact API in the current region
response = ec2.describe_vpc_endpoints(
    Filters=[
        {
            'Name': 'service-name',
            'Values': [
                f'com.amazonaws.{boto3_session.region_name}.codeartifact.api'
            ]
        },
    ]
)

code_artifact_api_vpc_endpoint = response['VpcEndpoints'][0]['DnsEntries'][0]['DnsName']

endpoint_url = f'https://{code_artifact_api_vpc_endpoint}'
endpoint_url
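
The cell above indexes straight into the first endpoint and the first DNS entry, which raises an `IndexError` if the filter matches nothing. As a minimal sketch, the helper below (`endpoint_urls` is a hypothetical name, not part of boto3) collects every DNS name from a `describe_vpc_endpoints` response and fails with a clearer message. The sample response and its DNS names are made up for illustration and contain only the fields used in this notebook.

```python
def endpoint_urls(response):
    """Return an https URL for every DNS entry of every endpoint in the response."""
    urls = []
    for endpoint in response.get("VpcEndpoints", []):
        for entry in endpoint.get("DnsEntries", []):
            dns_name = entry.get("DnsName")
            if dns_name:
                urls.append(f"https://{dns_name}")
    if not urls:
        raise LookupError(
            "No CodeArtifact API VPC endpoint found; check the region and the VPC setup."
        )
    return urls


# Hypothetical response, trimmed to the fields read above (DNS names are invented).
sample_response = {
    "VpcEndpoints": [
        {
            "DnsEntries": [
                {"DnsName": "vpce-0example.api.codeartifact.us-east-1.vpce.amazonaws.com"},
                {"DnsName": "vpce-0example-us-east-1a.api.codeartifact.us-east-1.vpce.amazonaws.com"},
            ]
        }
    ]
}

print(endpoint_urls(sample_response)[0])
```

With the real response, `endpoint_urls(response)[0]` would give the same value as `endpoint_url` above.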

## Update Config.yaml

Next, we need to update `config.yaml`, which can be found in the [config](../config/config.yaml) folder. The following fields need to be updated:
- **PreExecutionCommands**: Update this field with the correct connection URL for the CodeArtifact private repository.

- **RoleArn**: Provide the ARN of the execution role that should be used to run the training job. In this case, the current SageMaker execution role works fine. It can be found in the output of the studio-codeartifact.yaml stack created earlier, or in the domain details in the SageMaker console.

- **S3RootUri**: The S3 bucket where all the job configuration and output will be stored. Provide any bucket in the current region.

- **VpcConfig**: Provide the subnet IDs of the VPC and the security group ID associated with SageMaker Studio. This information can be found in the output of the vpc.yaml stack created earlier. Alternatively, find the subnet IDs in the VPC console and the SageMaker security group in the EC2 console.
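
To make the four fields concrete, here is a rough sketch of where they sit in the SageMaker Python SDK configuration layout (`SageMaker` > `PythonSDK` > `Modules` > `RemoteFunction`), written as a Python dictionary. The helper names `login_command` and `remote_function_config` are hypothetical, every ARN, bucket, subnet, security group, and endpoint value is a placeholder, and the sketch assumes the pre-execution command is the same `aws codeartifact login` command that this notebook runs later. Check the result against the actual `config.yaml` in this repository rather than treating it as the definitive file.

```python
import json


def login_command(domain, domain_owner, repository, endpoint_url):
    """Build an `aws codeartifact login` command with the flags used in this notebook."""
    return (
        "aws codeartifact login --tool pip "
        f"--domain {domain} --domain-owner {domain_owner} "
        f"--repository {repository} --endpoint-url {endpoint_url}"
    )


def remote_function_config(role_arn, s3_root_uri, subnets, security_group_ids,
                           pre_execution_commands):
    """Return a mapping that mirrors the RemoteFunction section of config.yaml."""
    remote_function = {
        "PreExecutionCommands": list(pre_execution_commands),
        "RoleArn": role_arn,
        "S3RootUri": s3_root_uri,
        "VpcConfig": {
            "SecurityGroupIds": list(security_group_ids),
            "Subnets": list(subnets),
        },
    }
    return {
        "SchemaVersion": "1.0",
        "SageMaker": {"PythonSDK": {"Modules": {"RemoteFunction": remote_function}}},
    }


# All values below are placeholders, not real resources.
command = login_command(
    domain="anycompany",
    domain_owner="111122223333",
    repository="private-pypi",
    endpoint_url="https://vpce-0example.api.codeartifact.us-east-1.vpce.amazonaws.com",
)
config = remote_function_config(
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerExecutionRole",
    s3_root_uri="s3://example-bucket/remote-function",
    subnets=["subnet-0example1", "subnet-0example2"],
    security_group_ids=["sg-0example"],
    pre_execution_commands=[command],
)

print(json.dumps(config, indent=2))
```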


___NOTE: Using config.yaml is not mandatory for working with the @remote decorator; it is simply a cleaner way to supply all the configuration. Every setting can also be passed directly as a decorator argument, but that reduces readability and makes changes harder to maintain in the long run.___
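
For comparison, the alternative mentioned in the note could look roughly like the sketch below, which collects the same settings as keyword arguments for `@remote`. The parameter names (`role`, `s3_root_uri`, `subnets`, `security_group_ids`, `pre_execution_commands`, `include_local_workdir`) are taken from the decorator's documented signature; all values are placeholders, and the decorator usage is left in comments because it needs the `sagemaker` package and AWS credentials.

```python
# Placeholder values only; replace them with the outputs of your own stacks.
remote_kwargs = {
    "role": "arn:aws:iam::111122223333:role/ExampleSageMakerExecutionRole",
    "s3_root_uri": "s3://example-bucket/remote-function",
    "subnets": ["subnet-0example1", "subnet-0example2"],
    "security_group_ids": ["sg-0example"],
    # Placeholder for the CodeArtifact login command that runs before the job starts.
    "pre_execution_commands": ["<aws codeartifact login command>"],
    "include_local_workdir": True,
}

# Usage sketch (requires the sagemaker package and AWS access):
#
# from sagemaker.remote_function import remote
#
# @remote(**remote_kwargs)
# def perform_train():
#     ...

print(sorted(remote_kwargs))
```

Every job setting then lives next to the training code, which is the readability trade-off the note describes.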

## Authenticate to CodeArtifact private PyPI repository

Next, we authenticate to the CodeArtifact private PyPI repository so that an active login session is established. By default the session is valid for 12 hours; this can be changed with the `--duration-seconds` option. Change the domain owner value to your AWS account ID.


In [None]:
!aws codeartifact login --tool pip --domain anycompany --domain-owner {account_id} --repository private-pypi --endpoint-url {endpoint_url}
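
As background, and as far as the CodeArtifact documentation describes it, `aws codeartifact login --tool pip` fetches an authorization token and points pip's index URL at the repository. The sketch below only illustrates the shape of that URL; `pip_index_url` is a hypothetical helper, and the region, account ID, and token are placeholders.

```python
def pip_index_url(domain, domain_owner, region, repository, token):
    """Compose a CodeArtifact PyPI index URL of the documented shape."""
    host = f"{domain}-{domain_owner}.d.codeartifact.{region}.amazonaws.com"
    return f"https://aws:{token}@{host}/pypi/{repository}/simple/"


# Placeholder values; a real token comes from CodeArtifact and expires.
example_url = pip_index_url(
    domain="anycompany",
    domain_owner="111122223333",
    region="us-east-1",
    repository="private-pypi",
    token="EXAMPLE_TOKEN",
)
print(example_url)
```

Because the token is embedded in the URL, the login has to be repeated once the session expires.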

## Install dependencies in the notebook env

This is needed in order to run `train.py` in the subsequent cells.

In [None]:
!pip install --no-cache-dir -r ./config/requirements.txt

## Run the training job

Now we can start the training job by simply running the `train.py` file. Let's look at the content of the file to see how it differs from native PyTorch code.

In [None]:
!pygmentize scripts/train.py

As we can see, the code is agnostic to SageMaker; the only change is the `@remote(include_local_workdir=True)` decorator on the `perform_train()` method. And that's it.

Now, let's run the training job.

In [None]:
import os

# Run train.py from inside the scripts folder
os.chdir('scripts')

!python ./train.py

os.chdir('../')
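
An alternative to changing the notebook's working directory back and forth is to launch the script in a child process with its own working directory. This is a minimal sketch using only the standard library; `run_script` is a hypothetical helper, not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_name, workdir):
    """Run a Python script with `workdir` as its working directory.

    Raises subprocess.CalledProcessError if the script exits with a non-zero status.
    """
    return subprocess.run(
        [sys.executable, script_name],
        cwd=Path(workdir),
        check=True,
    )


# In this notebook the call would be:
# run_script("train.py", "scripts")
```

The notebook's own working directory never changes, so later cells are unaffected if the script fails partway through.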