# Add AutoML functionality with Amazon SageMaker Autopilot across accounts

AutoML, provided by [Amazon SageMaker Autopilot](https://aws.amazon.com/sagemaker/autopilot/), lets non-experts create machine learning models and invoke them from their applications.

The problem that we want to solve arises when, due to governance constraints, the [Amazon SageMaker](https://aws.amazon.com/sagemaker/) resources cannot be deployed in the same AWS account where they are used.

Examples of such a situation are:

1. A multi-account enterprise AWS setup where the Autopilot resources must be deployed in a specific AWS account (the trusting account) and accessed from other accounts (the trusted accounts)
2. A software as a service (SaaS) provider that offers AutoML to its users and runs the resources in the customer's AWS account, so that the billing is charged to the end customer

This notebook walks through the implementation using the SageMaker Python SDK. It is divided into two sections:
* Create the [AWS Identity and Access Management](https://aws.amazon.com/iam/) (IAM) resources needed for cross-account access
* Run the Autopilot job, deploy the best model, and make predictions from the trusted account by accessing the trusting account

For a full explanation of SageMaker Autopilot, refer to the examples available on GitHub, particularly [Top Candidates Customer Churn Prediction with Amazon SageMaker Autopilot and Batch Transform (Python SDK)](https://github.com/aws/amazon-sagemaker-examples/blob/master/autopilot/autopilot_customer_churn_high_level_with_evaluation.ipynb).

## Prerequisites

We have two AWS accounts:
- **Customer (trusting) account** - Where the SageMaker resources are deployed
- **SaaS (trusted) account** - Drives the training and prediction activities

[Create a user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_console) in each account, with programmatic access enabled and the `IAMFullAccess` managed policy attached (hint: for simplicity, name the users after the profiles defined below).

You have to [configure the user profiles](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-profiles.html) in the `.aws/credentials` file:
- `customer_config` for the user configured in the customer account
- `saas_config` for the user configured in the SaaS account

To apply more restrictive permissions to the users in the two accounts, consider attaching the following policies instead of `IAMFullAccess`, replacing `<ACCOUNT_NUMBER>` with the ID of the respective account.

For the user associated with the `saas_config` profile, the policy is:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateUser",
                "iam:DeleteUser",
                "iam:CreateAccessKey",
                "iam:DeleteAccessKey",
                "iam:ListAccessKeys"
            ],
            "Resource": "arn:aws:iam::<ACCOUNT_NUMBER>:user/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateGroup",
                "iam:DeleteGroup",
                "iam:AddUserToGroup",
                "iam:RemoveUserFromGroup",
                "iam:AttachGroupPolicy",
                "iam:DetachGroupPolicy",
                "iam:ListAttachedGroupPolicies"
            ],
            "Resource": "arn:aws:iam::<ACCOUNT_NUMBER>:group/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreatePolicy",
                "iam:DeletePolicy"
            ],
            "Resource": "arn:aws:iam::<ACCOUNT_NUMBER>:policy/*"
        }
    ]
}
```

For the user associated with the `customer_config` profile, the policy is:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy",
                "iam:ListAttachedRolePolicies",
                "iam:CreateRole",
                "iam:DeleteRole"
            ],
            "Resource": "arn:aws:iam::<ACCOUNT_NUMBER>:role/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreatePolicy",
                "iam:DeletePolicy"
            ],
            "Resource": "arn:aws:iam::<ACCOUNT_NUMBER>:policy/*"
        }
    ]
}
```

To upgrade to the latest release of the [SageMaker SDK](https://pypi.org/project/sagemaker/):

```python
!pip install --upgrade sagemaker
```

### Import the common Python modules used in the notebook

In [None]:
import boto3
import json
import sagemaker
from botocore.exceptions import ClientError

Let's define the AWS Region that will host the resources. We use the default Region configured in [~/.aws/config](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html):

In [None]:
REGION = boto3.Session().region_name

And the reference to the dataset used to train the model:

In [None]:
DATASET_URI = "s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt"

## Setup of the IAM Entities

Next, for each user profile, we establish the session used to retrieve the account ID and to initialize the IAM client that configures the IAM entities.

For each of the two accounts we: 
* Create the boto3 session with the profile of the respective configuration user
* Retrieve the account ID by means of AWS STS
* Create the IAM client that performs the configuration steps in the account


For the **customer account**:

In [None]:
customer_config_session = boto3.session.Session(profile_name="customer_config")
CUSTOMER_ACCOUNT_ID = customer_config_session.client("sts").get_caller_identity()["Account"]
customer_iam_client = customer_config_session.client("iam")

And the same for the **SaaS account**:

In [None]:
saas_config_session = boto3.session.Session(profile_name="saas_config")
SAAS_ACCOUNT_ID = saas_config_session.client("sts").get_caller_identity()["Account"]
saas_iam_client = saas_config_session.client("iam")

### Set up the IAM entities in the customer account

Let's first define the role needed to perform cross-account tasks from the SaaS account in the customer account. 

For simplicity, the same role is also used as the SageMaker execution role in the customer account. Ideally, consider splitting it into two roles with fine-grained permissions, in line with the principle of granting [least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege).

The role name, the role ARN, and the ARN of the SageMaker AWS managed policy are as follows:

In [None]:
CUSTOMER_TRUST_SAAS_ROLE_NAME = "customer_trusting_saas"
CUSTOMER_TRUST_SAAS_ROLE_ARN = "arn:aws:iam::{}:role/{}".format(
    CUSTOMER_ACCOUNT_ID, CUSTOMER_TRUST_SAAS_ROLE_NAME
)
SAGEMAKERFULLACCESS_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"

The following customer managed policy gives the role access to the Amazon S3 resources needed for the SageMaker tasks and for the cross-account copy of the dataset.

We restrict access to the default Amazon S3 bucket that SageMaker uses in the customer account for the selected Region, named `sagemaker-<REGION>-<ACCOUNT_ID>`, and to the objects it contains.

In [None]:
CUSTOMER_S3_POLICY_NAME = "customer_s3"
CUSTOMER_S3_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::sagemaker-{}-{}".format(REGION, CUSTOMER_ACCOUNT_ID),
                "arn:aws:s3:::sagemaker-{}-{}/*".format(REGION, CUSTOMER_ACCOUNT_ID),
            ],
        }
    ],
}

Then we define the **external ID** to mitigate the [confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html):

In [None]:
EXTERNAL_ID = "12345"
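A fixed value such as `12345` keeps the walkthrough simple, but the external ID should be unique for each customer, so that one customer cannot reuse the ID of another. A minimal sketch, assuming the SaaS provider generates the ID and shares it with the customer; the helper name `generate_external_id` is hypothetical:

```python
import secrets


def generate_external_id(num_bytes: int = 16) -> str:
    """Return a random external ID as a hexadecimal string.

    Each random byte is encoded as two hex characters, so the default
    produces a 32-character ID made only of [0-9a-f].
    """
    return secrets.token_hex(num_bytes)


# Generate one ID per customer, store it with that customer's record, and use
# the same value in the trust policy condition and in the AssumeRole call.
customer_external_id = generate_external_id()
```

In this notebook, you would then set `EXTERNAL_ID = customer_external_id` before building the trust policy.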

The **trust relationships** policy allows two kinds of principals to assume the role: principals from the trusted (SaaS) account, provided they pass the external ID, and the SageMaker service:

In [None]:
CUSTOMER_TRUST_SAAS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::{}:root".format(SAAS_ACCOUNT_ID)},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        },
    ],
}

First we create the customer managed policy in the customer account:

In [None]:
try:
    create_policy_response = customer_iam_client.create_policy(
        PolicyName=CUSTOMER_S3_POLICY_NAME, PolicyDocument=json.dumps(CUSTOMER_S3_POLICY)
    )
    customer_s3_policy_arn = create_policy_response["Policy"]["Arn"]
except ClientError as error:
    if error.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Policy already exists... hence retrieving policy arn")
        customer_s3_policy_arn = (
            "arn:aws:iam::" + CUSTOMER_ACCOUNT_ID + ":policy/" + CUSTOMER_S3_POLICY_NAME
        )
    else:
        print("Unexpected error occurred while creating policy...", error)
        raise
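The fallback above rebuilds the policy ARN by string concatenation, and the same pattern appears for the role and for the SaaS account policies. A small helper, here hypothetically named `iam_arn`, can build these ARNs in one place. It is a sketch that assumes the standard `aws` partition and the default IAM path `/`, which is how this notebook creates its entities; ARNs in other partitions (such as AWS GovCloud or China Regions) or with custom paths need a different format.

```python
IAM_RESOURCE_TYPES = {"role", "user", "group", "policy"}


def iam_arn(account_id: str, resource_type: str, name: str) -> str:
    """Return the ARN of an IAM entity created at the root path '/'.

    Assumes the commercial 'aws' partition, as used throughout this notebook.
    """
    if resource_type not in IAM_RESOURCE_TYPES:
        raise ValueError(f"Unsupported IAM resource type: {resource_type!r}")
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"An AWS account ID has 12 digits, got {account_id!r}")
    return f"arn:aws:iam::{account_id}:{resource_type}/{name}"


# Example with a placeholder account ID:
example_policy_arn = iam_arn("111122223333", "policy", "customer_s3")
```

In the fallback branch, `iam_arn(CUSTOMER_ACCOUNT_ID, "policy", CUSTOMER_S3_POLICY_NAME)` would replace the concatenation.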

Then we create the new role. To accommodate long-running Autopilot jobs, we set the [maximum session duration for the role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html#id_roles_use_view-role-max-session) to 3 hours:

In [None]:
# set to 3 hours
MAX_SESSION_DURATION = 10800

try:
    create_role_response = customer_iam_client.create_role(
        RoleName=CUSTOMER_TRUST_SAAS_ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(CUSTOMER_TRUST_SAAS_POLICY),
        MaxSessionDuration=MAX_SESSION_DURATION,
    )
except ClientError as error:
    if error.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Role already exists... reusing it")
    else:
        print("Unexpected error occurred... Role could not be created", error)
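IAM accepts a `MaxSessionDuration` between 3,600 seconds (1 hour) and 43,200 seconds (12 hours), and the `DurationSeconds` requested later in `AssumeRole` must be at least 900 seconds and no longer than the role's maximum. The sketch below, built around a hypothetical `check_session_durations` helper, validates both values locally before the API calls:

```python
# Bounds documented for IAM MaxSessionDuration and STS AssumeRole DurationSeconds.
ROLE_MAX_SESSION_MIN = 3600  # 1 hour
ROLE_MAX_SESSION_MAX = 43200  # 12 hours
ASSUME_ROLE_MIN_DURATION = 900  # 15 minutes


def check_session_durations(role_max_session: int, requested_duration: int) -> None:
    """Raise ValueError if either duration is outside the accepted range."""
    if not ROLE_MAX_SESSION_MIN <= role_max_session <= ROLE_MAX_SESSION_MAX:
        raise ValueError(
            f"MaxSessionDuration must be between {ROLE_MAX_SESSION_MIN} and "
            f"{ROLE_MAX_SESSION_MAX} seconds, got {role_max_session}"
        )
    if not ASSUME_ROLE_MIN_DURATION <= requested_duration <= role_max_session:
        raise ValueError(
            f"DurationSeconds must be between {ASSUME_ROLE_MIN_DURATION} and "
            f"{role_max_session} seconds, got {requested_duration}"
        )


# The notebook requests the full role maximum (3 hours) in AssumeRole.
check_session_durations(10800, 10800)
```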

And we attach the two policies:

In [None]:
try:
    policy_attach_response = customer_iam_client.attach_role_policy(
        RoleName=CUSTOMER_TRUST_SAAS_ROLE_NAME, PolicyArn=customer_s3_policy_arn
    )
except ClientError as error:
    print("Policy could not be attached...", error)

try:
    policy_attach_response = customer_iam_client.attach_role_policy(
        RoleName=CUSTOMER_TRUST_SAAS_ROLE_NAME, PolicyArn=SAGEMAKERFULLACCESS_POLICY_ARN
    )
except ClientError as error:
    print("Policy could not be attached...", error)

### Set up the IAM entities in the SaaS account

We define:
- A group of users enabled to run the Autopilot job in the customer account
- A policy, attached to the group, for assuming the role defined in the customer account
- A policy, attached to the group, for uploading data to Amazon S3 and managing bucket policies
- A user responsible for running the Autopilot jobs, with programmatic access
- A user profile that stores the user's access key and secret access key in the credentials file

Let's start by defining the name of the group:

In [None]:
SAAS_USER_GROUP_NAME = "AutopilotUsers"

The first policy allows assuming the role in the customer account:

In [None]:
SAAS_ASSUME_ROLE_POLICY_NAME = "saas_assume_customer_role"
SAAS_ASSUME_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{}:role/{}".format(
                CUSTOMER_ACCOUNT_ID, CUSTOMER_TRUST_SAAS_ROLE_NAME
            ),
        }
    ],
}

The second policy allows downloading the dataset, creating the SageMaker default bucket in the SaaS account, managing the objects in it, and setting or deleting its bucket policy:

In [None]:
SAAS_S3_POLICY_NAME = "saas_s3"
SAAS_S3_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::{}".format(DATASET_URI.split("://")[1])],
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:PutBucketPolicy",
                "s3:DeleteBucketPolicy",
            ],
            "Resource": [
                "arn:aws:s3:::sagemaker-{}-{}".format(REGION, SAAS_ACCOUNT_ID),
                "arn:aws:s3:::sagemaker-{}-{}/*".format(REGION, SAAS_ACCOUNT_ID),
            ],
        },
    ],
}
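The first statement derives the dataset ARN with `DATASET_URI.split("://")[1]`, which works for this URI but silently accepts a malformed one. A slightly stricter sketch, with hypothetical helper names, converts an `s3://bucket/key` URI into an S3 ARN and builds the ARNs of the default SageMaker bucket, which follows the `sagemaker-<region>-<account_id>` pattern used in the policies above:

```python
from typing import List


def s3_uri_to_arn(uri: str) -> str:
    """Convert 's3://bucket' or 's3://bucket/key' into an S3 ARN."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in {uri!r}")
    return f"arn:aws:s3:::{bucket}/{key}" if key else f"arn:aws:s3:::{bucket}"


def default_sagemaker_bucket_arns(region: str, account_id: str) -> List[str]:
    """Return the bucket ARN and the object ARN pattern of the default SageMaker bucket."""
    bucket = f"sagemaker-{region}-{account_id}"
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]


dataset_arn = s3_uri_to_arn("s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt")
```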

For simplicity, we give the same value to the username and to the user profile:

In [None]:
SAAS_USER_PROFILE = SAAS_USER_NAME = "saas_user"

#### Create the IAM entities

Now we create the two new managed policies:

In [None]:
try:
    create_policy_response = saas_iam_client.create_policy(
        PolicyName=SAAS_ASSUME_ROLE_POLICY_NAME, PolicyDocument=json.dumps(SAAS_ASSUME_ROLE_POLICY)
    )
    saas_assume_role_policy_arn = create_policy_response["Policy"]["Arn"]
except ClientError as error:
    if error.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Policy already exists... hence retrieving policy arn")
        saas_assume_role_policy_arn = (
            "arn:aws:iam::" + SAAS_ACCOUNT_ID + ":policy/" + SAAS_ASSUME_ROLE_POLICY_NAME
        )
    else:
        print("Unexpected error occurred while creating policy...", error)
        raise

In [None]:
try:
    create_policy_response = saas_iam_client.create_policy(
        PolicyName=SAAS_S3_POLICY_NAME, PolicyDocument=json.dumps(SAAS_S3_POLICY)
    )
    saas_s3_policy_arn = create_policy_response["Policy"]["Arn"]
except ClientError as error:
    if error.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Policy already exists... hence retrieving policy arn")
        saas_s3_policy_arn = "arn:aws:iam::" + SAAS_ACCOUNT_ID + ":policy/" + SAAS_S3_POLICY_NAME
    else:
        print("Unexpected error occurred while creating policy...", error)
        raise

Then create the group:

In [None]:
try:
    create_group_response = saas_iam_client.create_group(GroupName=SAAS_USER_GROUP_NAME)
except ClientError as error:
    if error.response["Error"]["Code"] == "EntityAlreadyExists":
        print("Group already exists... reusing it")
    else:
        print("Unexpected error occurred while creating group... ", error)

Next attach the policies to the group:

In [None]:
try:
    attach_policy_response = saas_iam_client.attach_group_policy(
        GroupName=SAAS_USER_GROUP_NAME, PolicyArn=saas_assume_role_policy_arn
    )
except ClientError as error:
    print("Unexpected error occurred while attaching policy...", error)

try:
    attach_policy_response = saas_iam_client.attach_group_policy(
        GroupName=SAAS_USER_GROUP_NAME, PolicyArn=saas_s3_policy_arn
    )
except ClientError as error:
    print("Unexpected error occurred while attaching policy...", error)

Let's create the user. If the user already exists, we reuse it and delete its existing access keys so that a new one can be created:

In [None]:
try:
    create_user_response = saas_iam_client.create_user(UserName=SAAS_USER_NAME)
except ClientError as error:
    if error.response["Error"]["Code"] == "EntityAlreadyExists":
        print(
            "User already exists... reusing it, but deleting its existing access keys before creating a new one"
        )
        user_access_keys = saas_iam_client.list_access_keys(UserName=SAAS_USER_NAME)
        for AccessKeyId in [
            element["AccessKeyId"] for element in user_access_keys["AccessKeyMetadata"]
        ]:
            saas_iam_client.delete_access_key(UserName=SAAS_USER_NAME, AccessKeyId=AccessKeyId)
    else:
        print("Unexpected error occurred while creating user...", error)

To give programmatic access we create a new Access Key and Secret for the user:

In [None]:
try:
    create_akey_response = saas_iam_client.create_access_key(UserName=SAAS_USER_NAME)
except ClientError as error:
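    # CreateAccessKey fails with LimitExceeded when the user already has two keys
    if error.response["Error"]["Code"] == "LimitExceeded":
        print("The user already has the maximum number of access keys...", error)
    else:
        print("Unexpected error occurred while creating access key...", error)
    raise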

Finally, add the user to the `AutopilotUsers` group:

In [None]:
try:
    user_to_group_response = saas_iam_client.add_user_to_group(
        GroupName=SAAS_USER_GROUP_NAME, UserName=SAAS_USER_NAME
    )
except ClientError as error:
    print("Unexpected error occurred while adding user to group", error)

### Update the credentials file

Create the user profile for the `saas_user` in the `.aws/credentials` file.

In [None]:
import configparser
from pathlib import Path

credentials_config = configparser.ConfigParser()
credentials_config.read(str(Path.home()) + "/.aws/credentials")

In [None]:
if not credentials_config.has_section(SAAS_USER_PROFILE):
    credentials_config.add_section(SAAS_USER_PROFILE)

credentials_config[SAAS_USER_PROFILE]["aws_access_key_id"] = create_akey_response["AccessKey"][
    "AccessKeyId"
]
credentials_config[SAAS_USER_PROFILE]["aws_secret_access_key"] = create_akey_response["AccessKey"][
    "SecretAccessKey"
]

with open(str(Path.home()) + "/.aws/credentials", "w") as configfile:
    credentials_config.write(configfile, space_around_delimiters=False)
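The two cells above can be folded into one reusable function. The sketch below is a refactoring under stated assumptions, not part of the original notebook: the name `write_credentials_profile` is hypothetical, it disables `configparser` interpolation so that a `%` character in a value cannot be misread, and it restricts the file permissions to the owner (effective on POSIX systems). Note that `configparser` rewrites the whole file, so comments in an existing credentials file are not preserved. The usage example writes to a temporary file instead of your real `~/.aws/credentials`:

```python
import configparser
import tempfile
from pathlib import Path


def write_credentials_profile(
    credentials_path: Path, profile: str, access_key_id: str, secret_access_key: str
) -> None:
    """Add or update a profile in an AWS shared credentials file."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(credentials_path)  # a missing file is silently skipped
    if not config.has_section(profile):
        config.add_section(profile)
    config[profile]["aws_access_key_id"] = access_key_id
    config[profile]["aws_secret_access_key"] = secret_access_key

    credentials_path.parent.mkdir(parents=True, exist_ok=True)
    with open(credentials_path, "w") as configfile:
        config.write(configfile, space_around_delimiters=False)
    credentials_path.chmod(0o600)  # owner read/write only (effective on POSIX)


# Usage against a throwaway file; the key values are placeholders.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / ".aws" / "credentials"
    write_credentials_profile(demo_path, "other_profile", "AKIAOTHER", "other-secret")
    write_credentials_profile(demo_path, "saas_user", "AKIAEXAMPLE", "example-secret")
    demo_config = configparser.ConfigParser(interpolation=None)
    demo_config.read(demo_path)
```

In the notebook, the call would be `write_credentials_profile(Path.home() / ".aws" / "credentials", SAAS_USER_PROFILE, ...)` with the two values from `create_akey_response["AccessKey"]`.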

This completes the configuration of IAM entities that are needed for the cross-account implementation of the Autopilot job.

## Autopilot cross-account access

This section is the core of the notebook: it demonstrates the main differences from the single-account scenario.

First we prepare the dataset the Autopilot job will use for training the models.

### Data

We reuse the same dataset adopted in the SageMaker example: [Top Candidates Customer Churn Prediction with Amazon SageMaker Autopilot and Batch Transform (Python SDK)](https://github.com/aws/amazon-sagemaker-examples/blob/master/autopilot/autopilot_customer_churn_high_level_with_evaluation.ipynb).

For a full explanation of the data, refer to the original example.

We skip the data inspection and proceed directly to cross-account Autopilot job invocation.

Note: if the following copy fails because of the [eventual consistency](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html#eventual-consistency) of the IAM resources that we have just created, retry it after a few seconds.

In [None]:
# delay introduced to mitigate the eventual consistency of IAM resources
! sleep 10

!aws s3 cp $DATASET_URI ./ --profile saas_user

### Split the dataset for the Autopilot job and the inference phase

After you load the dataset, split it into two parts:

- 80% as input to the Autopilot job for the training of the best model
- 20% for test inference on the model endpoint that will be deployed

Autopilot applies a [cross-validation](https://aws.amazon.com/it/about-aws/whats-new/2021/05/amazon-sagemaker-autopilot-adds-automatic-cross-validation-to-im/) resampling procedure to all candidate algorithms on the dataset passed as input, to test their ability to predict data they have not been trained on.

In [None]:
import numpy as np
import pandas as pd

churn = pd.read_csv("./churn.txt")

train_data = churn.sample(frac=0.8, random_state=200)
test_data = churn.drop(train_data.index)
test_data_no_target = test_data.drop(columns=["Churn?"])

Let's save the training data to a local file, which we will pass to the `fit` method of the `AutoML` estimator.

In [None]:
train_file = "train_data.csv"
train_data.to_csv(train_file, index=False, header=True)

## Autopilot training job, deploy and prediction

The following are the steps for the cross-account invocation:

1. Initiate a session as `saas_user` in the SaaS account, loading the profile from the `credentials` file
2. Assume the role in the customer account via AWS STS
3. Set up and train the AutoML estimator in the customer account
4. Deploy the top candidate model proposed by AutoML in the customer account
5. Invoke the deployed model endpoint to predict on the test data


### 1. Initiate the user session in the SaaS account

The IAM setup created the `saas_user` user, identified by the `saas_user` profile in `.aws/credentials`. We initiate a boto3 session with this profile.


In [None]:
saas_user_session = boto3.session.Session(profile_name=SAAS_USER_PROFILE, region_name=REGION)

The `saas_user` inherits from the `AutopilotUsers` group the permission to assume the `customer_trusting_saas` role in the customer account.

### 2. Assume the role in the customer account via AWS STS

AWS STS provides the credentials for a temporary session in the customer account.

In [None]:
saas_sts_client = saas_user_session.client("sts", region_name=REGION)

The default [session duration](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) is 1 hour. We set it to the maximum session duration configured for the role. If the session expires, you can create a new one by repeating the following steps.

We use a retry loop to handle the [eventual consistency](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html#eventual-consistency) of the IAM resources that we have just created.

In [None]:
from time import sleep

In [None]:
retries = 0
MAX_RETRIES = 2

while retries < MAX_RETRIES:
    try:
        sleep(10 + 10 * retries)
        assumed_role_object = saas_sts_client.assume_role(
            RoleArn=CUSTOMER_TRUST_SAAS_ROLE_ARN,
            RoleSessionName="sagemaker_autopilot",
            ExternalId=EXTERNAL_ID,
            DurationSeconds=MAX_SESSION_DURATION,
        )
        break
    except ClientError as error:
        print("Unexpected error occurred while assuming role, retries ", retries, error)
        retries += 1

if retries >= MAX_RETRIES:
    # last attempt, if fails it will display the detailed trace for troubleshooting
    sleep(10 + 10 * retries)
    assumed_role_object = saas_sts_client.assume_role(
        RoleArn=CUSTOMER_TRUST_SAAS_ROLE_ARN,
        RoleSessionName="sagemaker_autopilot",
        ExternalId=EXTERNAL_ID,
        DurationSeconds=MAX_SESSION_DURATION,
    )
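The retry logic above can be factored into a reusable helper. The sketch below, with a hypothetical `call_with_retries` function, keeps the same linear backoff (10, 20, then 30 seconds before the third and final attempt), retries only on the exception types you pass in, and re-raises the last error so that its traceback stays visible. The `sleep` parameter exists only so that the helper can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, waiting base_delay * attempt seconds before each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        sleep(base_delay * attempt)  # linear backoff: 10 s, 20 s, 30 s, ...
        try:
            return func()
        except retry_on as error:
            if attempt == max_attempts:
                raise  # last attempt: surface the full traceback
            print(f"Attempt {attempt} failed ({error}); retrying")
    raise AssertionError("unreachable")


# Usage in this notebook (requires the AWS setup above):
# assumed_role_object = call_with_retries(
#     lambda: saas_sts_client.assume_role(
#         RoleArn=CUSTOMER_TRUST_SAAS_ROLE_ARN,
#         RoleSessionName="sagemaker_autopilot",
#         ExternalId=EXTERNAL_ID,
#         DurationSeconds=MAX_SESSION_DURATION,
#     ),
#     retry_on=(ClientError,),
# )
```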

In [None]:
assumed_role_credentials = assumed_role_object["Credentials"]

assumed_role_session = boto3.Session(
    aws_access_key_id=assumed_role_credentials["AccessKeyId"],
    aws_secret_access_key=assumed_role_credentials["SecretAccessKey"],
    aws_session_token=assumed_role_credentials["SessionToken"],
    region_name=REGION,
)

sagemaker_session = sagemaker.Session(boto_session=assumed_role_session)

Note: `sagemaker_session` is required by the high-level `AutoML` estimator.
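Because the temporary credentials expire after `MAX_SESSION_DURATION` seconds, it helps to know how much time is left before step 2 must be repeated. The `Credentials` dictionary returned by `assume_role` includes an `Expiration` timestamp, which boto3 returns as a timezone-aware `datetime`. A minimal sketch with hypothetical helper names and an arbitrary 10-minute safety margin:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_expiration(expiration: datetime, now: Optional[datetime] = None) -> float:
    """Return the seconds left before the credentials expire (negative if expired)."""
    if expiration.tzinfo is None:
        raise ValueError("expiration must be a timezone-aware datetime")
    if now is None:
        now = datetime.now(timezone.utc)
    return (expiration - now).total_seconds()


def needs_refresh(
    expiration: datetime,
    margin: timedelta = timedelta(minutes=10),
    now: Optional[datetime] = None,
) -> bool:
    """True when less than `margin` remains, so a new AssumeRole call is due."""
    return seconds_until_expiration(expiration, now) < margin.total_seconds()


# In the notebook: needs_refresh(assumed_role_credentials["Expiration"])
```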

### 3. Set up and train the AutoML estimator in the customer account

We use the `AutoML` estimator from the SageMaker Python SDK to launch the Autopilot job, which trains a set of candidate models on the training data.

The setup of the `AutoML` object is similar to the single-account scenario, with the following differences for the cross-account invocation:
* The role passed for the SageMaker service is `CUSTOMER_TRUST_SAAS_ROLE_ARN`, in the customer account
* `sagemaker_session` is the temporary session created through AWS STS

Currently, Autopilot supports only tabular datasets in CSV format. Either all files should have a header row, or the first file of the dataset, when sorted in alphabetical/lexical order by name, is expected to have a header row.

In [None]:
target_attribute_name = "Churn?"

In [None]:
from sagemaker import AutoML
from time import gmtime
from time import sleep
from time import strftime

timestamp_suffix = strftime("%d-%H-%M-%S", gmtime())
base_job_name = "automl-churn-sdk-" + timestamp_suffix

target_attribute_values = np.unique(train_data[target_attribute_name])
target_attribute_true_value = target_attribute_values[1]  # 'True.'

In [None]:
automl = AutoML(
    role=CUSTOMER_TRUST_SAAS_ROLE_ARN,
    target_attribute_name=target_attribute_name,
    base_job_name=base_job_name,
    sagemaker_session=sagemaker_session,
    max_candidates=10,
)

We now launch the Autopilot job by calling the `fit` method of the `AutoML` estimator in the same way as in the single account example. We consider the following alternative options for providing the training dataset to the estimator:

#### First option: upload a local file and train with the `fit` method

We pass the path of the local training file; the `fit` method uploads it into the default Amazon S3 bucket that SageMaker uses in the customer account.

```python
automl.fit(train_file, job_name=base_job_name, wait=False, logs=False)
```

#### Second option: cross-account copy

More likely, the training dataset is stored in an Amazon S3 bucket owned by the SaaS account. We copy the dataset from the SaaS account into the customer account and pass the URI of the copy to the `fit` method.

First, we upload the dataset into a bucket of the SaaS account. For convenience, we use the SageMaker default bucket in the Region:

In [None]:
DATA_PREFIX = "auto-ml-input-data"

In [None]:
local_session = sagemaker.Session(boto_session=saas_user_session)
local_session_bucket = local_session.default_bucket()

In [None]:
train_data_s3_path = local_session.upload_data(path=train_file, key_prefix=DATA_PREFIX)

print("Train data uploaded to:", train_data_s3_path)

To allow the cross-account copy, we set the following policy on the SaaS bucket, only for the time needed for the copy operation:

In [None]:
train_data_s3_arn = "arn:aws:s3:::{}/{}/{}".format(local_session_bucket, DATA_PREFIX, train_file)

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": CUSTOMER_TRUST_SAAS_ROLE_ARN},
            "Action": "s3:GetObject",
            "Resource": train_data_s3_arn,
        }
    ],
}
# Convert the policy from JSON dict to string
bucket_policy = json.dumps(bucket_policy)

# Set the new policy
saas_s3_client = saas_user_session.client("s3")
saas_s3_client.put_bucket_policy(Bucket=local_session_bucket, Policy=bucket_policy)

Then the assumed role in the customer account performs the copy:

In [None]:
assumed_role_s3_client = boto3.client(
    "s3",
    aws_access_key_id=assumed_role_credentials["AccessKeyId"],
    aws_secret_access_key=assumed_role_credentials["SecretAccessKey"],
    aws_session_token=assumed_role_credentials["SessionToken"],
)

In [None]:
target_train_key = "{}/{}".format(DATA_PREFIX, train_file)
assumed_role_s3_client.copy_object(
    Bucket=sagemaker_session.default_bucket(),
    CopySource=train_data_s3_path.split("://")[1],
    Key=target_train_key,
)

Delete the bucket policy, so that access is granted only for the duration of the copy:

In [None]:
saas_s3_client.delete_bucket_policy(Bucket=local_session_bucket)

Finally, we launch the Autopilot job, passing the URI of the copied object:

In [None]:
target_train_uri = "s3://{}/{}".format(sagemaker_session.default_bucket(), target_train_key)
automl.fit(target_train_uri, job_name=base_job_name, wait=False, logs=False)

Another option is to pass `fit` the URI of the source dataset in the SaaS account bucket. In this case, the bucket policy must also allow the `s3:ListBucket` action on the source bucket:

```json
{
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::CUSTOMER_ACCOUNT_ID:role/customer_trusting_saas"
    },
    "Action": "s3:ListBucket",
    "Resource": "arn:aws:s3:::sagemaker-REGION-SAAS_ACCOUNT_ID"
}
```

and the policy must remain in place for the entire training, because the job reads the data from the SaaS bucket while it runs.
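For this direct-reference option, the bucket policy combines read access to the training objects with `s3:ListBucket` on the bucket itself. A minimal sketch of such a policy builder, with the hypothetical name `direct_access_bucket_policy`; it scopes `s3:GetObject` to a key prefix rather than to a single object, which is an assumption you may want to tighten:

```python
import json
from typing import Any, Dict


def direct_access_bucket_policy(bucket: str, key_prefix: str, principal_arn: str) -> Dict[str, Any]:
    """Bucket policy letting principal_arn list the bucket and read objects under key_prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{key_prefix.strip('/')}/*",
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


example_policy = direct_access_bucket_policy(
    "sagemaker-us-east-1-111122223333",
    "auto-ml-input-data",
    "arn:aws:iam::444455556666:role/customer_trusting_saas",
)
policy_json = json.dumps(example_policy)  # the string form expected by put_bucket_policy
```

You would pass `policy_json` to `saas_s3_client.put_bucket_policy` before calling `fit` with the source URI, and call `delete_bucket_policy` only after the job has finished.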

#### Tracking SageMaker Autopilot Job Progress

We can use the `describe_auto_ml_job` method to check the status of our SageMaker Autopilot job.

In [None]:
print("JobStatus - Secondary Status")
print("------------------------------")


describe_response = automl.describe_auto_ml_job()
print(describe_response["AutoMLJobStatus"] + " - " + describe_response["AutoMLJobSecondaryStatus"])
job_run_status = describe_response["AutoMLJobStatus"]

while job_run_status not in ("Failed", "Completed", "Stopped"):
    describe_response = automl.describe_auto_ml_job()
    job_run_status = describe_response["AutoMLJobStatus"]

    print(
        describe_response["AutoMLJobStatus"] + " - " + describe_response["AutoMLJobSecondaryStatus"]
    )
    sleep(30)
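The loop above polls until the job reaches a terminal status but has no upper bound, which matters here because the temporary credentials expire. A sketch of a polling helper with an optional timeout; the function name is hypothetical, and `sleep` and `clock` are parameters only so that the helper can be tested without waiting:

```python
import time
from typing import Any, Callable, Dict, Optional

TERMINAL_STATUSES = ("Failed", "Completed", "Stopped")


def wait_for_automl_job(
    describe: Callable[[], Dict[str, Any]],
    poll_seconds: float = 30,
    timeout_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll describe() until the AutoML job ends; return the last response."""
    start = clock()
    while True:
        response = describe()
        status = response["AutoMLJobStatus"]
        print(f"{status} - {response['AutoMLJobSecondaryStatus']}")
        if status in TERMINAL_STATUSES:
            return response
        if timeout_seconds is not None and clock() - start >= timeout_seconds:
            raise TimeoutError(f"AutoML job still {status} after {timeout_seconds} s")
        sleep(poll_seconds)


# In the notebook: final = wait_for_automl_job(automl.describe_auto_ml_job, timeout_seconds=10000)
```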

**Note:** Reconnect to the Autopilot job

Because an Autopilot job can take a long time, the session token may expire while the job is running. In that case, create a new session by repeating the steps in section **Assume the role in the customer account via AWS STS**, then retrieve the current Autopilot job by executing:

```python
automl = AutoML.attach(auto_ml_job_name=base_job_name, sagemaker_session=sagemaker_session)
```

### 4. Deploy the Top Candidate proposed by AutoML

The Autopilot job trains a set of candidate models and identifies among them the top candidate, which optimizes the objective metric for the [type of ML problem](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-problem-types.html).

Here we only demonstrate the deployment of the top candidate proposed by AutoML, but you can choose a different candidate that better fits your business criteria.

First we review the performance achieved by the top candidate in the cross-validation:

In [None]:
best_candidate = automl.describe_auto_ml_job()["BestCandidate"]
best_candidate_name = best_candidate["CandidateName"]

print("\n")
print("CandidateName: " + best_candidate_name)
print(
    "FinalAutoMLJobObjectiveMetricName: "
    + best_candidate["FinalAutoMLJobObjectiveMetric"]["MetricName"]
)
print(
    "FinalAutoMLJobObjectiveMetricValue: "
    + str(best_candidate["FinalAutoMLJobObjectiveMetric"]["Value"])
)
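To compare the top candidate with the alternatives, you can rank all completed candidates by their objective metric. The sketch assumes candidate dictionaries shaped like `BestCandidate` (with `CandidateName`, `CandidateStatus`, and `FinalAutoMLJobObjectiveMetric`), such as the list returned by `automl.list_candidates()`; the helper name and the fake data are illustrative only. Whether a higher value is better depends on the metric (for example F1 versus MSE), so it is a parameter.

```python
from typing import Any, Dict, List, Tuple


def rank_candidates(
    candidates: List[Dict[str, Any]], higher_is_better: bool = True, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return (name, objective value) for the top_k completed candidates."""
    scored = [
        (c["CandidateName"], c["FinalAutoMLJobObjectiveMetric"]["Value"])
        for c in candidates
        if c.get("CandidateStatus") == "Completed" and "FinalAutoMLJobObjectiveMetric" in c
    ]
    scored.sort(key=lambda pair: pair[1], reverse=higher_is_better)
    return scored[:top_k]


# Illustrative data only, not real Autopilot results.
fake_candidates = [
    {"CandidateName": "c1", "CandidateStatus": "Completed",
     "FinalAutoMLJobObjectiveMetric": {"MetricName": "F1", "Value": 0.81}},
    {"CandidateName": "c2", "CandidateStatus": "Completed",
     "FinalAutoMLJobObjectiveMetric": {"MetricName": "F1", "Value": 0.86}},
    {"CandidateName": "c3", "CandidateStatus": "Failed"},
]
```

In the SageMaker Python SDK versions we are aware of, `AutoML.deploy` also accepts a `candidate` argument, so a candidate chosen this way could be deployed instead of the best one; check this against the SDK version you use.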

If the performance is good enough for our business criteria, we deploy the top candidate in the **customer account**:

In [None]:
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

inference_response_keys = ["predicted_label", "probability"]

predictor = automl.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    inference_response_keys=inference_response_keys,
    predictor_cls=Predictor,
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer(),
)

print("Created endpoint: {}".format(predictor.endpoint_name))

### 5. Prediction on test data

Finally, we access the model endpoint for the prediction of the label output for the test data:

In [None]:
predictor.predict(test_data_no_target.to_csv(sep=",", header=False, index=False))
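With `CSVDeserializer`, the response arrives as a list of rows of strings. Assuming each row holds the values of `inference_response_keys` in order (`predicted_label`, then `probability`), a small sketch can convert the rows and compare the labels with the held-out targets. The helper names are hypothetical:

```python
from typing import List, Sequence, Tuple


def parse_predictions(rows: Sequence[Sequence[str]]) -> List[Tuple[str, float]]:
    """Turn [label, probability] string rows into (label, float) pairs."""
    return [(label, float(probability)) for label, probability in rows]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# In the notebook (assumed usage):
# rows = predictor.predict(test_data_no_target.to_csv(sep=",", header=False, index=False))
# parsed = parse_predictions(rows)
# accuracy([label for label, _ in parsed], test_data["Churn?"].astype(str).tolist())
```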

**Note:** Connect to a deployed endpoint

If the session token expires after the endpoint is deployed, you can create a new session by repeating the steps in section **Assume the role in the customer account via AWS STS** and connect to the already deployed endpoint by executing:

```python
predictor = Predictor(
    predictor.endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer(),
)
```

## Clean up

To avoid incurring unnecessary charges, delete the endpoints and resources that were created when deploying the model after they are no longer needed. 


### Delete the deployed model endpoint

The model endpoint runs on an instance that stays active, and incurs charges, until it is deleted. We delete it first because it is the main source of cost.

In [None]:
predictor.delete_endpoint()

### Delete the artifacts generated by the Autopilot job

Now delete all the artifacts created by the Autopilot job, such as generated candidate models, scripts, and notebooks.

We use the [high level resource for Amazon S3](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html) to simplify the operation.

In [None]:
assumed_role_s3_resource = boto3.resource(
    "s3",
    aws_access_key_id=assumed_role_credentials["AccessKeyId"],
    aws_secret_access_key=assumed_role_credentials["SecretAccessKey"],
    aws_session_token=assumed_role_credentials["SessionToken"],
)
s3_bucket = assumed_role_s3_resource.Bucket(automl.sagemaker_session.default_bucket())
s3_bucket.objects.filter(Prefix=base_job_name).delete()

### Delete the training dataset copied into the customer account

In [None]:
from urllib.parse import urlparse

train_data_uri = automl.describe_auto_ml_job()["InputDataConfig"][0]["DataSource"]["S3DataSource"][
    "S3Uri"
]

o = urlparse(train_data_uri, allow_fragments=False)
assumed_role_s3_resource.Object(o.netloc, o.path.lstrip("/")).delete()

### Cleanup of IAM entities

We delete the entities in the reverse order of their creation.
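The reverse order matters because IAM refuses to delete entities that are still in use: a group must have no members or attached policies, a user must have no access keys or group memberships, and a managed policy must be detached from every entity before it can be deleted. If you prefer to register each undo step right after the corresponding creation step, Python's `contextlib.ExitStack` runs its callbacks in last-in, first-out order. A sketch with placeholder actions in place of the IAM calls:

```python
from contextlib import ExitStack

events = []

with ExitStack() as cleanup:
    # Each creation step registers its own undo action immediately afterwards.
    events.append("create policy")
    cleanup.callback(events.append, "delete policy")
    events.append("create group and attach policy")
    cleanup.callback(events.append, "detach policy and delete group")
    events.append("create user and add to group")
    cleanup.callback(events.append, "remove user from group and delete user")
# On exit, ExitStack ran the callbacks in reverse order of registration.
```

In a notebook, where the steps live in separate cells, you could create `cleanup = ExitStack()` once, register calls such as `cleanup.callback(saas_iam_client.delete_user, UserName=SAAS_USER_NAME)` as you go, and call `cleanup.close()` in the clean-up cell.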

#### 1. Remove the user from the group in the SaaS account:

In [None]:
remove_user_from_group_response = saas_iam_client.remove_user_from_group(
    GroupName=SAAS_USER_GROUP_NAME, UserName=SAAS_USER_NAME
)

#### 2. Remove the user profile from the credentials file, delete the user's access keys, and delete the user from the SaaS account:

In [None]:
credentials_config.remove_section(SAAS_USER_PROFILE)
with open(str(Path.home()) + "/.aws/credentials", "w") as configfile:
    credentials_config.write(configfile, space_around_delimiters=False)

In [None]:
user_access_keys = saas_iam_client.list_access_keys(UserName=SAAS_USER_NAME)
for AccessKeyId in [element["AccessKeyId"] for element in user_access_keys["AccessKeyMetadata"]]:
    saas_iam_client.delete_access_key(UserName=SAAS_USER_NAME, AccessKeyId=AccessKeyId)

In [None]:
delete_user_response = saas_iam_client.delete_user(UserName=SAAS_USER_NAME)

#### 3. Detach the policies from the group in the SaaS account, delete group and the policies

In [None]:
attached_group_policies = saas_iam_client.list_attached_group_policies(
    GroupName=SAAS_USER_GROUP_NAME
)
for PolicyArn in [element["PolicyArn"] for element in attached_group_policies["AttachedPolicies"]]:
    detach_policy_response = saas_iam_client.detach_group_policy(
        GroupName=SAAS_USER_GROUP_NAME, PolicyArn=PolicyArn
    )

delete_group_response = saas_iam_client.delete_group(GroupName=SAAS_USER_GROUP_NAME)

delete_policy_response = saas_iam_client.delete_policy(PolicyArn=saas_assume_role_policy_arn)

delete_policy_response = saas_iam_client.delete_policy(PolicyArn=saas_s3_policy_arn)

#### 4. Detach the policies from the role in the customer account, then delete the role and the customer managed policy

In [None]:
attached_role_policies = customer_iam_client.list_attached_role_policies(
    RoleName=CUSTOMER_TRUST_SAAS_ROLE_NAME
)
for PolicyArn in [element["PolicyArn"] for element in attached_role_policies["AttachedPolicies"]]:
    detach_policy_response = customer_iam_client.detach_role_policy(
        RoleName=CUSTOMER_TRUST_SAAS_ROLE_NAME, PolicyArn=PolicyArn
    )

delete_role_response = customer_iam_client.delete_role(RoleName=CUSTOMER_TRUST_SAAS_ROLE_NAME)

delete_policy_response = customer_iam_client.delete_policy(PolicyArn=customer_s3_policy_arn)