# Snowflake + Amazon SageMaker Autopilot Integration

## Overview

Organizations are increasingly using Snowflake to unify, integrate, analyze, and share previously fragmented data, and want to use state-of-the-art machine learning (ML) to glean business insights from it. However, developing ML models on large datasets requires extensive programming expertise and knowledge of ML frameworks. Meanwhile, most organizations have teams of analysts with the domain knowledge necessary to build ML models but without the machine learning expertise required to train and deploy them.

To address this, Snowflake is now integrated with Amazon SageMaker Autopilot to enable analysts and other SQL users to automatically build and deploy state-of-the-art machine learning models.

The Snowflake + Amazon SageMaker Autopilot Integration enables users to:

- **Create and manage ML models**: Use standard SQL queries in Snowflake to access Autopilot APIs and automatically create the best machine learning model for your data in Snowflake. Autopilot does the heavy lifting by automatically exploring, training, and tuning different ML algorithms, and providing the model that best fits your data.
- **Make predictions**: Use standard SQL queries to deploy, invoke, and manage ML models on SageMaker endpoints and make predictions from within Snowflake.

## Solution Architecture

### Solution Overview

The Snowflake + Amazon SageMaker Autopilot Integration sets up a reference architecture that lets you directly access Amazon SageMaker machine learning (ML) APIs from Snowflake. The application it deploys is powered by Snowflake's [external functions](https://docs.snowflake.com/en/sql-reference/external-functions-introduction.html) and [request translators](https://docs.snowflake.com/en/LIMITEDACCESS/external-functions-serializers.html) features, which allow you to directly create, use, and make predictions from SageMaker machine learning models using simple SQL commands.

| ![Snowflake + Amazon SageMaker Autopilot Solution Architecture](images/image1.png) |
|:--:|
| *Fig 1. Snowflake + Amazon SageMaker Autopilot Solution Architecture* |

1. When a supported `AWS_AUTOPILOT` SQL command is executed, the client program passes Snowflake a SQL statement that calls an external function. As part of query execution, Snowflake reads the external function definition, which contains the URL of the API Gateway service and the name of the API integration that holds authentication information for that proxy service. It also passes the data through any request translators and response translators associated with the external function for formatting.
2. Snowflake then reads information from the API integration, composes an HTTP POST request containing the headers, the data to be sent, and authentication information, and forwards the request to the API Gateway.
3. API Gateway then forwards the call to the corresponding SageMaker API.
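To make the moving parts concrete, the sketch below shows roughly what one of the deployed external function definitions looks like. It is illustrative only: the actual objects are created automatically by the setup described next, and the API Gateway URL, integration name, and translator names here are assumptions.

```
-- Illustrative sketch; the real definitions are created by the CloudFormation
-- setup, and the URL and object names below are placeholders.
CREATE OR REPLACE EXTERNAL FUNCTION aws_autopilot_describe_model(modelname VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = aws_autopilot_api_integration_mystack
  REQUEST_TRANSLATOR = aws_autopilot_describe_model_request_translator
  RESPONSE_TRANSLATOR = aws_autopilot_describe_model_response_translator
  AS 'https://abc123.execute-api.us-east-1.amazonaws.com/snowflake-autopilot-stage/describemodel';
```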
### Setup

The integration provides a reference AWS [CloudFormation](https://aws.amazon.com/cloudformation/resources/templates/) template that sets up the required resources on AWS and Snowflake. The template aims to automate as much of the setup as possible; it acts as a starting point and can be extended as needed. Deploying the CloudFormation template using the default parameters builds the following serverless environment:

| ![AWS CloudFormation Template Setup](images/image2.png) |
|:--:|
| *Fig 2. AWS CloudFormation Template Setup* |

| ![AWS CloudFormation Template Setup with VPC](images/image3.png) |
|:--:|
| *Fig 3. AWS CloudFormation Template Setup with VPC* |

The CloudFormation template transparently and automatically creates the following resources.

**AWS Resources:**

- **Amazon API Gateway** REST API with endpoints that connect Snowflake external functions to the SageMaker APIs. See the [Amazon API Gateway documentation](https://docs.aws.amazon.com/apigateway/index.html) to learn more about the service.
- **S3 bucket** to store the training data and the model artifacts created by Autopilot. See the [S3 documentation](https://aws.amazon.com/s3/getting-started/) to learn more about the service.
- **AWS Lambda** setup function that uses the Snowflake Python connector and credentials stored in AWS Secrets Manager to connect to Snowflake and set up the required resources there. See the [AWS Lambda documentation](https://docs.aws.amazon.com/lambda/index.html) to learn more about the service.
- **IAM roles** to access the resources and set up trust relationships between Snowflake and the Amazon API Gateway. See the [IAM roles documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) to learn more about the service, and the Snowflake documentation on [linking the API integration object in Snowflake to Amazon API Gateway using IAM roles](https://docs.snowflake.com/en/sql-reference/external-functions-creating-aws-common-api-integration-proxy-link.html).

**Snowflake Resources:**

- **Storage Integration** required to copy data from a Snowflake table to an Amazon S3 bucket for training (an illustrative sketch follows this list). See Snowflake's documentation on [Storage Integrations](https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration.html) to learn more.
- **API Integration** required by the Snowflake external functions to talk to Amazon API Gateway. See Snowflake's documentation on [API Integrations](https://docs.snowflake.com/en/sql-reference/sql/create-api-integration.html) to learn more.
- **External functions and associated request translators and response translators** that correspond to the various SageMaker calls. See Snowflake's documentation on [External Functions](https://docs.snowflake.com/en/sql-reference/external-functions-introduction.html) and [Request Translators](https://docs.snowflake.com/en/LIMITEDACCESS/external-functions-serializers.html) to learn more.
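As a rough illustration of what the storage integration enables under the hood (the integration and its setup Lambda handle this wiring for you), unloading a training table to the S3 bucket might look like the following sketch; the stage, bucket, and integration names are assumptions:

```
-- Hypothetical names throughout; the integration performs the equivalent steps.
CREATE OR REPLACE STAGE autopilot_training_stage
  URL = 's3://my-autopilot-bucket/training/'
  STORAGE_INTEGRATION = aws_autopilot_storage_integration_mystack;

COPY INTO @autopilot_training_stage/abalone/
  FROM abalone_training_dataset
  FILE_FORMAT = (TYPE = CSV)
  HEADER = TRUE;
```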
## Getting Started

### Planning the Deployment

Before you deploy the CloudFormation template, review the following information and ensure that your AWS and Snowflake accounts are properly configured and that you have the right set of permissions. Otherwise, deployment might fail.

**Snowflake account** - If you don't already have a Snowflake account, create one at [https://signup.snowflake.com/](https://signup.snowflake.com/). As SageMaker runs on the AWS cloud, for best performance it is recommended to use a Snowflake deployment on AWS.

**AWS account** - If you don't already have an AWS account, create one at [https://aws.amazon.com](https://aws.amazon.com). Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.

#### AWS Service Quotas

The resources created by the provided CloudFormation template should not exceed any service quota for your AWS account. If a service quota is exceeded, you can verify your limits and request quota increases in the [Service Quotas console](https://console.aws.amazon.com/servicequotas/home?region=us-east-2#!/).

When creating models and performing predictions, Snowflake will create AutoML jobs and SageMaker endpoints in your AWS account. This can result in reaching the [SageMaker service quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html#limits_sagemaker) for your AWS account. If you encounter error messages indicating that you've exceeded your quota, use [AWS Support](https://console.aws.amazon.com/support/) to request a service limit increase for the SageMaker resources you want to scale up.

#### Permissions

**AWS IAM permissions:** Before deploying the CloudFormation template, you must sign in to the AWS Management Console with IAM permissions for the resources that the template deploys. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions. For more information, see [AWS managed policies for job functions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions.html).

**Snowflake permissions:** For the template to create the required Snowflake resources, you need a Snowflake role with permissions to create Storage Integrations, API Integrations, and Functions. This could be the Account Administrator role or a custom role with the above privileges. See Snowflake [roles](https://docs.snowflake.com/en/user-guide/security-access-control-overview.html#roles) and [privileges](https://docs.snowflake.com/en/user-guide/security-access-control-overview.html#privileges) for more information.

**Storing Snowflake credentials in AWS Secrets Manager:** The CloudFormation template takes as an input the ARN of an AWS Secret that holds the Snowflake account details and credentials used to securely connect to Snowflake and create the resources required by the integration. To save your credentials:

- Go to the AWS Management Console.
- From the top-right corner, select the AWS region you plan to deploy the template in. **Note:** You must store your secret in the same region you will be deploying the template in.
- In the top search bar, search for **Secrets Manager**.
- Click on **Store a new secret**.
- Select **Other type of secrets**.
- On the **Secret key/value** tab, fill in three key/value rows:
  - `username` (your Snowflake username)
  - `password` (your Snowflake password)
  - `accountid` (your Snowflake [account identifier](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html))
- If you click on the **Plaintext** tab, you should see something like this:

  ```
  {
    "accountid": "snowflake_account_id",
    "username": "snowflake_user",
    "password": "snowflake_password"
  }
  ```

- Leave the default encryption key selected and click **Next**.
- Give your secret a name and click **Next**.
- You can leave the remaining options unchanged and click **Store** on the final screen.
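If you are unsure of the values to store, a quick way to look up the account and user fields from a Snowflake worksheet is sketched below. See the account identifier documentation linked above for the full identifier formats, since the account locator alone may not be the identifier your deployment expects.

```
-- Helpers for filling in the secret's accountid and username values.
SELECT CURRENT_ACCOUNT() AS account_locator,
       CURRENT_REGION()  AS region,
       CURRENT_USER()    AS username;
```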
### Deploying the CloudFormation Template

Sign in to your AWS account, and from the upper-right corner of the navigation bar choose the Region in which you want the resources created by the CloudFormation template to be set up. It is recommended to deploy the AWS resources to the same region your Snowflake deployment runs in.

#### Upload the Template

1. Go to the AWS Management Console.
2. In the top search bar, search for **CloudFormation**.
3. Under Services, click on **CloudFormation**.
4. Click on **Create stack**. If given a choice between **With new resources (standard)** and **With existing resources (import resources)**, choose **With new resources (standard)**.
5. On the **Create stack** page, under **Prepare template**, select **Template is ready**.
6. Select **Upload a template file**.
7. Select **Choose file**.
8. Navigate to the directory that contains your copy of the template, then select that template.
9. Click **Next** to reach the page on which you enter names for resources, etc.

#### Configure Your Options

The template contains default values for most fields. However, you need to enter a few values, such as the names for the resources and the ARN of the AWS Secrets Manager secret.

1. Enter a name for the stack.
2. **apiGatewayName** - Enter the name of the API Gateway to be created. The default name is snowflake-autopilot-api.
3. **apiGatewayStageName** - Enter the name of the API deployment stage to be created. The default name is snowflake-autopilot-stage.
4. **s3BucketName** - Enter the name of the S3 bucket to be created to store the training data and the artifacts produced by the AutoML jobs.
5. **kmsKeyArn** - Optional parameter. Enter the ARN of the AWS Key Management Service key that Amazon SageMaker can use to encrypt job outputs. The KmsKeyId is applied to all outputs.
6. **snowflakeDatabaseName** - Enter the name of the Snowflake database in which to create the external functions and request translators.
7. **snowflakeSchemaName** - Enter the name of the Snowflake database schema in which to create the external functions and request translators.
8. **snowflakeResourceSuffix** - Optional parameter. Enter a unique suffix to append to the Snowflake resources created. This suffix will be added to all the functions created in the provided Snowflake database schema. ***Note:** If multiple users deploy the template to the same Snowflake account using the same Snowflake database and schema, it is recommended to provide the snowflakeResourceSuffix to prevent overriding resources deployed by other users.*
9. **snowflakeRole** - Enter the name of the Snowflake role with permissions to create storage integrations, API integrations, and functions. The default value is the ACCOUNTADMIN role.
10. **snowflakeSecretArn** - Enter the ARN of the secret from AWS Secrets Manager containing the Snowflake login information.
11. Click **Next**. This page has some advanced options for template deployment.
    1. Optionally, set advanced options, such as a stack policy. These are not needed when creating the sample function using the template supplied by Snowflake.
    2. Click **Next**.
12. On the review page, scroll down to the end and acknowledge that the CloudFormation template might create IAM resources with custom names. This is needed because the template creates three IAM roles as part of the deployment.
13. Click on **Create stack**.

The deployment will take a few minutes. After the deployment is complete, you should be on the **Events** tab for the newly created stack. The created resources will be listed under the **Resources** tab.

If the deployment of the CloudFormation template was successful, you now have all the resources on the AWS and Snowflake sides required for the integration.

## Working with SageMaker APIs from Snowflake

1. Log in to the Snowflake account in which the resources were created by the CloudFormation template.
2. The template should have set up:

   a. A Storage Integration with the name `AWS_AUTOPILOT_STORAGE_INTEGRATION_YOURSTACKNAME`.

   b. An API Integration with the name `AWS_AUTOPILOT_API_INTEGRATION_YOURSTACKNAME`.

   You can use the SQL command `SHOW INTEGRATIONS LIKE '%AWS_AUTOPILOT%'` to see the integrations created, and use the [DESCRIBE INTEGRATION](https://docs.snowflake.com/en/sql-reference/sql/desc-integration.html) command to get details on the properties of a particular integration.

   **Note:** Since API and storage integrations are account-level objects, the names are appended with the stack name provided as input during CloudFormation template deployment in order to avoid overriding existing integrations.

   c. The following external functions and associated translators (JavaScript functions):

   - `AWS_AUTOPILOT_CREATE_MODEL`
   - `AWS_AUTOPILOT_CREATE_MODEL_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_CREATE_MODEL_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_DESCRIBE_MODEL`
   - `AWS_AUTOPILOT_DESCRIBE_MODEL_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_DESCRIBE_MODEL_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_PREDICT_OUTCOME`
   - `AWS_AUTOPILOT_PREDICT_OUTCOME_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_PREDICT_OUTCOME_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_CREATE_ENDPOINT_CONFIG`
   - `AWS_AUTOPILOT_CREATE_ENDPOINT_CONFIG_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_CREATE_ENDPOINT_CONFIG_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_DESCRIBE_ENDPOINT_CONFIG`
   - `AWS_AUTOPILOT_DESCRIBE_ENDPOINT_CONFIG_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_DESCRIBE_ENDPOINT_CONFIG_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_DELETE_ENDPOINT_CONFIG`
   - `AWS_AUTOPILOT_DELETE_ENDPOINT_CONFIG_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_DELETE_ENDPOINT_CONFIG_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_CREATE_ENDPOINT`
   - `AWS_AUTOPILOT_CREATE_ENDPOINT_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_CREATE_ENDPOINT_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_DESCRIBE_ENDPOINT`
   - `AWS_AUTOPILOT_DESCRIBE_ENDPOINT_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_DESCRIBE_ENDPOINT_RESPONSE_TRANSLATOR`
   - `AWS_AUTOPILOT_DELETE_ENDPOINT`
   - `AWS_AUTOPILOT_DELETE_ENDPOINT_REQUEST_TRANSLATOR`
   - `AWS_AUTOPILOT_DELETE_ENDPOINT_RESPONSE_TRANSLATOR`

   You can use the SQL command `SHOW FUNCTIONS LIKE '%AWS_AUTOPILOT%'` to see all the functions created, and use the [DESCRIBE FUNCTION](https://docs.snowflake.com/en/sql-reference/sql/desc-function.html) command to get details on a specific function, including its signature (i.e., arguments), return value, language, and body (i.e., definition).

   **Note:** The functions are created in the Snowflake database and schema provided during deployment; if you supplied a `snowflakeResourceSuffix`, it is appended to the function names. An example verification is shown below.
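For example, a quick verification from a Snowflake worksheet might look like this (replace YOURSTACKNAME with the stack name you chose):

```
-- Confirm the integrations and functions created by the setup Lambda.
SHOW INTEGRATIONS LIKE '%AWS_AUTOPILOT%';
DESCRIBE INTEGRATION AWS_AUTOPILOT_API_INTEGRATION_YOURSTACKNAME;

SHOW FUNCTIONS LIKE '%AWS_AUTOPILOT%';
-- DESCRIBE FUNCTION requires the argument types in the signature.
DESCRIBE FUNCTION AWS_AUTOPILOT_CREATE_MODEL(VARCHAR, VARCHAR, VARCHAR);
```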
### Create Model

Use the `AWS_AUTOPILOT_CREATE_MODEL` external functions below to kick off model creation on your data in a Snowflake table.

#### Option 1

**Syntax:**

```
AWS_AUTOPILOT_CREATE_MODEL(MODELNAME VARCHAR, TRAINING_TABLE_NAME VARCHAR, TARGET_COL VARCHAR)
```

**Arguments (all are required parameters):**

`MODELNAME` - Name that will be used to refer to the best model found by Autopilot. Allowed Pattern: `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`

`TRAINING_TABLE_NAME` - Name of the table from which to create the model. All rows will be used to train the model.

`TARGET_COL` - The name of the target column that we want the model to predict.

**Usage:**

```
select aws_autopilot_create_model ('abalonemodel', 'abalone_training_dataset', 'rings')
```

**Expected output on success:**

```
"Model creation in progress. Model ARN = arn:aws:sagemaker:us-west-2:631484165566:automl-job/abalonemodel-job."
```

- The command above kicks off an AutoML job.
- The [Problem type](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-problem-types.html) and [Objective metric](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobObjective.html#sagemaker-Type-AutoMLJobObjective-MetricName) are auto-inferred.
- Depending on the size of the data, model creation can take anywhere from a few minutes for small datasets to 2-3 hours for large datasets (e.g., 5 GB). The default maximum run time of the AutoML job is 86400 seconds. If you want more control over the model creation time, you can use the advanced `AWS_AUTOPILOT_CREATE_MODEL` option and set the `MAX_RUNNING_TIME` field. **Note:** That parameter sets a timeout on the length of the training job; if the job has not finished within the specified limit, it is forcefully stopped and a model will NOT be created. If you would like to optimize for speed and have a model successfully created in a shorter duration, consider using the `MAX_CANDIDATES` parameter instead.
- Use the [AWS_AUTOPILOT_DESCRIBE_MODEL](#describe-model) function to check the status of the job.
- When the best model is found, Autopilot transparently deploys the model to a SageMaker endpoint with the same name as the model.
- The aws_autopilot_create_model call creates a default endpoint configuration named `yourmodelname-m5-4xl-2`, with the following parameters: `"InitialInstanceCount": 2, "InstanceType": "ml.m5.4xlarge"`. Advanced users can go lower or higher depending on their dataset sizes and performance needs; see [AWS_AUTOPILOT_CREATE_ENDPOINT_CONFIG](#create-endpoint-config) for more details on specifying a custom endpoint configuration. (In the above example, the name of the endpoint configuration created would be `abalonemodel-m5-4xl-2`.)
- Using the above endpoint config, the model is deployed to an endpoint with the same name as the model (in the above example, the endpoint name would be `abalonemodel`). The time to live of the endpoint is 604800 seconds (7 days), after which it is automatically deleted.
- If you would like to redeploy the model after it has been deleted, use the [AWS_AUTOPILOT_CREATE_ENDPOINT](#create-endpoint) command with either the default endpoint configuration created above or a custom endpoint configuration (a combined sketch appears below).

**Note**: See [https://aws.amazon.com/sagemaker/pricing/](https://aws.amazon.com/sagemaker/pricing/) for details on instance pricing and to estimate costs.
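Taken together, a typical follow-up to the call above might look like this sketch; the names follow the conventions described in the bullets, and the TTL value is illustrative:

```
-- Poll the training job until the status is "Completed".
select aws_autopilot_describe_model ('abalonemodel');

-- After the default 7-day endpoint has expired, redeploy the model using
-- the default endpoint configuration created at training time.
select aws_autopilot_create_endpoint ('abalonemodel', 'abalonemodel-m5-4xl-2', 86400);
```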
#### Option 2

Advanced users who would like to specify different values for the various optional parameters can use this variation of the `AWS_AUTOPILOT_CREATE_MODEL` call.

**Syntax:**

```
AWS_AUTOPILOT_CREATE_MODEL(MODELNAME VARCHAR, TRAINING_TABLE_NAME VARCHAR, TARGET_COL VARCHAR, OBJECTIVE_METRIC VARCHAR, PROBLEM_TYPE VARCHAR, MAX_CANDIDATES INTEGER, MAX_RUNNING_TIME INTEGER, DEPLOY_MODEL BOOLEAN, MODEL_ENDPOINT_TTL INTEGER)
```

**Arguments:**

`MODELNAME` (required) - Name that will be used to refer to the best model found by Autopilot. Allowed Pattern: `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`

`TRAINING_TABLE_NAME` (required) - Name of the table from which to create the model. All rows will be used to train the model.

`TARGET_COL` (required) - The name of the target column that we want the model to predict.

`OBJECTIVE_METRIC` (optional) - One of "Accuracy", "MSE", "AUC", "F1", or "F1macro". If NULL, Autopilot will auto-infer this value.

`PROBLEM_TYPE` (optional) - Type of problem: "Regression", "BinaryClassification", "MulticlassClassification", or "Auto". If NULL, the default value is "Auto".

`MAX_CANDIDATES` (optional) - Maximum number of model candidates (training jobs) that Autopilot is allowed to explore. Valid values are integers of 1 and higher. It can be leveraged to optimize for speed and have the create model call complete quicker by limiting the number of candidates explored. If NULL, Autopilot will auto-infer this value. **Note:** When optimizing for `OBJECTIVE_METRIC`, we suggest leaving this field unset so that the AutoML job can explore all possible candidates and pick the best one.

`MAX_RUNNING_TIME` (optional) - Maximum runtime, in seconds, that an AutoML job has to complete. If NULL, the default value is 86400 seconds. **Note:** This parameter sets a timeout on the length of the training job; if the job has not finished within the specified limit, it is forcefully stopped and a model will NOT be created. If you would like to optimize for speed and have a model successfully created in a shorter duration, consider using the `MAX_CANDIDATES` parameter instead.

`DEPLOY_MODEL` (optional) - TRUE or FALSE. If NULL, the default value is TRUE and the best model will be transparently deployed to a SageMaker endpoint. The default endpoint configuration used is: `"InitialInstanceCount": 2, "InstanceType": "ml.m5.4xlarge"`. Advanced users can go lower or higher depending on their dataset sizes and performance needs. See [AWS_AUTOPILOT_CREATE_ENDPOINT_CONFIG](#create-endpoint-config) for more details on specifying a custom endpoint configuration.

`MODEL_ENDPOINT_TTL` (optional) - Time to live of the model endpoint, in seconds. If NULL, the default value is 7 days.

**Note:** See [https://aws.amazon.com/sagemaker/pricing/](https://aws.amazon.com/sagemaker/pricing/) for details on instance pricing and to estimate costs.

**Usage:**

```
select aws_autopilot_create_model ('abalonemodel', 'abalone_training_dataset', 'rings',
                                   'Accuracy', 'MulticlassClassification', NULL, 20000, TRUE, 86400)
```

**Note:** External functions do not support optional parameters, so every argument must be supplied; any optional argument you wish to skip should be passed as NULL (as with `MAX_CANDIDATES` above).

**Expected output on success:**

```
"Model creation in progress. Model ARN = arn:aws:sagemaker:us-west-2:631484165566:automl-job/abalonemodel-job."
```
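For example, a sketch of a speed-optimized run that caps the number of candidates and accepts all other defaults (the value 5 is illustrative):

```
-- Limit Autopilot to 5 candidates for a quicker result; pass NULL for all
-- other optional arguments to accept their defaults.
select aws_autopilot_create_model ('abalonemodel', 'abalone_training_dataset', 'rings',
                                   NULL, NULL, 5, NULL, NULL, NULL);
```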
### Describe Model

Use the `AWS_AUTOPILOT_DESCRIBE_MODEL` external function in a SQL query to check the status and track the progress of your Autopilot training job and the model.

**Syntax:**

```
AWS_AUTOPILOT_DESCRIBE_MODEL(MODELNAME VARCHAR)
```

**Arguments:**

`MODELNAME` (required) - Name of the model.

**Usage:**

```
select aws_autopilot_describe_model ('abalonemodel')
```

**The response includes the following information:**

- **Job status**: "Completed", "InProgress", "Failed", "Stopped", or "Stopping".
- **Job status detail**: "Starting", "AnalyzingData", "FeatureEngineering", "ModelTuning", "MaxCandidatesReached", "Failed", "Stopped", "MaxAutoMLJobRuntimeReached", "Stopping", "DeployingModel", or "CandidateDefinitionsGenerated".
- **Problem type**: "Regression", "BinaryClassification", or "MulticlassClassification".
- **Objective metric**: "Accuracy", "MSE", "AUC", "F1", or "F1macro".
- **Best objective metric value**: Value of the objective metric for the best model found so far.
- **Failure reason**: The reason for failure, if the status is "Failed".
### Predict Outcome

Use the `AWS_AUTOPILOT_PREDICT_OUTCOME` external function in a SQL query to make predictions using the ML model produced by Autopilot.

**Syntax:**

```
AWS_AUTOPILOT_PREDICT_OUTCOME(MODEL_ENDPOINT_NAME VARCHAR, COLUMNS ARRAY)
```

**Arguments:**

`MODEL_ENDPOINT_NAME` (required) - Name of the endpoint the model is deployed to. **Note:** Unless the model was manually deployed to a custom endpoint, this is the same as the model name.

`COLUMNS` (required) - Array of values or feature columns to pass as inputs for model prediction. The ordering should match that of the training dataset, minus the target column.

**Usage:**

```
select aws_autopilot_predict_outcome ('abalonemodel',
  array_construct('M', 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15));

select aws_autopilot_predict_outcome ('abalonemodel',
  array_construct(sex, length, diameter, height, whole_weight,
                  shucked_weight, viscera_weight, shell_weight)) as prediction
from abalone_test_dataset;
```

**Response:** Returns the predicted target value for each row of attributes.
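Predictions can also be materialized into a table. The sketch below assumes the `abalone_test_dataset` table from the usage example above; the output table name is hypothetical:

```
-- Hypothetical output table; one predicted value per input row.
create or replace table abalone_predictions as
select t.*,
       aws_autopilot_predict_outcome ('abalonemodel',
         array_construct(sex, length, diameter, height, whole_weight,
                         shucked_weight, viscera_weight, shell_weight)) as predicted_rings
from abalone_test_dataset t;
```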
### Create Endpoint Config

Use the `AWS_AUTOPILOT_CREATE_ENDPOINT_CONFIG` external function in a SQL query to create an endpoint configuration that Amazon SageMaker hosting services use to deploy models. This lets advanced users specify a custom endpoint configuration that scales lower or higher than the default endpoint configuration used by the create model call, depending on their dataset sizes and performance needs.

**Syntax:**

```
AWS_AUTOPILOT_CREATE_ENDPOINT_CONFIG(ENDPOINT_CONFIG_NAME VARCHAR, MODELNAME VARCHAR, INSTANCE_TYPE VARCHAR, INSTANCE_COUNT NUMBER)
```

**Arguments (all are required parameters):**

`ENDPOINT_CONFIG_NAME` - The name of the endpoint configuration. You specify this name in a CreateEndpoint request. Allowed Pattern: `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`

`MODELNAME` - The name of the model that you want to host. This is the name that you specified when creating the model.

`INSTANCE_TYPE` - The ML compute instance type. See [SageMaker instance types](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProductionVariant.html#sagemaker-Type-ProductionVariant-InstanceType) for more details.

`INSTANCE_COUNT` - Number of instances to launch.

**Usage:**

```
select aws_autopilot_create_endpoint_config ('abalone-endpoint-config', 'abalonemodel', 'ml.c5d.4xlarge', 3)
```

### Describe Endpoint Config

Use the `AWS_AUTOPILOT_DESCRIBE_ENDPOINT_CONFIG` external function in a SQL query to get the description of an endpoint configuration that was created using the Create Endpoint Config call.

**Syntax:**

```
AWS_AUTOPILOT_DESCRIBE_ENDPOINT_CONFIG(ENDPOINT_CONFIG_NAME VARCHAR)
```

**Arguments (all are required parameters):**

`ENDPOINT_CONFIG_NAME` - The name of the endpoint configuration.

**Usage:**

```
select aws_autopilot_describe_endpoint_config ('abalone-endpoint-config')
```

**Response:**

`ModelName` - The name of the model to be hosted.

`InstanceCount` - Number of instances to launch.

`InstanceType` - The ML compute instance type.

### Delete Endpoint Config

Use the `AWS_AUTOPILOT_DELETE_ENDPOINT_CONFIG` external function in a SQL query to delete an endpoint configuration. This command deletes only the specified configuration. It does not delete endpoints created using the configuration.

**Syntax:**

```
AWS_AUTOPILOT_DELETE_ENDPOINT_CONFIG(ENDPOINT_CONFIG_NAME VARCHAR)
```

**Arguments (all are required parameters):**

`ENDPOINT_CONFIG_NAME` - The name of the endpoint configuration.

**Usage:**

```
select aws_autopilot_delete_endpoint_config ('abalone-endpoint-config')
```

### Create Endpoint

Use the `AWS_AUTOPILOT_CREATE_ENDPOINT` external function in a SQL query to create an endpoint using the endpoint configuration specified in the request. Amazon SageMaker uses the endpoint to provision resources and deploy models.

**Syntax:**

```
AWS_AUTOPILOT_CREATE_ENDPOINT(ENDPOINT_NAME VARCHAR, ENDPOINT_CONFIG_NAME VARCHAR, MODEL_ENDPOINT_TTL INTEGER)
```

**Arguments:**

`ENDPOINT_NAME` (required) - The name of the endpoint. The exact endpoint name must be provided during inference. Allowed Pattern: `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`

`ENDPOINT_CONFIG_NAME` (required) - The name of the endpoint configuration. **Note:** If you would like to reuse the default endpoint config created during model creation, this is `yourmodelname-m5-4xl-2`.

`MODEL_ENDPOINT_TTL` (optional) - Time to live of the model endpoint, in seconds. If NULL, the default value is 7 days.

**Usage:**

```
select aws_autopilot_create_endpoint ('abalone-endpoint', 'abalone-endpoint-config', 36000)
```

### Describe Endpoint

Use the `AWS_AUTOPILOT_DESCRIBE_ENDPOINT` external function in a SQL query to get the description of an endpoint.

**Syntax:**

```
AWS_AUTOPILOT_DESCRIBE_ENDPOINT(ENDPOINT_NAME VARCHAR)
```

**Arguments (all are required parameters):**

`ENDPOINT_NAME` - The name of the endpoint.

**Usage:**

```
select aws_autopilot_describe_endpoint ('abalone-endpoint')
```

**Response:**

`CreationTime` - A timestamp that shows when the endpoint was created.

`EndpointConfigName` - The name of the endpoint configuration associated with this endpoint.

`EndpointStatus` - The status of the endpoint. Valid values: `OutOfService`, `Creating`, `Updating`, `SystemUpdating`, `RollingBack`, `InService`, `Deleting`, `Failed`.

`FailureReason` - If the status of the endpoint is Failed, the reason why it failed.

### Delete Endpoint

Use the `AWS_AUTOPILOT_DELETE_ENDPOINT` external function in a SQL query to delete an endpoint. Amazon SageMaker frees up all of the resources that were deployed when the endpoint was created.

**Syntax:**

```
AWS_AUTOPILOT_DELETE_ENDPOINT(ENDPOINT_NAME VARCHAR)
```

**Arguments (all are required parameters):**

`ENDPOINT_NAME` - The name of the endpoint.

**Usage:**

```
select aws_autopilot_delete_endpoint ('abalone-endpoint')
```
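Putting the endpoint functions together, a complete custom-deployment lifecycle might look like the following sketch; the instance type, instance count, and TTL values are illustrative:

```
-- 1. Define a smaller configuration than the default created at training time.
select aws_autopilot_create_endpoint_config ('abalone-endpoint-config', 'abalonemodel', 'ml.m5.xlarge', 1);

-- 2. Deploy the model to an endpoint with a 10-hour time to live.
select aws_autopilot_create_endpoint ('abalone-endpoint', 'abalone-endpoint-config', 36000);

-- 3. Wait until EndpointStatus reports "InService", then make predictions.
select aws_autopilot_describe_endpoint ('abalone-endpoint');
select aws_autopilot_predict_outcome ('abalone-endpoint',
         array_construct('M', 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15));

-- 4. Tear down the endpoint and its configuration when done.
select aws_autopilot_delete_endpoint ('abalone-endpoint');
select aws_autopilot_delete_endpoint_config ('abalone-endpoint-config');
```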
## SageMaker Clarify and SageMaker Studio

Amazon SageMaker Clarify provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions. During the model training process, SageMaker Autopilot automatically creates a notebook (and PDF report) that displays the 10 features with the greatest feature attribution. The notebook is stored in: `/output//documentation/explainability/output/`

Additional information about the generated model can be found in Amazon SageMaker Studio.

## Costs

There is no additional cost for using the provided Snowflake + Amazon SageMaker Autopilot Integration. You are responsible for the cost of the AWS services and the Snowflake compute and storage used while running this reference deployment. The AWS CloudFormation template includes configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.

**Tip:** After you deploy the template, [create AWS Cost and Usage Reports](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-reports-gettingstarted-turnonreports.html) to track the AWS costs associated with the integration. These reports deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. They provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information about the report, see [What are AWS Cost and Usage Reports?](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-reports-costusage.html)

## Cleanup

To clean up the resources created by the integration:

- Delete any SageMaker endpoints that were provisioned while using the integration. You can do this by:
  - Using the [Delete Endpoint](#delete-endpoint) SQL command from Snowflake, or
  - Opening the Amazon SageMaker console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/) and deleting the endpoints. Deleting the endpoints also deletes the ML compute instances that support them.
    - Under Inference, choose Endpoints.
    - Choose the endpoint that you created, choose Actions, and then choose Delete.
- Delete any SageMaker endpoint configurations that were provisioned while using the integration. You can do this by:
  - Using the [Delete Endpoint Config](#delete-endpoint-config) SQL command from Snowflake, or
  - Opening the Amazon SageMaker console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/) and:
    - Under Inference, choose Endpoint configurations.
    - Choose the endpoint configurations that you created, choose Actions, and then choose Delete.
- Delete any SageMaker Autopilot models that were created by:
  - Opening the Amazon SageMaker console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/) and:
    - Under Inference, choose Models.
    - Choose the model that you created, choose Actions, and then choose Delete.
- Log in to the AWS console and navigate to the CloudFormation service. Select the stack that was created when you deployed the template and click on Delete. This deletes all the AWS resources provisioned by the template except the S3 bucket. The S3 bucket is not automatically deleted because it might contain training data and outputs from the Autopilot jobs.
- To delete the S3 bucket, navigate to the S3 service and manually delete the bucket. For more information, see [Deleting a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html).
- Clean up the Snowflake resources by logging in to the Snowflake console and:
  - Using the [DROP INTEGRATION](https://docs.snowflake.com/en/sql-reference/sql/drop-integration.html#drop-integration) SQL command to delete the API and Storage integrations that were set up. **Note:** You can use the SQL command `SHOW INTEGRATIONS LIKE '%AWS_AUTOPILOT%'` to see the integrations.
  - Using the [DROP FUNCTION](https://docs.snowflake.com/en/sql-reference/sql/drop-function.html) SQL command to delete the user-defined functions that were set up. **Note:** You can use the SQL command `SHOW FUNCTIONS LIKE '%AWS_AUTOPILOT%'` to see all the functions. A sketch of this cleanup follows this list.
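As a sketch, the Snowflake-side cleanup might look like the following; the object names include your stack name (and any resource suffix), so adjust them to match what SHOW returns, and note that DROP FUNCTION must include the function's argument types:

```
-- List, then drop, the integration objects (names are placeholders).
SHOW INTEGRATIONS LIKE '%AWS_AUTOPILOT%';
DROP INTEGRATION AWS_AUTOPILOT_API_INTEGRATION_YOURSTACKNAME;
DROP INTEGRATION AWS_AUTOPILOT_STORAGE_INTEGRATION_YOURSTACKNAME;

-- List, then drop, each function with its signature; repeat for all of them.
SHOW FUNCTIONS LIKE '%AWS_AUTOPILOT%';
DROP FUNCTION AWS_AUTOPILOT_DELETE_ENDPOINT(VARCHAR);
```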