Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: CC-BY-SA-4.0

Common Parameters for Built-In Algorithms

The following table lists parameters for each of the algorithms provided by Amazon SageMaker.

Algorithm Name Channel Name Training Image and Inference Image Registry Path Training Input Mode File Type Instance Class Parallelizable
BlazingText train <ecr_path>/blazingtext: File or Pipe Text file (one sentence per line with space-separated tokens) GPU (single instance only) or CPU No
DeepAR Forecasting train and (optionally) test <ecr_path>/forecasting-deepar: File JSON Lines or Parquet GPU or CPU Yes
Factorization Machines train and (optionally) test <ecr_path>/factorization-machines: File or Pipe recordIO-protobuf CPU (GPU for dense data) Yes
Image Classification train and validation, (optionally) train_lst, validation_lst, and model <ecr_path>/image-classification: File or Pipe recordIO or image files (.jpg or .png) GPU Yes
IP Insights train and (optionally) validation <ecr_path>/ipinsights: File CSV CPU or GPU Yes
k-means train and (optionally) test <ecr_path>/kmeans: File or Pipe recordIO-protobuf or CSV CPU or GPUCommon (single GPU device on one or more instances) No
k-nearest-neighbor (k-NN) train and (optionally) test <ecr_path>/knn: File or Pipe recordIO-protobuf or CSV CPU or GPU (single GPU device on one or more instances) Yes
LDA train and (optionally) test <ecr_path>/lda: File or Pipe recordIO-protobuf or CSV CPU (single instance only) No
Linear Learner train and (optionally) validation, test, or both <ecr_path>/linear-learner: File or Pipe recordIO-protobuf or CSV CPU or GPU Yes
Neural Topic Model train and (optionally) validation, test, or both <ecr_path>/ntm: File or Pipe recordIO-protobuf or CSV GPU or CPU Yes
Object2Vec train and (optionally) validation, test, or both <ecr_path>/object2vec: File JSON Lines GPU or CPU (single instance only) No
Object Detection train and validation, (optionally) train_annotation, validation_annotation, and model <ecr_path>/object-detection: File or Pipe recordIO or image files (.jpg or .png) GPU Yes
PCA train and (optionally) test <ecr_path>/pca: File or Pipe recordIO-protobuf or CSV GPU or CPU Yes
Random Cut Forest train and (optionally) test <ecr_path>/randomcutforest: File or Pipe recordIO-protobuf or CSV CPU Yes
Semantic Segmentation train and validation, train_annotation, validation_annotation, and (optionally) label_map and model <ecr_path>/semantic-segmentation: File or Pipe image files GPU (single instance only) No
Seq2Seq Modeling train, validation, and vocab <ecr_path>/seq2seq: File recordIO-protobuf GPU (single instance only) No
XGBoost train and (optionally) validation <ecr_path>/xgboost: File CSV or LibSVM CPU Yes

Algorithms that are parallelizable can be deployed on multiple compute instances for distributed training. For the Training Image and Inference Image Registry Path column, use the :1 version tag to ensure that you are using a stable version of the algorithm. You can reliably host a model trained using an image with the :1 tag on an inference image that has the :1 tag. Using the :latest tag in the registry path provides you with the most up-to-date version of the algorithm, but might cause problems with backward compatibility. Avoid using the :latest tag for production purposes.

For the Training Image and Inference Image Registry Path column, depending on algorithm and region use one of the following values for <ecr_path>.

[See the AWS documentation website for more details]

Use the paths and training input mode as follows: + To create a training job (with a request to the CreateTrainingJob API), specify the Docker Registry path and the training input mode for the training image. You create a training job to train a model using a specific dataset.

  + To create a model (with a CreateModel request), specify the Docker Registry path for the inference image. Amazon SageMaker launches machine learning compute instances that are based on the endpoint configuration and deploys the model, which includes the artifacts (the result of model training).