Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: CC-BY-SA-4.0
The following table lists parameters for each of the algorithms provided by Amazon SageMaker.
Algorithm Name | Channel Name | Training Image and Inference Image Registry Path | Training Input Mode | File Type | Instance Class | Parallelizable |
---|---|---|---|---|---|---|
BlazingText | train | <ecr_path>/blazingtext: | File or Pipe | Text file (one sentence per line with space-separated tokens) | GPU (single instance only) or CPU | No |
DeepAR Forecasting | train and (optionally) test | <ecr_path>/forecasting-deepar: | File | JSON Lines or Parquet | GPU or CPU | Yes |
Factorization Machines | train and (optionally) test | <ecr_path>/factorization-machines: | File or Pipe | recordIO-protobuf | CPU (GPU for dense data) | Yes |
Image Classification | train and validation, (optionally) train_lst, validation_lst, and model | <ecr_path>/image-classification: | File or Pipe | recordIO or image files (.jpg or .png) | GPU | Yes |
IP Insights | train and (optionally) validation | <ecr_path>/ipinsights: | File | CSV | CPU or GPU | Yes |
k-means | train and (optionally) test | <ecr_path>/kmeans: | File or Pipe | recordIO-protobuf or CSV | CPU or GPU (single GPU device on one or more instances) | No |
k-nearest-neighbor (k-NN) | train and (optionally) test | <ecr_path>/knn: | File or Pipe | recordIO-protobuf or CSV | CPU or GPU (single GPU device on one or more instances) | Yes |
LDA | train and (optionally) test | <ecr_path>/lda: | File or Pipe | recordIO-protobuf or CSV | CPU (single instance only) | No |
Linear Learner | train and (optionally) validation, test, or both | <ecr_path>/linear-learner: | File or Pipe | recordIO-protobuf or CSV | CPU or GPU | Yes |
Neural Topic Model | train and (optionally) validation, test, or both | <ecr_path>/ntm: | File or Pipe | recordIO-protobuf or CSV | GPU or CPU | Yes |
Object2Vec | train and (optionally) validation, test, or both | <ecr_path>/object2vec: | File | JSON Lines | GPU or CPU (single instance only) | No |
Object Detection | train and validation, (optionally) train_annotation, validation_annotation, and model | <ecr_path>/object-detection: | File or Pipe | recordIO or image files (.jpg or .png) | GPU | Yes |
PCA | train and (optionally) test | <ecr_path>/pca: | File or Pipe | recordIO-protobuf or CSV | GPU or CPU | Yes |
Random Cut Forest | train and (optionally) test | <ecr_path>/randomcutforest: | File or Pipe | recordIO-protobuf or CSV | CPU | Yes |
Semantic Segmentation | train and validation, train_annotation, validation_annotation, and (optionally) label_map and model | <ecr_path>/semantic-segmentation: | File or Pipe | image files | GPU (single instance only) | No |
Seq2Seq Modeling | train, validation, and vocab | <ecr_path>/seq2seq: | File | recordIO-protobuf | GPU (single instance only) | No |
XGBoost | train and (optionally) validation | <ecr_path>/xgboost: | File | CSV or LibSVM | CPU | Yes |
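
The Parallelizable column can be turned into a simple pre-flight check before a training job is submitted. The following is a minimal sketch only: the `PARALLELIZABLE` dictionary and the `check_instance_count` helper are hypothetical names introduced for illustration, and the dictionary covers just a hand-transcribed subset of the rows above.

```python
# Hand-transcribed subset of the table above: algorithm name -> Parallelizable column.
# This mapping is illustrative and is not provided by Amazon SageMaker.
PARALLELIZABLE = {
    "BlazingText": False,
    "LDA": False,
    "Seq2Seq Modeling": False,
    "DeepAR Forecasting": True,
    "Linear Learner": True,
    "XGBoost": True,
}


def check_instance_count(algorithm: str, instance_count: int) -> None:
    """Raise an error if instance_count conflicts with the Parallelizable column."""
    if algorithm not in PARALLELIZABLE:
        raise KeyError(f"{algorithm!r} is not in this illustrative subset")
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    if instance_count > 1 and not PARALLELIZABLE[algorithm]:
        raise ValueError(
            f"{algorithm} is not parallelizable; request a single instance"
        )


check_instance_count("XGBoost", 4)  # passes: XGBoost is listed as parallelizable
check_instance_count("LDA", 1)      # passes: one instance is always acceptable
```
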
Algorithms that are parallelizable can be deployed on multiple compute instances for distributed training. For the Training Image and Inference Image Registry Path column, use the :1 version tag to ensure that you are using a stable version of the algorithm. You can reliably host a model trained using an image with the :1 tag on an inference image that has the :1 tag. Using the :latest tag in the registry path provides you with the most up-to-date version of the algorithm, but might cause problems with backward compatibility. Avoid using the :latest tag for production purposes.
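
The tagging advice above can be sketched as a small helper that assembles a registry path and refuses the :latest tag unless it is explicitly allowed. The `image_uri` function and its `allow_latest` parameter are hypothetical and exist only for this example; `<ecr_path>` is left as a placeholder because its value depends on the algorithm and Region.

```python
def image_uri(ecr_path: str, repository: str, tag: str = "1",
              allow_latest: bool = False) -> str:
    """Join <ecr_path>, a repository name, and a version tag into a registry path."""
    tag = tag.lstrip(":")  # accept both "1" and ":1"
    if not tag:
        raise ValueError("a version tag is required")
    if tag == "latest" and not allow_latest:
        raise ValueError("':latest' may break backward compatibility; pin ':1' instead")
    return f"{ecr_path.rstrip('/')}/{repository}:{tag}"


# "<ecr_path>" stays a placeholder; substitute the value for your algorithm and Region.
print(image_uri("<ecr_path>", "xgboost"))  # <ecr_path>/xgboost:1
```
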
For the Training Image and Inference Image Registry Path column, depending on the algorithm and Region, use one of the following values for <ecr_path>.
[See the AWS documentation website for more details]
Use the paths and training input mode as follows:
+ To create a training job (with a request to the CreateTrainingJob API), specify the Docker Registry path and the training input mode for the training image. You create a training job to train a model using a specific dataset.
+ To create a model (with a CreateModel request), specify the Docker Registry path for the inference image. Amazon SageMaker launches machine learning compute instances that are based on the endpoint configuration and deploys the model, which includes the artifacts (the result of model training).
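
As a rough illustration of where the registry path and training input mode go, the sketch below builds the request bodies as plain dictionaries without calling AWS. The helper names `training_job_request` and `model_request` are hypothetical, as are all example values (bucket names, role ARN, job and model names, volume size, and runtime limit). The input mode check covers only the two modes listed in the table.

```python
def training_job_request(job_name, training_image, input_mode, role_arn,
                         train_s3_uri, output_s3_uri, instance_type,
                         instance_count=1):
    """Build a CreateTrainingJob request body with a single 'train' channel."""
    if input_mode not in ("File", "Pipe"):
        raise ValueError("input_mode must be 'File' or 'Pipe'")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": training_image,   # registry path of the training image
            "TrainingInputMode": input_mode,   # value from the Training Input Mode column
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,              # arbitrary example value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # arbitrary example value
    }


def model_request(model_name, inference_image, model_data_url, role_arn):
    """Build a CreateModel request body that points at the inference image."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": inference_image,          # registry path of the inference image
            "ModelDataUrl": model_data_url,    # S3 location of the model artifacts
        },
        "ExecutionRoleArn": role_arn,
    }


# All values below are made-up placeholders for illustration.
role = "arn:aws:iam::111122223333:role/ExampleSageMakerRole"
image = "<ecr_path>/xgboost:1"
train_req = training_job_request(
    "example-xgboost-job", image, "File", role,
    "s3://example-bucket/train/", "s3://example-bucket/output/", "ml.m4.xlarge",
)
model_req = model_request(
    "example-xgboost-model", image,
    "s3://example-bucket/output/example-xgboost-job/output/model.tar.gz", role,
)
# With boto3 installed and credentials configured, these could be sent as:
#   boto3.client("sagemaker").create_training_job(**train_req)
#   boto3.client("sagemaker").create_model(**model_req)
```

Building the dictionaries separately from the API call keeps the example runnable offline and makes it easy to inspect which fields carry the registry path and the input mode before anything is submitted.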