Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: CC-BY-SA-4.0

LDA Hyperparameters

In the CreateTrainingJob request, you specify the training algorithm. You can also specify algorithm-specific hyperparameters as string-to-string maps. The following table lists the hyperparameters for the LDA training algorithm provided by Amazon SageMaker. For more information, see How LDA Works.

Parameter Name Description
num_topics The number of topics for LDA to find within the data. Required Valid values: positive integer
feature_dim The size of the vocabulary of the input document corpus. Required Valid values: positive integer
mini_batch_size The total number of documents in the input document corpus. Required Valid values: positive integer
alpha0 Initial guess for the concentration parameter: the sum of the elements of the Dirichlet prior. Small values are more likely to generate sparse topic mixtures and large values (greater than 1.0) produce more uniform mixtures. Optional Valid values: Positive float Default value: 0.1
max_restarts The number of restarts to perform during the Alternating Least Squares (ALS) spectral decomposition phase of the algorithm. Can be used to find better quality local minima at the expense of additional computation, but typically should not be adjusted. Optional Valid values: Positive integer Default value: 10
max_iterations The maximum number of iterations to perform during the ALS phase of the algorithm. Can be used to find better quality minima at the expense of additional computation, but typically should not be adjusted. Optional Valid values: Positive integer Default value: 1000
tol Target error tolerance for the ALS phase of the algorithm. Can be used to find better quality minima at the expense of additional computation, but typically should not be adjusted. Optional Valid values: Positive float Default value: 1e-8