Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: CC-BY-SA-4.0

Sequence-to-Sequence Hyperparameters

Parameter Name Description
batch_size Mini batch size for gradient descent. Optional Valid values: positive integer Default value: 64
beam_size Length of the beam for beam search. Used during training for computing BLEU and used during inference. Optional Valid values: positive integer Default value: 5
bleu_sample_size Number of instances to pick from the validation dataset to decode and compute the BLEU score during training. Set to -1 to use the full validation set (if bleu is chosen as optimized_metric). Optional Valid values: integer Default value: 0
bucket_width Returns (source,target) buckets up to (max_seq_len_source, max_seq_len_target). The longer side of the data uses steps of bucket_width, while the shorter side uses steps scaled down by the average target/source length ratio. If one side reaches its maximum length before the other, the width of extra buckets on that side is fixed to that side's maximum length. Optional Valid values: positive integer Default value: 10
bucketing_enabled Set to false to disable bucketing and unroll to the maximum length. Optional Valid values: true or false Default value: true
checkpoint_frequency_num_batches Checkpoint and evaluate every x batches. Optional Valid values: positive integer Default value: 1000
checkpoint_threshold Maximum number of checkpoints the model is allowed to go without improving optimized_metric on the validation dataset before training is stopped. Optional Valid values: positive integer Default value: 3
clip_gradient Clip absolute gradient values greater than this. Set to a negative value to disable clipping. Optional Valid values: float Default value: 1
cnn_activation_type The cnn activation type to be used. Optional Valid values: String. One of glu, relu, softrelu, sigmoid, or tanh. Default value: glu
cnn_hidden_dropout Dropout probability for dropout between convolutional layers. Optional Valid values: Float. Range in [0,1]. Default value: 0
cnn_kernel_width_decoder Kernel width for the cnn decoder. Optional Valid values: positive integer Default value: 5
cnn_kernel_width_encoder Kernel width for the cnn encoder. Optional Valid values: positive integer Default value: 3
cnn_num_hidden Number of cnn hidden units for encoder and decoder. Optional Valid values: positive integer Default value: 512
decoder_type Decoder type. Optional Valid values: String. Either rnn or cnn. Default value: rnn
embed_dropout_source Dropout probability for source side embeddings. Optional Valid values: Float. Range in [0,1]. Default value: 0
embed_dropout_target Dropout probability for target side embeddings. Optional Valid values: Float. Range in [0,1]. Default value: 0
encoder_type Encoder type. The rnn architecture is based on the attention mechanism by Bahdanau et al., and the cnn architecture is based on Gehring et al. Optional Valid values: String. Either rnn or cnn. Default value: rnn
fixed_rate_lr_half_life Half life for learning rate in terms of number of checkpoints for fixed_rate_* schedulers. Optional Valid values: positive integer Default value: 10
learning_rate Initial learning rate. Optional Valid values: float Default value: 0.0003
loss_type Loss function for training. Optional Valid values: String. cross-entropy Default value: cross-entropy
lr_scheduler_type Learning rate scheduler type. plateau_reduce means reduce the learning rate whenever optimized_metric on the validation dataset plateaus. inv_t is inverse time decay: learning_rate/(1+decay_rate*t). Optional Valid values: String. One of plateau_reduce, fixed_rate_inv_t, or fixed_rate_inv_sqrt_t. Default value: plateau_reduce
max_num_batches Maximum number of updates/batches to process. -1 for infinite. Optional Valid values: integer Default value: -1
max_num_epochs Maximum number of epochs to pass through training data before fitting is stopped. If this parameter is passed, training continues until this number of epochs even if validation accuracy is not improving. Ignored if not passed. Optional Valid values: positive integer Default value: none
max_seq_len_source Maximum length of the source sequence. Sequences longer than this length are truncated to this length. Optional Valid values: positive integer Default value: 100
max_seq_len_target Maximum length of the target sequence. Sequences longer than this length are truncated to this length. Optional Valid values: positive integer Default value: 100
min_num_epochs Minimum number of epochs the training must run before it is stopped via early_stopping conditions. Optional Valid values: positive integer Default value: 0
momentum Momentum constant used for sgd. Don’t pass this parameter if you are using adam or rmsprop. Optional Valid values: float Default value: none
num_embed_source Embedding size for source tokens. Optional Valid values: positive integer Default value: 512
num_embed_target Embedding size for target tokens. Optional Valid values: positive integer Default value: 512
num_layers_decoder Number of layers for the decoder rnn or cnn. Optional Valid values: positive integer Default value: 1
num_layers_encoder Number of layers for the encoder rnn or cnn. Optional Valid values: positive integer Default value: 1
optimized_metric Metric to optimize with early stopping. Optional Valid values: String. One of perplexity, accuracy, or bleu. Default value: perplexity
optimizer_type Optimizer to choose from. Optional Valid values: String. One of adam, sgd, or rmsprop. Default value: adam
plateau_reduce_lr_factor Factor by which to multiply the learning rate (for plateau_reduce). Optional Valid values: float Default value: 0.5
plateau_reduce_lr_threshold For the plateau_reduce scheduler, multiply the learning rate by the reduce factor if optimized_metric did not improve for this many checkpoints. Optional Valid values: positive integer Default value: 3
rnn_attention_in_upper_layers Pass the attention to upper layers of the rnn, as in the Google NMT paper. Only applicable if more than one layer is used. Optional Valid values: boolean (true or false) Default value: true
rnn_attention_num_hidden Number of hidden units for attention layers. Defaults to rnn_num_hidden. Optional Valid values: positive integer Default value: rnn_num_hidden
rnn_attention_type Attention model for encoders. mlp refers to concat and bilinear refers to general from the Luong et al. paper. Optional Valid values: String. One of dot, fixed, mlp, or bilinear. Default value: mlp
rnn_cell_type Specific type of rnn architecture. Optional Valid values: String. Either lstm or gru. Default value: lstm
rnn_decoder_state_init How to initialize rnn decoder states from encoders. Optional Valid values: String. One of last, avg, or zero. Default value: last
rnn_first_residual_layer First rnn layer to have a residual connection. Only applicable if the number of layers in the encoder or decoder is more than 1. Optional Valid values: positive integer Default value: 2
rnn_num_hidden The number of rnn hidden units for encoder and decoder. This must be a multiple of 2 because the algorithm uses bidirectional Long Short-Term Memory (LSTM) by default. Optional Valid values: positive even integer Default value: 1024
rnn_residual_connections Add residual connection to stacked rnn. Number of layers should be more than 1. Optional Valid values: boolean (true or false) Default value: false
rnn_decoder_hidden_dropout Dropout probability for hidden state that combines the context with the rnn hidden state in the decoder. Optional Valid values: Float. Range in [0,1]. Default value: 0
training_metric Metric to track during training on validation data. Optional Valid values: String. Either perplexity or accuracy. Default value: perplexity
weight_decay Weight decay constant. Optional Valid values: float Default value: 0
weight_init_scale Weight initialization scale (for uniform and xavier initialization). Optional Valid values: float Default value: 2.34
weight_init_type Type of weight initialization. Optional Valid values: String. Either uniform or xavier. Default value: xavier
xavier_factor_type Xavier factor type. Optional Valid values: String. One of in, out, or avg. Default value: in
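
The bucket_width description above can be made concrete with a small sketch. This is not the algorithm's actual implementation: the function names side_buckets and parallel_buckets are hypothetical, the average target/source length ratio is assumed to be supplied by the caller as length_ratio, and rounding the scaled step and dropping duplicate pairs are assumptions made only for illustration.

```python
def side_buckets(max_len, step):
    """Bucket boundaries step, 2*step, ... for one side, always ending at max_len."""
    buckets = list(range(step, max_len + 1, step))
    if not buckets or buckets[-1] != max_len:
        buckets.append(max_len)
    return buckets


def parallel_buckets(max_seq_len_source, max_seq_len_target,
                     bucket_width=10, length_ratio=1.0):
    """Return (source, target) buckets up to the two maximum lengths.

    length_ratio is the average target/source length ratio (assumed input).
    """
    source_step = target_step = bucket_width
    if length_ratio >= 1.0:
        # Target is the longer side: it keeps bucket_width, source is scaled down.
        source_step = max(1, round(bucket_width / length_ratio))
    else:
        # Source is the longer side: it keeps bucket_width, target is scaled down.
        target_step = max(1, round(bucket_width * length_ratio))

    source = side_buckets(max_seq_len_source, source_step)
    target = side_buckets(max_seq_len_target, target_step)

    # If one side reaches its maximum first, pad it with that maximum.
    count = max(len(source), len(target))
    source += [source[-1]] * (count - len(source))
    target += [target[-1]] * (count - len(target))

    # Drop duplicate pairs while keeping order (an assumption of this sketch).
    pairs = []
    for pair in zip(source, target):
        if pair not in pairs:
            pairs.append(pair)
    return pairs


print(parallel_buckets(30, 20, bucket_width=10, length_ratio=1.0))
```

With max_seq_len_source=30 and max_seq_len_target=20, the target side reaches its maximum first, so the last bucket pairs a source length of 30 with a target length fixed at 20.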
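
The inverse time decay formula given for lr_scheduler_type, learning_rate/(1+decay_rate*t), can be tied to fixed_rate_lr_half_life as in the sketch below. The table only states the inv_t formula; choosing decay_rate so that the learning rate halves after half_life checkpoints, the square-root variant for fixed_rate_inv_sqrt_t, and the function names are all assumptions for illustration. Here t counts checkpoints.

```python
import math


def inv_t(learning_rate, t, half_life):
    """Inverse time decay: learning_rate / (1 + decay_rate * t).

    decay_rate = 1 / half_life makes the rate exactly half at t == half_life
    (an assumption about how the half life maps to the decay rate).
    """
    decay_rate = 1.0 / half_life
    return learning_rate / (1.0 + decay_rate * t)


def inv_sqrt_t(learning_rate, t, half_life):
    """Assumed square-root variant: learning_rate / sqrt(1 + decay_rate * t).

    decay_rate = 3 / half_life gives sqrt(1 + 3) == 2 at t == half_life,
    so the rate is again halved after half_life checkpoints.
    """
    decay_rate = 3.0 / half_life
    return learning_rate / math.sqrt(1.0 + decay_rate * t)


for checkpoint in (0, 10, 20):
    print(checkpoint,
          inv_t(0.0003, checkpoint, half_life=10),
          inv_sqrt_t(0.0003, checkpoint, half_life=10))
```

Both schedules start at the initial learning_rate and reach half of it at the half life; the square-root variant decays more slowly after that point.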
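
The early stopping rule described by checkpoint_threshold can be sketched as follows. The helper names are hypothetical, and two details are assumptions rather than documented behavior: a tie with the best value does not count as an improvement, and training stops once the count of non-improving checkpoints reaches the threshold. The sketch also ignores min_num_epochs and max_num_epochs.

```python
def checkpoints_without_improvement(history, lower_is_better=True):
    """Number of checkpoints evaluated since the best metric value was first seen."""
    if not history:
        return 0
    best = min(history) if lower_is_better else max(history)
    # index() returns the first occurrence, so later ties are not improvements.
    return len(history) - 1 - history.index(best)


def should_stop(history, checkpoint_threshold=3, lower_is_better=True):
    """True once the metric has failed to improve for checkpoint_threshold checkpoints."""
    stalled = checkpoints_without_improvement(history, lower_is_better)
    return stalled >= checkpoint_threshold


# Validation perplexity (lower is better) recorded at each checkpoint.
perplexity = [5.0, 4.0, 4.1, 4.2, 4.3]
print(should_stop(perplexity, checkpoint_threshold=3))
```

For metrics where higher is better, such as accuracy or bleu, pass lower_is_better=False.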