Sequence-to-Sequence Hyperparameters

Parameter Name	Description
batch_size	Mini batch size for gradient descent. Optional Valid values: positive integer Default value: 64
beam_size	Length of the beam for beam search. Used during training for computing `bleu` and used during inference. Optional Valid values: positive integer Default value: 5
bleu_sample_size	Number of instances to pick from validation dataset to decode and compute `bleu` score during training. Set to -1 to use full validation set (if `bleu` is chosen as `optimized_metric`). Optional Valid values: integer Default value: 0
bucket_width	Returns (source,target) buckets up to (`max_seq_len_source`, `max_seq_len_target`). The longer side of the data uses steps of `bucket_width` while the shorter side uses steps scaled down by the average target/source length ratio. If one sided reaches its maximum length before the other, width of extra buckets on that side is fixed to that side of `max_len`. Optional Valid values: positive integer Default value: 10
bucketing_enabled	Set to `false` to disable bucketing, unroll to maximum length. Optional Valid values: `true` or `false` Default value: `true`
checkpoint_frequency_num_batches	Checkpoint and evaluate every x batches. Optional Valid values: positive integer Default value: 1000
checkpoint_threshold	Maximum number of checkpoints model is allowed to not improve in `optimized_metric` on validation dataset before training is stopped. Optional Valid values: positive integer Default value: 3
clip_gradient	Clip absolute gradient values greater than this. Set to negative to disable. Optional Valid values: float Default value: 1
cnn_activation_type	The `cnn` activation type to be used. Optional Valid values: String. One of `glu`, `relu`, `softrelu`, `sigmoid`, or `tanh`. Default value: `glu`
cnn_hidden_dropout	Dropout probability for dropout between convolutional layers. Optional Valid values: Float. Range in [0,1]. Default value: 0
cnn_kernel_width_decoder	Kernel width for the `cnn` decoder. Optional Valid values: positive integer Default value: 5
cnn_kernel_width_encoder	Kernel width for the `cnn` encoder. Optional Valid values: positive integer Default value: 3
cnn_num_hidden	Number of `cnn` hidden units for encoder and decoder. Optional Valid values: positive integer Default value: 512
decoder_type	Decoder type. Optional Valid values: String. Either `rnn` or `cnn`. Default value: rnn
embed_dropout_source	Dropout probability for source side embeddings. Optional Valid values: Float. Range in [0,1]. Default value: 0
embed_dropout_target	Dropout probability for target side embeddings. Optional Valid values: Float. Range in [0,1]. Default value: 0
encoder_type	Encoder type. The `rnn` architecture is based on attention mechanism by Bahdanau et al. and cnn architecture is based on Gehring et al. Optional Valid values: String. Either `rnn` or `cnn`. Default value: `rnn`
fixed_rate_lr_half_life	Half life for learning rate in terms of number of checkpoints for `fixed_rate_`* schedulers. Optional Valid values: positive integer Default value: 10
learning_rate	Initial learning rate. Optional Valid values: float Default value: 0.0003
loss_type	Loss function for training. Optional Valid values: String. `cross-entropy` Default value: `cross-entropy`
lr_scheduler_type	Learning rate scheduler type. `plateau_reduce` means reduce the learning rate whenever `optimized_metric` on `validation_accuracy` plateaus. `inv_t` is inverse time decay. `learning_rate`/(1+`decay_rate`t) Optional* Valid values: String. One of `plateau_reduce`, `fixed_rate_inv_t`, or `fixed_rate_inv_sqrt_t`. Default value: `plateau_reduce`
max_num_batches	Maximum number of updates/batches to process. -1 for infinite. Optional Valid values: integer Default value: -1
max_num_epochs	Maximum number of epochs to pass through training data before fitting is stopped. Training continues until this number of epochs even if validation accuracy is not improving if this parameter is passed. Ignored if not passed. Optional Valid values: Positive integer and less than or equal to max_num_epochs. Default value: none
max_seq_len_source	Maximum length for the source sequence length. Sequences longer than this length are truncated to this length. Optional Valid values: positive integer Default value: 100
max_seq_len_target	Maximum length for the target sequence length. Sequences longer than this length are truncated to this length. Optional Valid values: positive integer Default value: 100
min_num_epochs	Minimum number of epochs the training must run before it is stopped via `early_stopping` conditions. Optional Valid values: positive integer Default value: 0
momentum	Momentum constant used for `sgd`. Don’t pass this parameter if you are using `adam` or `rmsprop`. Optional Valid values: float Default value: none
num_embed_source	Embedding size for source tokens. Optional Valid values: positive integer Default value: 512
num_embed_target	Embedding size for target tokens. Optional Valid values: positive integer Default value: 512
num_layers_decoder	Number of layers for Decoder rnn or cnn. Optional Valid values: positive integer Default value: 1
num_layers_encoder	Number of layers for Encoder `rnn` or `cnn`. Optional Valid values: positive integer Default value: 1
optimized_metric	Metrics to optimize with early stopping. Optional Valid values: String. One of `perplexity`, `accuracy`, or `bleu`. Default value: `perplexity`
optimizer_type	Optimizer to choose from. Optional Valid values: String. One of `adam`, `sgd`, or `rmsprop`. Default value: `adam`
plateau_reduce_lr_factor	Factor to multiply learning rate with (for `plateau_reduce`). Optional Valid values: float Default value: 0.5
plateau_reduce_lr_threshold	For `plateau_reduce` scheduler, multiply learning rate with reduce factor if `optimized_metric` didn’t improve for this many checkpoints. Optional Valid values: positive integer Default value: 3
rnn_attention_in_upper_layers	Pass the attention to upper layers of rnn, like Google NMT paper. Only applicable if more than one layer is used. Optional Valid values: boolean (`true` or `false`) Default value: `true`
rnn_attention_num_hidden	Number of hidden units for attention layers. defaults to `rnn_num_hidden`. Optional Valid values: positive integer Default value: `rnn_num_hidden`
rnn_attention_type	Attention model for encoders. `mlp` refers to concat and bilinear refers to general from the Luong et al. paper. Optional Valid values: String. One of `dot`, `fixed`, `mlp`, or `bilinear`. Default value: `mlp`
rnn_cell_type	Specific type of `rnn` architecture. Optional Valid values: String. Either `lstm` or `gru`. Default value: `lstm`
rnn_decoder_state_init	How to initialize `rnn` decoder states from encoders. Optional Valid values: String. One of `last`, `avg`, or `zero`. Default value: `last`
rnn_first_residual_layer	First rnn layer to have a residual connection, only applicable if number of layers in encoder or decoder is more than 1. Optional Valid values: positive integer Default value: 2
rnn_num_hidden	The number of rnn hidden units for encoder and decoder. This must be a multiple of 2 because the algorithm uses bi-directional Long Term Short Term Memory (LSTM) by default. Optional Valid values: positive even integer Default value: 1024
rnn_residual_connections	Add residual connection to stacked rnn. Number of layers should be more than 1. Optional Valid values: boolean (`true` or `false`) Default value: `false`
rnn_decoder_hidden_dropout	Dropout probability for hidden state that combines the context with the rnn hidden state in the decoder. Optional Valid values: Float. Range in [0,1]. Default value: 0
training_metric	Metrics to track on training on validation data. Optional Valid values: String. Either `perplexity` or `accuracy`. Default value: `perplexity`
weight_decay	Weight decay constant. Optional Valid values: float Default value: 0
weight_init_scale	Weight initialization scale (for `uniform` and `xavier` initialization). Optional Valid values: float Default value: 2.34
weight_init_type	Type of weight initialization. Optional Valid values: String. Either `uniform` or `xavier`. Default value: `xavier`
xavier_factor_type	Xavier factor type. Optional Valid values: String. One of `in`, `out`, or `avg`. Default value: `in`