Factorization Machines Hyperparameters

The following table contains the hyperparameters for the factorization machines algorithm. These are parameters that are set by users to facilitate the estimation of model parameters from data. The required hyperparameters that must be set are listed first, in alphabetical order. The optional hyperparameters that can be set are listed next, also in alphabetical order.

Parameter Name	Description
feature_dim	The dimension of the input feature space. This can be very high with sparse input. Required. Valid values: Positive integer. Suggested value range: [10000, 10000000].
num_factors	The dimensionality of factorization. Required. Valid values: Positive integer. Suggested value range: [2, 1000]; 64 is often a good starting point.
predictor_type	The type of predictor. [See the AWS documentation website for more details] Required. Valid values: String: `binary_classifier` or `regressor`.
bias_init_method	The initialization method for the bias term: [See the AWS documentation website for more details] Optional. Valid values: `uniform`, `normal`, or `constant`. Default value: `normal`
bias_init_scale	The range for initialization of the bias term. Takes effect if `bias_init_method` is set to `uniform`. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: None
bias_init_sigma	The standard deviation for initialization of the bias term. Takes effect if `bias_init_method` is set to `normal`. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.01
bias_init_value	The initial value of the bias term. Takes effect if `bias_init_method` is set to `constant`. Optional. Valid values: Float. Suggested value range: [1e-8, 512]. Default value: None
bias_lr	The learning rate for the bias term. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.1
bias_wd	The weight decay for the bias term. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.01
clip_gradient	Gradient clipping optimizer parameter. Clips the gradient by projecting onto the interval [-`clip_gradient`, +`clip_gradient`]. Optional. Valid values: Float. Default value: None
epochs	The number of training epochs to run. Optional. Valid values: Positive integer. Default value: 1
eps	Epsilon parameter to avoid division by 0. Optional. Valid values: Float. Suggested value: small. Default value: None
factors_init_method	The initialization method for factorization terms: [See the AWS documentation website for more details] Optional. Valid values: `uniform`, `normal`, or `constant`. Default value: `normal`
factors_init_scale	The range for initialization of factorization terms. Takes effect if `factors_init_method` is set to `uniform`. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: None
factors_init_sigma	The standard deviation for initialization of factorization terms. Takes effect if `factors_init_method` is set to `normal`. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.001
factors_init_value	The initial value of factorization terms. Takes effect if `factors_init_method` is set to `constant`. Optional. Valid values: Float. Suggested value range: [1e-8, 512]. Default value: None
factors_lr	The learning rate for factorization terms. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.0001
factors_wd	The weight decay for factorization terms. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.00001
linear_init_method	The initialization method for linear terms: [See the AWS documentation website for more details] Optional. Valid values: `uniform`, `normal`, or `constant`. Default value: `normal`
linear_init_scale	The range for initialization of linear terms. Takes effect if `linear_init_method` is set to `uniform`. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: None
linear_init_sigma	The standard deviation for initialization of linear terms. Takes effect if `linear_init_method` is set to `normal`. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.01
linear_init_value	The initial value of linear terms. Takes effect if `linear_init_method` is set to `constant`. Optional. Valid values: Float. Suggested value range: [1e-8, 512]. Default value: None
linear_lr	The learning rate for linear terms. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.001
linear_wd	The weight decay for linear terms. Optional. Valid values: Non-negative float. Suggested value range: [1e-8, 512]. Default value: 0.001
mini_batch_size	The size of the mini-batch used for training. Optional. Valid values: Positive integer. Default value: 1000
rescale_grad	Gradient rescaling optimizer parameter. If set, multiplies the gradient by `rescale_grad` before updating. Often chosen to be 1.0/`batch_size`. Optional. Valid values: Float. Default value: None
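
As an illustration of how the table above might be applied, the following Python sketch collects a set of hyperparameters in a plain dictionary and checks it against a few of the constraints listed in the table: the three required hyperparameters, the positive-integer fields, the valid `predictor_type` values, and the pairing between each `*_init_method` and the companion setting that takes effect for it. The helper `validate_hyperparameters` and the constants around it are hypothetical names introduced for this example; they are not part of any AWS SDK, and the check is not exhaustive.

```python
# Hypothetical helper for illustration only; none of these names come from an AWS SDK.

REQUIRED = ("feature_dim", "num_factors", "predictor_type")
PREDICTOR_TYPES = ("binary_classifier", "regressor")
POSITIVE_INTEGERS = ("feature_dim", "num_factors", "epochs", "mini_batch_size")

# Per the table: *_init_scale takes effect with `uniform`, *_init_sigma with
# `normal`, and *_init_value with `constant`.
INIT_COMPANION = {"uniform": "scale", "normal": "sigma", "constant": "value"}


def validate_hyperparameters(hp):
    """Return a list of problems found in a hyperparameter dict (empty if none)."""
    problems = []

    # Required hyperparameters must be present.
    for name in REQUIRED:
        if name not in hp:
            problems.append(f"missing required hyperparameter: {name}")

    # Fields documented as positive integers.
    for name in POSITIVE_INTEGERS:
        if name in hp:
            value = hp[name]
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not (is_int and value > 0):
                problems.append(f"{name} must be a positive integer")

    if "predictor_type" in hp and hp["predictor_type"] not in PREDICTOR_TYPES:
        problems.append("predictor_type must be binary_classifier or regressor")

    # Check each group of initialization settings (bias, linear, factors).
    for term in ("bias", "linear", "factors"):
        method = hp.get(f"{term}_init_method", "normal")  # `normal` is the table default
        if method not in INIT_COMPANION:
            problems.append(f"{term}_init_method must be uniform, normal, or constant")
            continue
        # Flag companion settings that the chosen method does not use.
        for other_method, suffix in INIT_COMPANION.items():
            key = f"{term}_init_{suffix}"
            if other_method != method and key in hp:
                problems.append(
                    f"{key} has no effect when {term}_init_method is {method}"
                )

    return problems


# Example values chosen from the suggested ranges in the table.
hyperparameters = {
    "feature_dim": 50000,
    "num_factors": 64,
    "predictor_type": "binary_classifier",
    "epochs": 10,
    "mini_batch_size": 1000,
    "factors_init_method": "uniform",
    "factors_init_scale": 0.01,
}

print(validate_hyperparameters(hyperparameters))
```

Returning a list of messages rather than raising on the first problem is a design choice made for this sketch, so that every issue in a configuration can be reported at once.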