/*
 * Copyright 2018-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with
 * the License. A copy of the License is located at
 *
 * http://aws.amazon.com/apache2.0
 *
 * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
 * CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
 * and limitations under the License.
 */
package com.amazonaws.services.machinelearning.model;

import java.io.Serializable;
import javax.annotation.Generated;

import com.amazonaws.protocol.StructuredPojo;
import com.amazonaws.protocol.ProtocolMarshaller;

/**
 * <p>
 * The data specification of an Amazon Relational Database Service (Amazon RDS) <code>DataSource</code>.
 * </p>
 */
@Generated("com.amazonaws:aws-java-sdk-code-generator")
public class RDSDataSpec implements Serializable, Cloneable, StructuredPojo {

    /**
     * <p>
     * Describes the <code>DatabaseName</code> and <code>InstanceIdentifier</code> of an Amazon RDS database.
     * </p>
     */
    private RDSDatabase databaseInformation;

    /**
     * <p>
     * The query that is used to retrieve the observation data for the <code>DataSource</code>.
     * </p>
     */
    private String selectSqlQuery;

    /**
     * <p>
     * The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS database.
     * </p>
     */
    private RDSDatabaseCredentials databaseCredentials;

    /**
     * <p>
     * The Amazon S3 location for staging Amazon RDS data. The data retrieved from Amazon RDS using
     * <code>SelectSqlQuery</code> is stored in this location.
     * </p>
     */
    private String s3StagingLocation;
    /**
     * <p>
     * A JSON string that represents the splitting and rearrangement processing to be applied to a
     * <code>DataSource</code>. If the <code>DataRearrangement</code> parameter is not provided, all of the input data
     * is used to create the <code>Datasource</code>.
     * </p>
     * <p>
     * There are multiple parameters that control what data is used to create a datasource:
     * </p>
     * <ul>
     * <li>
     * <p>
     * <b><code>percentBegin</code></b>
     * </p>
     * <p>
     * Use <code>percentBegin</code> to indicate the beginning of the range of the data used to create the Datasource.
     * If you do not include <code>percentBegin</code> and <code>percentEnd</code>, Amazon ML includes all of the data
     * when creating the datasource.
     * </p>
     * </li>
     * <li>
     * <p>
     * <b><code>percentEnd</code></b>
     * </p>
     * <p>
     * Use <code>percentEnd</code> to indicate the end of the range of the data used to create the Datasource. If you
     * do not include <code>percentBegin</code> and <code>percentEnd</code>, Amazon ML includes all of the data when
     * creating the datasource.
     * </p>
     * </li>
     * <li>
     * <p>
     * <b><code>complement</code></b>
     * </p>
     * <p>
     * The <code>complement</code> parameter instructs Amazon ML to use the data that is not included in the range of
     * <code>percentBegin</code> to <code>percentEnd</code> to create a datasource. The <code>complement</code>
     * parameter is useful if you need to create complementary datasources for training and evaluation. To create a
     * complementary datasource, use the same values for <code>percentBegin</code> and <code>percentEnd</code>, along
     * with the <code>complement</code> parameter.
     * </p>
     * <p>
     * For example, the following two datasources do not share any data, and can be used to train and evaluate a model.
     * The first datasource has 25 percent of the data, and the second one has 75 percent of the data.
     * </p>
     * <p>
     * Datasource for evaluation: <code>{"splitting":{"percentBegin":0, "percentEnd":25}}</code>
     * </p>
     * <p>
     * Datasource for training: <code>{"splitting":{"percentBegin":0, "percentEnd":25, "complement":"true"}}</code>
     * </p>
     * </li>
     * <li>
     * <p>
     * <b><code>strategy</code></b>
     * </p>
     * <p>
     * To change how Amazon ML splits the data for a datasource, use the <code>strategy</code> parameter.
     * </p>
     * <p>
     * The default value for the <code>strategy</code> parameter is <code>sequential</code>, meaning that Amazon ML
     * takes all of the data records between the <code>percentBegin</code> and <code>percentEnd</code> parameters for
     * the datasource, in the order that the records appear in the input data.
     * </p>
     * <p>
     * The following two <code>DataRearrangement</code> lines are examples of sequentially ordered training and
     * evaluation datasources:
     * </p>
     * <p>
     * Datasource for evaluation:
     * <code>{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential"}}</code>
     * </p>
     * <p>
     * Datasource for training:
     * <code>{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"sequential", "complement":"true"}}</code>
     * </p>
     * <p>
     * To randomly split the input data into the proportions indicated by the <code>percentBegin</code> and
     * <code>percentEnd</code> parameters, set the <code>strategy</code> parameter to <code>random</code> and provide a
     * string that is used as the seed value for the random data splitting (for example, you can use the S3 path to
     * your data as the random seed string). If you choose the random split strategy, Amazon ML assigns each row of
     * data a pseudo-random number between 0 and 100, and then selects the rows that have an assigned number between
     * <code>percentBegin</code> and <code>percentEnd</code>. Pseudo-random numbers are assigned using both the input
     * seed string value and the byte offset as a seed, so changing the data results in a different split. Any existing
     * ordering is preserved. The random splitting strategy ensures that variables in the training and evaluation data
     * are distributed similarly. It is useful in the cases where the input data may have an implicit sort order, which
     * would otherwise result in training and evaluation datasources containing non-similar data records.
     * </p>
     * <p>
     * The following two <code>DataRearrangement</code> lines are examples of non-sequentially ordered training and
     * evaluation datasources:
     * </p>
     * <p>
     * Datasource for evaluation:
     * <code>{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv"}}</code>
     * </p>
     * <p>
     * Datasource for training:
     * <code>{"splitting":{"percentBegin":70, "percentEnd":100, "strategy":"random", "randomSeed":"s3://my_s3_path/bucket/file.csv", "complement":"true"}}</code>
     * </p>
     * </li>
     * </ul>
     */
    private String dataRearrangement;
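    /*
     * Illustrative sketch (not part of the generated model): two DataRearrangement strings that
     * implement the complementary random split documented above. The S3 seed path is a hypothetical
     * placeholder; any stable string works as the seed.
     *
     *   String seed = "s3://example-bucket/input.csv";
     *   String evalSplit = "{\"splitting\":{\"percentBegin\":70, \"percentEnd\":100,"
     *           + " \"strategy\":\"random\", \"randomSeed\":\"" + seed + "\"}}";
     *   String trainSplit = "{\"splitting\":{\"percentBegin\":70, \"percentEnd\":100,"
     *           + " \"strategy\":\"random\", \"randomSeed\":\"" + seed + "\", \"complement\":\"true\"}}";
     *   // evalSplit selects the rows whose pseudo-random number falls between 70 and 100;
     *   // trainSplit selects the remaining ~70 percent of rows.
     */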
    /**
     * <p>
     * A JSON string that represents the schema for an Amazon RDS <code>DataSource</code>. The <code>DataSchema</code>
     * defines the structure of the observation data in the data file(s) referenced in the <code>DataSource</code>.
     * </p>
     * <p>
     * A <code>DataSchema</code> is not required if you specify a <code>DataSchemaUri</code>.
     * </p>
     * <p>
     * Define your <code>DataSchema</code> as a series of key-value pairs. <code>attributes</code> and
     * <code>excludedVariableNames</code> have an array of key-value pairs for their value. Use the following format to
     * define your <code>DataSchema</code>.
     * </p>
     * <p>
     * { "version": "1.0",
     * </p>
     * <p>
     * "recordAnnotationFieldName": "F1",
     * </p>
     * <p>
     * "recordWeightFieldName": "F2",
     * </p>
     * <p>
     * "targetFieldName": "F3",
     * </p>
     * <p>
     * "dataFormat": "CSV",
     * </p>
     * <p>
     * "dataFileContainsHeader": true,
     * </p>
     * <p>
     * "attributes": [
     * </p>
     * <p>
     * { "fieldName": "F1", "fieldType": "TEXT" }, { "fieldName": "F2", "fieldType": "NUMERIC" }, { "fieldName": "F3",
     * "fieldType": "CATEGORICAL" }, { "fieldName": "F4", "fieldType": "NUMERIC" }, { "fieldName": "F5", "fieldType":
     * "CATEGORICAL" }, { "fieldName": "F6", "fieldType": "TEXT" }, { "fieldName": "F7", "fieldType":
     * "WEIGHTED_INT_SEQUENCE" }, { "fieldName": "F8", "fieldType": "WEIGHTED_STRING_SEQUENCE" } ],
     * </p>
     * <p>
     * "excludedVariableNames": [ "F6" ] }
     * </p>
     */
    private String dataSchema;

    /**
     * <p>
     * The Amazon S3 location of the <code>DataSchema</code>.
     * </p>
     */
    private String dataSchemaUri;

    /**
     * <p>
     * The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2) instance to
     * carry out the copy operation from Amazon RDS to Amazon S3. For more information, see Role templates for data
     * pipelines.
     * </p>
     */
    private String resourceRole;

    /**
     * <p>
     * The role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of the copy
     * task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
     * </p>
     */
    private String serviceRole;

    /**
     * <p>
     * The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy task from Amazon RDS to Amazon S3.
     * </p>
     */
    private String subnetId;

    /**
     * <p>
     * The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are appropriate
     * ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy operation from Amazon RDS to Amazon S3.
     * </p>
     */
    private com.amazonaws.internal.SdkInternalList<String> securityGroupIds;
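    /*
     * Illustrative sketch (not part of the generated model): assembling the dataSchema JSON string
     * in the key-value format documented on the field above. The field names F1..F3 are hypothetical.
     *
     *   String schema = "{\"version\": \"1.0\","
     *           + " \"targetFieldName\": \"F3\","
     *           + " \"dataFormat\": \"CSV\","
     *           + " \"dataFileContainsHeader\": true,"
     *           + " \"attributes\": ["
     *           + " { \"fieldName\": \"F1\", \"fieldType\": \"TEXT\" },"
     *           + " { \"fieldName\": \"F2\", \"fieldType\": \"NUMERIC\" },"
     *           + " { \"fieldName\": \"F3\", \"fieldType\": \"CATEGORICAL\" } ] }";
     */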
    /**
     * <p>
     * Describes the <code>DatabaseName</code> and <code>InstanceIdentifier</code> of an Amazon RDS database.
     * </p>
     *
     * @param databaseInformation
     *        Describes the <code>DatabaseName</code> and <code>InstanceIdentifier</code> of an Amazon RDS database.
*/
public void setDatabaseInformation(RDSDatabase databaseInformation) {
this.databaseInformation = databaseInformation;
}
/**
     * <p>
     * Describes the <code>DatabaseName</code> and <code>InstanceIdentifier</code> of an Amazon RDS database.
     * </p>
     *
     * @return Describes the <code>DatabaseName</code> and <code>InstanceIdentifier</code> of an Amazon RDS database.
*/
public RDSDatabase getDatabaseInformation() {
return this.databaseInformation;
}
/**
     * <p>
     * Describes the <code>DatabaseName</code> and <code>InstanceIdentifier</code> of an Amazon RDS database.
     * </p>
     *
     * @param databaseInformation
     *        Describes the <code>DatabaseName</code> and <code>InstanceIdentifier</code> of an Amazon RDS database.
     * @return Returns a reference to this object so that method calls can be chained together.
*/
public RDSDataSpec withDatabaseInformation(RDSDatabase databaseInformation) {
setDatabaseInformation(databaseInformation);
return this;
}
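    /*
     * Illustrative sketch (not part of the generated model): populating the database information
     * with the companion RDSDatabase value object. The instance identifier and database name are
     * hypothetical, and the withInstanceIdentifier/withDatabaseName accessors are assumed from the
     * RDSDatabase model in this package.
     *
     *   RDSDataSpec spec = new RDSDataSpec().withDatabaseInformation(
     *           new RDSDatabase().withInstanceIdentifier("my-rds-instance").withDatabaseName("observations"));
     */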
/**
     * <p>
     * The query that is used to retrieve the observation data for the <code>DataSource</code>.
     * </p>
     *
     * @param selectSqlQuery
     *        The query that is used to retrieve the observation data for the <code>DataSource</code>.
*/
public void setSelectSqlQuery(String selectSqlQuery) {
this.selectSqlQuery = selectSqlQuery;
}
/**
     * <p>
     * The query that is used to retrieve the observation data for the <code>DataSource</code>.
     * </p>
     *
     * @return The query that is used to retrieve the observation data for the <code>DataSource</code>.
*/
public String getSelectSqlQuery() {
return this.selectSqlQuery;
}
/**
     * <p>
     * The query that is used to retrieve the observation data for the <code>DataSource</code>.
     * </p>
     *
     * @param selectSqlQuery
     *        The query that is used to retrieve the observation data for the <code>DataSource</code>.
     * @return Returns a reference to this object so that method calls can be chained together.
*/
public RDSDataSpec withSelectSqlQuery(String selectSqlQuery) {
setSelectSqlQuery(selectSqlQuery);
return this;
}
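    /*
     * Illustrative sketch (not part of the generated model): the query output is staged in S3
     * before the datasource is created, so these two settings are typically used together. The
     * table and bucket names are hypothetical.
     *
     *   RDSDataSpec spec = new RDSDataSpec()
     *           .withSelectSqlQuery("SELECT * FROM observations")
     *           .withS3StagingLocation("s3://example-bucket/staging/");
     */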
/**
     * <p>
     * The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS database.
     * </p>
     *
     * @param databaseCredentials
     *        The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS
     *        database.
     */
    public void setDatabaseCredentials(RDSDatabaseCredentials databaseCredentials) {
        this.databaseCredentials = databaseCredentials;
    }

    /**
     * <p>
     * The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS database.
     * </p>
     *
     * @return The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS
     *         database.
     */
    public RDSDatabaseCredentials getDatabaseCredentials() {
        return this.databaseCredentials;
    }

    /**
     * <p>
     * The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS database.
     * </p>
     *
     * @param databaseCredentials
     *        The AWS Identity and Access Management (IAM) credentials that are used to connect to the Amazon RDS
     *        database.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withDatabaseCredentials(RDSDatabaseCredentials databaseCredentials) {
        setDatabaseCredentials(databaseCredentials);
        return this;
    }
    /**
     * <p>
     * The Amazon S3 location for staging Amazon RDS data. The data retrieved from Amazon RDS using
     * <code>SelectSqlQuery</code> is stored in this location.
     * </p>
     *
     * @param s3StagingLocation
     *        The Amazon S3 location for staging Amazon RDS data. The data retrieved from Amazon RDS using
     *        <code>SelectSqlQuery</code> is stored in this location.
*/
public void setS3StagingLocation(String s3StagingLocation) {
this.s3StagingLocation = s3StagingLocation;
}
/**
     * <p>
     * The Amazon S3 location for staging Amazon RDS data. The data retrieved from Amazon RDS using
     * <code>SelectSqlQuery</code> is stored in this location.
     * </p>
     *
     * @return The Amazon S3 location for staging Amazon RDS data. The data retrieved from Amazon RDS using
     *         <code>SelectSqlQuery</code> is stored in this location.
*/
public String getS3StagingLocation() {
return this.s3StagingLocation;
}
/**
     * <p>
     * The Amazon S3 location for staging Amazon RDS data. The data retrieved from Amazon RDS using
     * <code>SelectSqlQuery</code> is stored in this location.
     * </p>
     *
     * @param s3StagingLocation
     *        The Amazon S3 location for staging Amazon RDS data. The data retrieved from Amazon RDS using
     *        <code>SelectSqlQuery</code> is stored in this location.
     * @return Returns a reference to this object so that method calls can be chained together.
*/
public RDSDataSpec withS3StagingLocation(String s3StagingLocation) {
setS3StagingLocation(s3StagingLocation);
return this;
}
/**
     * <p>
     * A JSON string that represents the splitting and rearrangement processing to be applied to a
     * <code>DataSource</code>. If the <code>DataRearrangement</code> parameter is not provided, all of the input data
     * is used to create the <code>Datasource</code>. The <code>percentBegin</code>, <code>percentEnd</code>,
     * <code>complement</code>, and <code>strategy</code> splitting parameters are documented, with examples, on the
     * <code>dataRearrangement</code> field above.
     * </p>
     *
     * @param dataRearrangement
     *        A JSON string that represents the splitting and rearrangement processing to be applied to a
     *        <code>DataSource</code>. If the <code>DataRearrangement</code> parameter is not provided, all of the
     *        input data is used to create the <code>Datasource</code>.
     */
    public void setDataRearrangement(String dataRearrangement) {
        this.dataRearrangement = dataRearrangement;
    }
    /**
     * <p>
     * A JSON string that represents the splitting and rearrangement processing to be applied to a
     * <code>DataSource</code>. If the <code>DataRearrangement</code> parameter is not provided, all of the input data
     * is used to create the <code>Datasource</code>. The splitting parameters are documented, with examples, on the
     * <code>dataRearrangement</code> field above.
     * </p>
     *
     * @return A JSON string that represents the splitting and rearrangement processing to be applied to a
     *         <code>DataSource</code>. If the <code>DataRearrangement</code> parameter is not provided, all of the
     *         input data is used to create the <code>Datasource</code>.
     */
    public String getDataRearrangement() {
        return this.dataRearrangement;
    }
    /**
     * <p>
     * A JSON string that represents the splitting and rearrangement processing to be applied to a
     * <code>DataSource</code>. If the <code>DataRearrangement</code> parameter is not provided, all of the input data
     * is used to create the <code>Datasource</code>. The splitting parameters are documented, with examples, on the
     * <code>dataRearrangement</code> field above.
     * </p>
     *
     * @param dataRearrangement
     *        A JSON string that represents the splitting and rearrangement processing to be applied to a
     *        <code>DataSource</code>. If the <code>DataRearrangement</code> parameter is not provided, all of the
     *        input data is used to create the <code>Datasource</code>.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withDataRearrangement(String dataRearrangement) {
        setDataRearrangement(dataRearrangement);
        return this;
    }
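    /*
     * Illustrative sketch (not part of the generated model): creating complementary training and
     * evaluation specs by reusing the same split range and toggling "complement", per the
     * documentation above. The JSON literals mirror the documented examples.
     *
     *   String range = "\"percentBegin\":0, \"percentEnd\":25";
     *   RDSDataSpec evalSpec = new RDSDataSpec()
     *           .withDataRearrangement("{\"splitting\":{" + range + "}}");
     *   RDSDataSpec trainSpec = new RDSDataSpec()
     *           .withDataRearrangement("{\"splitting\":{" + range + ", \"complement\":\"true\"}}");
     */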
    /**
     * <p>
     * A JSON string that represents the schema for an Amazon RDS <code>DataSource</code>. The <code>DataSchema</code>
     * defines the structure of the observation data in the data file(s) referenced in the <code>DataSource</code>.
     * </p>
     * <p>
     * A <code>DataSchema</code> is not required if you specify a <code>DataSchemaUri</code>. The key-value format is
     * documented, with an example, on the <code>dataSchema</code> field above.
     * </p>
     *
     * @param dataSchema
     *        A JSON string that represents the schema for an Amazon RDS <code>DataSource</code>. The
     *        <code>DataSchema</code> defines the structure of the observation data in the data file(s) referenced in
     *        the <code>DataSource</code>.
     */
    public void setDataSchema(String dataSchema) {
        this.dataSchema = dataSchema;
    }
    /**
     * <p>
     * A JSON string that represents the schema for an Amazon RDS <code>DataSource</code>. The <code>DataSchema</code>
     * defines the structure of the observation data in the data file(s) referenced in the <code>DataSource</code>.
     * </p>
     * <p>
     * A <code>DataSchema</code> is not required if you specify a <code>DataSchemaUri</code>. The key-value format is
     * documented, with an example, on the <code>dataSchema</code> field above.
     * </p>
     *
     * @return A JSON string that represents the schema for an Amazon RDS <code>DataSource</code>. The
     *         <code>DataSchema</code> defines the structure of the observation data in the data file(s) referenced in
     *         the <code>DataSource</code>.
     */
    public String getDataSchema() {
        return this.dataSchema;
    }
    /**
     * <p>
     * A JSON string that represents the schema for an Amazon RDS <code>DataSource</code>. The <code>DataSchema</code>
     * defines the structure of the observation data in the data file(s) referenced in the <code>DataSource</code>.
     * </p>
     * <p>
     * A <code>DataSchema</code> is not required if you specify a <code>DataSchemaUri</code>. The key-value format is
     * documented, with an example, on the <code>dataSchema</code> field above.
     * </p>
     *
     * @param dataSchema
     *        A JSON string that represents the schema for an Amazon RDS <code>DataSource</code>. The
     *        <code>DataSchema</code> defines the structure of the observation data in the data file(s) referenced in
     *        the <code>DataSource</code>.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withDataSchema(String dataSchema) {
        setDataSchema(dataSchema);
        return this;
    }
    /**
     * <p>
     * The Amazon S3 location of the <code>DataSchema</code>.
     * </p>
     *
     * @param dataSchemaUri
     *        The Amazon S3 location of the <code>DataSchema</code>.
*/
public void setDataSchemaUri(String dataSchemaUri) {
this.dataSchemaUri = dataSchemaUri;
}
/**
     * <p>
     * The Amazon S3 location of the <code>DataSchema</code>.
     * </p>
     *
     * @return The Amazon S3 location of the <code>DataSchema</code>.
*/
public String getDataSchemaUri() {
return this.dataSchemaUri;
}
/**
     * <p>
     * The Amazon S3 location of the <code>DataSchema</code>.
     * </p>
     *
     * @param dataSchemaUri
     *        The Amazon S3 location of the <code>DataSchema</code>.
     * @return Returns a reference to this object so that method calls can be chained together.
*/
public RDSDataSpec withDataSchemaUri(String dataSchemaUri) {
setDataSchemaUri(dataSchemaUri);
return this;
}
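    /*
     * Illustrative sketch (not part of the generated model): the schema can be supplied either
     * inline through withDataSchema or as an S3 location through withDataSchemaUri; the inline
     * DataSchema is not required when a DataSchemaUri is given. The URI below is a hypothetical
     * placeholder.
     *
     *   RDSDataSpec spec = new RDSDataSpec()
     *           .withDataSchemaUri("s3://example-bucket/schemas/observations.schema");
     */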
    /**
     * <p>
     * The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2) instance to
     * carry out the copy operation from Amazon RDS to Amazon S3. For more information, see Role templates for data
     * pipelines.
     * </p>
     *
     * @param resourceRole
     *        The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2)
     *        instance to carry out the copy operation from Amazon RDS to Amazon S3. For more information, see Role
     *        templates for data pipelines.
     */
    public void setResourceRole(String resourceRole) {
        this.resourceRole = resourceRole;
    }

    /**
     * <p>
     * The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2) instance to
     * carry out the copy operation from Amazon RDS to Amazon S3. For more information, see Role templates for data
     * pipelines.
     * </p>
     *
     * @return The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2)
     *         instance to carry out the copy operation from Amazon RDS to Amazon S3. For more information, see Role
     *         templates for data pipelines.
     */
    public String getResourceRole() {
        return this.resourceRole;
    }

    /**
     * <p>
     * The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2) instance to
     * carry out the copy operation from Amazon RDS to Amazon S3. For more information, see Role templates for data
     * pipelines.
     * </p>
     *
     * @param resourceRole
     *        The role (DataPipelineDefaultResourceRole) assumed by an Amazon Elastic Compute Cloud (Amazon EC2)
     *        instance to carry out the copy operation from Amazon RDS to Amazon S3. For more information, see Role
     *        templates for data pipelines.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withResourceRole(String resourceRole) {
        setResourceRole(resourceRole);
        return this;
    }

    /**
     * <p>
     * The role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of the copy
     * task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
     * </p>
     *
     * @param serviceRole
     *        The role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of
     *        the copy task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
     */
    public void setServiceRole(String serviceRole) {
        this.serviceRole = serviceRole;
    }

    /**
     * <p>
     * The role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of the copy
     * task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
     * </p>
     *
     * @return The role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of
     *         the copy task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
     */
    public String getServiceRole() {
        return this.serviceRole;
    }

    /**
     * <p>
     * The role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of the copy
     * task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
     * </p>
     *
     * @param serviceRole
     *        The role (DataPipelineDefaultRole) assumed by the AWS Data Pipeline service to monitor the progress of
     *        the copy task from Amazon RDS to Amazon S3. For more information, see Role templates for data pipelines.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withServiceRole(String serviceRole) {
        setServiceRole(serviceRole);
        return this;
    }

    /**
     * <p>
     * The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy task from Amazon RDS to Amazon S3.
     * </p>
     *
     * @param subnetId
     *        The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline
     *        to carry out the copy task from Amazon RDS to Amazon S3.
     */
    public void setSubnetId(String subnetId) {
        this.subnetId = subnetId;
    }

    /**
     * <p>
     * The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy task from Amazon RDS to Amazon S3.
     * </p>
     *
     * @return The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline
     *         to carry out the copy task from Amazon RDS to Amazon S3.
     */
    public String getSubnetId() {
        return this.subnetId;
    }

    /**
     * <p>
     * The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy task from Amazon RDS to Amazon S3.
     * </p>
     *
     * @param subnetId
     *        The subnet ID to be used to access a VPC-based RDS DB instance. This attribute is used by Data Pipeline
     *        to carry out the copy task from Amazon RDS to Amazon S3.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withSubnetId(String subnetId) {
        setSubnetId(subnetId);
        return this;
    }

    /**
     * <p>
     * The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are appropriate
     * ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy operation from Amazon RDS to Amazon S3.
     * </p>
     *
     * @return The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are
     *         appropriate ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data
     *         Pipeline to carry out the copy operation from Amazon RDS to Amazon S3.
     */
    public java.util.List<String> getSecurityGroupIds() {
        if (securityGroupIds == null) {
            securityGroupIds = new com.amazonaws.internal.SdkInternalList<String>();
        }
        return securityGroupIds;
    }

    /**
     * <p>
     * The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are appropriate
     * ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy operation from Amazon RDS to Amazon S3.
     * </p>
     *
     * @param securityGroupIds
     *        The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are
     *        appropriate ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data
     *        Pipeline to carry out the copy operation from Amazon RDS to Amazon S3.
     */
    public void setSecurityGroupIds(java.util.Collection<String> securityGroupIds) {
        if (securityGroupIds == null) {
            this.securityGroupIds = null;
            return;
        }
        this.securityGroupIds = new com.amazonaws.internal.SdkInternalList<String>(securityGroupIds);
    }

    /**
     * <p>
     * The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are appropriate
     * ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy operation from Amazon RDS to Amazon S3.
     * </p>
     * <p>
     * <b>NOTE:</b> This method appends the values to the existing list (if any). Use
     * {@link #setSecurityGroupIds(java.util.Collection)} or {@link #withSecurityGroupIds(java.util.Collection)} if you
     * want to override the existing values.
     * </p>
     *
     * @param securityGroupIds
     *        The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are
     *        appropriate ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data
     *        Pipeline to carry out the copy operation from Amazon RDS to Amazon S3.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withSecurityGroupIds(String... securityGroupIds) {
        if (this.securityGroupIds == null) {
            setSecurityGroupIds(new com.amazonaws.internal.SdkInternalList<String>(securityGroupIds.length));
        }
        for (String ele : securityGroupIds) {
            this.securityGroupIds.add(ele);
        }
        return this;
    }

    /**
     * <p>
     * The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are appropriate
     * ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data Pipeline to carry
     * out the copy operation from Amazon RDS to Amazon S3.
     * </p>
     *
     * @param securityGroupIds
     *        The security group IDs to be used to access a VPC-based RDS DB instance. Ensure that there are
     *        appropriate ingress rules set up to allow access to the RDS DB instance. This attribute is used by Data
     *        Pipeline to carry out the copy operation from Amazon RDS to Amazon S3.
     * @return Returns a reference to this object so that method calls can be chained together.
     */
    public RDSDataSpec withSecurityGroupIds(java.util.Collection<String> securityGroupIds) {
        setSecurityGroupIds(securityGroupIds);
        return this;
    }
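    /*
     * Illustrative end-to-end sketch (not part of the generated model): a fully populated spec as
     * it might be attached to a CreateDataSourceFromRDS request. All identifiers, roles, and
     * locations are hypothetical placeholders, and the RDSDatabase/RDSDatabaseCredentials accessors
     * are assumed from the companion models in this package.
     *
     *   RDSDataSpec spec = new RDSDataSpec()
     *           .withDatabaseInformation(new RDSDatabase()
     *                   .withInstanceIdentifier("my-rds-instance")
     *                   .withDatabaseName("observations"))
     *           .withDatabaseCredentials(new RDSDatabaseCredentials()
     *                   .withUsername("ml_reader")
     *                   .withPassword("example-password"))
     *           .withSelectSqlQuery("SELECT * FROM observations")
     *           .withS3StagingLocation("s3://example-bucket/staging/")
     *           .withDataRearrangement("{\"splitting\":{\"percentBegin\":0, \"percentEnd\":70}}")
     *           .withResourceRole("DataPipelineDefaultResourceRole")
     *           .withServiceRole("DataPipelineDefaultRole")
     *           .withSubnetId("subnet-0123456789abcdef0")
     *           .withSecurityGroupIds("sg-0123456789abcdef0");
     */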