/**
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
* SPDX-License-Identifier: Apache-2.0.
*/
#pragma once
#include
/**
 * Describes the S3 data source. Your input bucket must be in the same Amazon
 * Web Services region as your training job.
 *
 * See Also: AWS API Reference
 */
    /**
     * If you choose S3Prefix, S3Uri identifies a key name prefix. SageMaker uses
     * all objects that match the specified key name prefix for model training.
     *
     * If you choose ManifestFile, S3Uri identifies an object that is a manifest
     * file containing a list of object keys that you want SageMaker to use for
     * model training.
     *
     * If you choose AugmentedManifestFile, S3Uri identifies an object that is an
     * augmented manifest file in JSON lines format. This file contains the data
     * you want to use for model training. AugmentedManifestFile can only be used
     * if the Channel's input mode is Pipe.
     */
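    // A minimal sketch of the data type rules described above. DataTypeChoice,
    // InputModeChoice, DescribeS3Uri and IsValidCombination are hypothetical
    // names used only for illustration; they are not part of the SDK.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; these are not the SDK's own enums.
enum class DataTypeChoice { S3Prefix, ManifestFile, AugmentedManifestFile };
enum class InputModeChoice { File, Pipe };

// Describes what S3Uri is expected to identify for each data type.
std::string DescribeS3Uri(DataTypeChoice type) {
    switch (type) {
        case DataTypeChoice::S3Prefix:
            return "key name prefix";
        case DataTypeChoice::ManifestFile:
            return "manifest file listing object keys";
        case DataTypeChoice::AugmentedManifestFile:
            return "augmented manifest file in JSON lines format";
    }
    return "unknown";
}

// AugmentedManifestFile can only be used when the channel's input mode is Pipe.
// The other two data types are accepted here with either mode (an assumption
// of this sketch; the source text states no restriction for them).
bool IsValidCombination(DataTypeChoice type, InputModeChoice mode) {
    if (type == DataTypeChoice::AugmentedManifestFile) {
        return mode == InputModeChoice::Pipe;
    }
    return true;
}
```

    // A caller could run such a check before building a training request, so
    // that an unsupported combination is caught locally.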
    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
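    // A minimal sketch of how the manifest prefix and suffixes described above
    // combine into the full S3Uri list. ExpandManifest is a hypothetical helper
    // written for illustration; it is not part of the SDK and does not parse
    // JSON, it only shows the prefix + suffix concatenation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; not part of the SDK.
// Appends each suffix element to the manifest prefix to form the S3Uri list.
std::vector<std::string> ExpandManifest(const std::string& prefix,
                                        const std::vector<std::string>& suffixes) {
    std::vector<std::string> uris;
    uris.reserve(suffixes.size());
    for (const auto& suffix : suffixes) {
        uris.push_back(prefix + suffix);
    }
    return uris;
}
```

    // Because every entry is built from the single prefix, all resulting
    // objects share that prefix's bucket, which matches the restriction stated
    // in the comment above.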
    inline const Aws::String& GetS3Uri() const { return m_s3Uri; }

    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
    inline bool S3UriHasBeenSet() const { return m_s3UriHasBeenSet; }

    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
    inline void SetS3Uri(const Aws::String& value) { m_s3UriHasBeenSet = true; m_s3Uri = value; }

    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
    inline void SetS3Uri(Aws::String&& value) { m_s3UriHasBeenSet = true; m_s3Uri = std::move(value); }

    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
    inline void SetS3Uri(const char* value) { m_s3UriHasBeenSet = true; m_s3Uri.assign(value); }

    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
    inline S3DataSource& WithS3Uri(const Aws::String& value) { SetS3Uri(value); return *this; }

    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
    inline S3DataSource& WithS3Uri(Aws::String&& value) { SetS3Uri(std::move(value)); return *this; }

    /**
     * Depending on the value specified for the S3DataType, identifies either a
     * key name prefix or a manifest. For example:
     *
     * A key name prefix might look like this: s3://bucketname/exampleprefix
     *
     * A manifest might look like this: s3://bucketname/example.manifest
     *
     * A manifest is an S3 object which is a JSON file consisting of an array of
     * elements. The first element is a prefix which is followed by one or more
     * suffixes. SageMaker appends the suffix elements to the prefix to get a full
     * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
     * which precludes users from specifying a manifest whose individual S3Uri
     * values are sourced from different S3 buckets.
     *
     * The following code example shows a valid manifest format:
     *
     *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
     *     "relative/path/to/custdata-1",
     *     "relative/path/custdata-2",
     *     ...
     *     "relative/path/custdata-N"
     *   ]
     *
     * This JSON is equivalent to the following S3Uri list:
     *
     *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
     *   s3://customer_bucket/some/prefix/relative/path/custdata-2
     *   ...
     *   s3://customer_bucket/some/prefix/relative/path/custdata-N
     *
     * The complete set of S3Uri values in this manifest is the input data for the
     * channel for this data source. The object that each S3Uri points to must be
     * readable by the IAM role that SageMaker uses to perform tasks on your
     * behalf.
     *
     * Your input bucket must be located in the same Amazon Web Services region as
     * your training job.
     */
    inline S3DataSource& WithS3Uri(const char* value) { SetS3Uri(value); return *this; }

    /**
     * If you want SageMaker to replicate the entire dataset on each ML compute
     * instance that is launched for model training, specify FullyReplicated.
     *
     * If you want SageMaker to replicate a subset of data on each ML compute
     * instance that is launched for model training, specify ShardedByS3Key. If
     * there are n ML compute instances launched for a training job, each instance
     * gets approximately 1/n of the number of S3 objects. In this case, model
     * training on each machine uses only the subset of training data.
     *
     * Don't choose more ML compute instances for training than available S3
     * objects. If you do, some nodes won't get any data and you will pay for nodes
     * that aren't getting any training data. This applies in both File and Pipe
     * modes. Keep this in mind when developing algorithms.
     *
     * In distributed training, where you use multiple ML compute EC2 instances,
     * you might choose ShardedByS3Key. If the algorithm requires copying training
     * data to the ML storage volume (when TrainingInputMode is set to File), this
     * copies 1/n of the number of objects.
     */

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline const Aws::Vector<Aws::String>& GetAttributeNames() const { return m_attributeNames; }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline bool AttributeNamesHasBeenSet() const { return m_attributeNamesHasBeenSet; }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline void SetAttributeNames(const Aws::Vector<Aws::String>& value) { m_attributeNamesHasBeenSet = true; m_attributeNames = value; }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline void SetAttributeNames(Aws::Vector<Aws::String>&& value) { m_attributeNamesHasBeenSet = true; m_attributeNames = std::move(value); }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline S3DataSource& WithAttributeNames(const Aws::Vector<Aws::String>& value) { SetAttributeNames(value); return *this; }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline S3DataSource& WithAttributeNames(Aws::Vector<Aws::String>&& value) { SetAttributeNames(std::move(value)); return *this; }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline S3DataSource& AddAttributeNames(const Aws::String& value) { m_attributeNamesHasBeenSet = true; m_attributeNames.push_back(value); return *this; }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline S3DataSource& AddAttributeNames(Aws::String&& value) { m_attributeNamesHasBeenSet = true; m_attributeNames.push_back(std::move(value)); return *this; }

    /**
     * A list of one or more attribute names to use that are found in a specified
     * augmented manifest file.
     */
    inline S3DataSource& AddAttributeNames(const char* value) { m_attributeNamesHasBeenSet = true; m_attributeNames.push_back(value); return *this; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline const Aws::Vector<Aws::String>& GetInstanceGroupNames() const { return m_instanceGroupNames; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline bool InstanceGroupNamesHasBeenSet() const { return m_instanceGroupNamesHasBeenSet; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline void SetInstanceGroupNames(const Aws::Vector<Aws::String>& value) { m_instanceGroupNamesHasBeenSet = true; m_instanceGroupNames = value; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline void SetInstanceGroupNames(Aws::Vector<Aws::String>&& value) { m_instanceGroupNamesHasBeenSet = true; m_instanceGroupNames = std::move(value); }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline S3DataSource& WithInstanceGroupNames(const Aws::Vector<Aws::String>& value) { SetInstanceGroupNames(value); return *this; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline S3DataSource& WithInstanceGroupNames(Aws::Vector<Aws::String>&& value) { SetInstanceGroupNames(std::move(value)); return *this; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline S3DataSource& AddInstanceGroupNames(const Aws::String& value) { m_instanceGroupNamesHasBeenSet = true; m_instanceGroupNames.push_back(value); return *this; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline S3DataSource& AddInstanceGroupNames(Aws::String&& value) { m_instanceGroupNamesHasBeenSet = true; m_instanceGroupNames.push_back(std::move(value)); return *this; }

    /**
     * A list of names of instance groups that get data from the S3 data source.
     */
    inline S3DataSource& AddInstanceGroupNames(const char* value) { m_instanceGroupNamesHasBeenSet = true; m_instanceGroupNames.push_back(value); return *this; }

  private:

    S3DataType m_s3DataType;
    bool m_s3DataTypeHasBeenSet = false;

    Aws::String m_s3Uri;
    bool m_s3UriHasBeenSet = false;

    S3DataDistribution m_s3DataDistributionType;
    bool m_s3DataDistributionTypeHasBeenSet = false;

    Aws::Vector