Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: CC-BY-SA-4.0
A configuration for a shuffle option for input data in a channel. If you use S3Prefix for S3DataType, the results of the S3 key prefix matches are shuffled. If you use ManifestFile, the order of the S3 object references in the ManifestFile is shuffled. If you use AugmentedManifestFile, the order of the JSON lines in the AugmentedManifestFile is shuffled. The shuffling order is determined using the Seed value.
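As a rough illustration of where ShuffleConfig sits in a channel definition, the following sketch builds one channel as a plain Python dictionary. The helper name `build_channel`, the channel name, and the bucket URI are hypothetical. The `FullyReplicated` distribution type and `Pipe` input mode are example choices, not requirements stated on this page.

```python
def build_channel(name, s3_uri, seed, s3_data_type="S3Prefix"):
    """Return a channel definition that includes a ShuffleConfig.

    Hypothetical helper for illustration; it only assembles a dict and
    does not call any AWS service.
    """
    allowed = {"S3Prefix", "ManifestFile", "AugmentedManifestFile"}
    if s3_data_type not in allowed:
        raise ValueError(f"unsupported S3DataType: {s3_data_type}")
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": s3_data_type,
                "S3Uri": s3_uri,
                # Example choice; ShardedByS3Key is the other distribution type.
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        # Example choice of input mode for this sketch.
        "InputMode": "Pipe",
        # Seed is required whenever ShuffleConfig is present.
        "ShuffleConfig": {"Seed": int(seed)},
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix.
    channel = build_channel("train", "s3://example-bucket/train/", seed=42)
    print(channel["ShuffleConfig"])
```

A dictionary shaped like this would typically be passed as one element of the input data configuration list when creating a training job.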
For Pipe input mode, shuffling is done at the start of every epoch. With large datasets, this ensures that the order of the training data is different for each epoch, and it helps reduce bias and possible overfitting. In a multi-node training job when ShuffleConfig is combined with S3DataDistributionType of ShardedByS3Key, the data is shuffled across nodes so that the content sent to a particular node on the first epoch might be sent to a different node on the second epoch.
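The per-epoch behavior described above can be pictured with a small local simulation. This is only a conceptual sketch and not the algorithm SageMaker uses: the way the seed and epoch number are combined, the round-robin sharding, and the function names `epoch_order` and `shard` are all assumptions made for illustration.

```python
import random


def epoch_order(keys, seed, epoch):
    """Return a reproducible permutation of keys for one epoch (illustrative only)."""
    # Assumption: mix the seed and the epoch number into a single integer so
    # that each epoch gets its own repeatable order.
    rng = random.Random(seed * 1_000_003 + epoch)
    shuffled = list(keys)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled


def shard(ordered_keys, node_count):
    """Deal the ordered keys round-robin across node_count nodes."""
    return [ordered_keys[i::node_count] for i in range(node_count)]


if __name__ == "__main__":
    # Hypothetical object keys.
    keys = [f"train/part-{i:03d}.csv" for i in range(8)]
    for epoch in range(2):
        shards = shard(epoch_order(keys, seed=42, epoch=epoch), node_count=2)
        print(f"epoch {epoch}:", shards)
```

Because each epoch's order depends on both the seed and the epoch number in this sketch, rerunning with the same seed reproduces the same sequence of orders, while a given key can land on a different node from one epoch to the next.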
Seed
Determines the shuffling order used by ShuffleConfig.
Type: Long
Required: Yes
For more information about using this API in one of the language-specific AWS SDKs, see the following:
+ AWS SDK for C++
+ AWS SDK for Go
+ AWS SDK for Go - Pilot
+ AWS SDK for Java
+ AWS SDK for Ruby V2