Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: CC-BY-SA-4.0
A training algorithm indicates whether it succeeded or failed using the exit code of its process.
A successful training execution should exit with an exit code of 0 and an unsuccessful training execution should exit with a non-zero exit code. These will be converted to “Completed” and “Failed” in the TrainingJobStatus
returned by DescribeTrainingJob
. This exit code convention is standard and is easily implemented in all languages. For example, in Python, you can use sys.exit(1)
to signal a failure exit and simply running to the end of the main routine will cause Python to exit with code 0.
In the case of failure, the algorithm can write a description of the failure to the failure file. See next section for details.