/*
 * Copyright 2018-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with
 * the License. A copy of the License is located at
 *
 * http://aws.amazon.com/apache2.0
 *
 * or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
 * CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
 * and limitations under the License.
 */
package com.amazonaws.services.comprehend.model;

import java.io.Serializable;
import javax.annotation.Generated;

import com.amazonaws.protocol.StructuredPojo;
import com.amazonaws.protocol.ProtocolMarshaller;

/**
 * <p>
 * Provides configuration parameters to override the default actions for extracting text from PDF documents and image
 * files.
 * </p>
 * <p>
 * By default, Amazon Comprehend performs the following actions to extract text from files, based on the input file
 * type:
 * </p>
 * <ul>
 * <li>Word files - Amazon Comprehend parser extracts the text.</li>
 * <li>Digital PDF files - Amazon Comprehend parser extracts the text.</li>
 * <li>Image files and scanned PDF files - Amazon Comprehend uses the Amazon Textract {@code DetectDocumentText}
 * API to extract the text.</li>
 * </ul>
 * <p>
 * {@code DocumentReaderConfig} does not apply to plain text files or Word files.
 * </p>
 * <p>
 * For image files and PDF documents, you can override these default actions using the fields listed below. For more
 * information, see Setting text extraction options in the Comprehend Developer Guide.
 * </p>
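 * <p>
 * A minimal usage sketch, not taken from the service documentation. It assumes the fluent {@code with...} setters
 * that the SDK code generator normally emits for model classes; {@code withDocumentReadAction} and
 * {@code withDocumentReadMode} are assumptions here, since only the {@code featureTypes} setters are referenced
 * elsewhere in this file. The sketch forces the Amazon Textract {@code AnalyzeDocument} operation for every PDF
 * file, including digital PDF files, and requests both feature types:
 * </p>
 * <pre>{@code
 * // Assumed fluent setters; the string values are the ones documented for each field below.
 * DocumentReaderConfig config = new DocumentReaderConfig()
 *         .withDocumentReadAction("TEXTRACT_ANALYZE_DOCUMENT")
 *         .withDocumentReadMode("FORCE_DOCUMENT_READ_ACTION")
 *         .withFeatureTypes("TABLES", "FORMS");
 * }</pre>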
 *
 * @see AWS API Documentation
 */
@Generated("com.amazonaws:aws-java-sdk-code-generator")
public class DocumentReaderConfig implements Serializable, Cloneable, StructuredPojo {

    /**
     * <p>
     * This field defines the Amazon Textract API operation that Amazon Comprehend uses to extract text from PDF files
     * and image files. Enter one of the following values:
     * </p>
     * <ul>
     * <li>{@code TEXTRACT_DETECT_DOCUMENT_TEXT} - The Amazon Comprehend service uses the {@code DetectDocumentText}
     * API operation.</li>
     * <li>{@code TEXTRACT_ANALYZE_DOCUMENT} - The Amazon Comprehend service uses the {@code AnalyzeDocument} API
     * operation.</li>
     * </ul>
     *
     * <p>
     * Determines the text extraction actions for PDF files. Enter one of the following values:
     * </p>
     * <ul>
     * <li>{@code SERVICE_DEFAULT} - Use the Amazon Comprehend service defaults for PDF files.</li>
     * <li>{@code FORCE_DOCUMENT_READ_ACTION} - Amazon Comprehend uses the Textract API specified by
     * {@code DocumentReadAction} for all PDF files, including digital PDF files.</li>
     * </ul>
     *
     * <p>
     * Specifies the type of Amazon Textract features to apply. If you chose {@code TEXTRACT_ANALYZE_DOCUMENT} as
     * the read action, you must specify one or both of the following values:
     * </p>
     * <ul>
     * <li>{@code TABLES} - Returns information about any tables that are detected in the input document.</li>
     * <li>{@code FORMS} - Returns information and the data from any forms that are detected in the input
     * document.</li>
     * </ul>
     *
     * <p>
     * Specifies the type of Amazon Textract features to apply. If you chose {@code TEXTRACT_ANALYZE_DOCUMENT} as
     * the read action, you must specify one or both of the following values:
     * </p>
     * <ul>
     * <li>{@code TABLES} - Returns information about any tables that are detected in the input document.</li>
     * <li>{@code FORMS} - Returns information and the data from any forms that are detected in the input
     * document.</li>
     * </ul>
     * <p>
     * NOTE: This method appends the values to the existing list (if any). Use
     * {@link #setFeatureTypes(java.util.Collection)} or {@link #withFeatureTypes(java.util.Collection)} if you want to
     * override the existing values.
     * </p>
     *
     * @param featureTypes
     *        Specifies the type of Amazon Textract features to apply. If you chose
     *        {@code TEXTRACT_ANALYZE_DOCUMENT} as the read action, you must specify one or both of the following
     *        values:
     *        <ul>
     *        <li>{@code TABLES} - Returns information about any tables that are detected in the input document.</li>
     *        <li>{@code FORMS} - Returns information and the data from any forms that are detected in the input
     *        document.</li>
     *        </ul>