/*
 * Copyright 2010-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License").
 * You may not use this file except in compliance with the License.
 * A copy of the License is located at
 *
 *  http://aws.amazon.com/apache2.0
 *
 * or in the "license" file accompanying this file. This file is distributed
 * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
 * express or implied. See the License for the specific language governing
 * permissions and limitations under the License.
 */

package com.amazonaws.services.comprehend.model;

import java.io.Serializable;

/**
 * Provides configuration parameters to override the default actions for
 * extracting text from PDF documents and image files.
 *
 * By default, Amazon Comprehend performs the following actions to extract text
 * from files, based on the input file type:
 *
 * Word files - Amazon Comprehend parser extracts the text.
 *
 * Digital PDF files - Amazon Comprehend parser extracts the text.
 *
 * Image files and scanned PDF files - Amazon Comprehend uses the Amazon
 * Textract DetectDocumentText API to extract the text.
 *
 * DocumentReaderConfig does not apply to plain text files or Word files.
 *
 * For image files and PDF documents, you can override these default actions
 * using the fields listed below. For more information, see Setting text
 * extraction options in the Comprehend Developer Guide.
 */
public class DocumentReaderConfig implements Serializable {
/**
 * This field defines the Amazon Textract API operation that Amazon
 * Comprehend uses to extract text from PDF files and image files. Enter one
 * of the following values:
 *
 * TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the
 * DetectDocumentText API operation.
 *
 * TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the
 * AnalyzeDocument API operation.
 *
 * Constraints:
 * Allowed Values: TEXTRACT_DETECT_DOCUMENT_TEXT, TEXTRACT_ANALYZE_DOCUMENT
 */
private String documentReadAction;
/**
 * Determines the text extraction actions for PDF files. Enter one of the
 * following values:
 *
 * SERVICE_DEFAULT - use the Amazon Comprehend service defaults for PDF files.
 *
 * FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API
 * specified by DocumentReadAction for all PDF files, including digital PDF
 * files.
 *
 * Constraints:
 * Allowed Values: SERVICE_DEFAULT, FORCE_DOCUMENT_READ_ACTION
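 *
 * As a rough illustration of how the two modes differ for a digital PDF, the
 * sketch below uses a hypothetical helper. DigitalPdfRouting and the
 * COMPREHEND_PARSER label are made-up names, not part of the SDK. It also
 * assumes that an unset read mode behaves like SERVICE_DEFAULT, which this
 * comment does not confirm.
 *

```java
// Hypothetical stand-in for illustration only; not an SDK class.
final class DigitalPdfRouting {
    // Made-up label meaning "the Amazon Comprehend parser extracts the text".
    static final String COMPREHEND_PARSER = "COMPREHEND_PARSER";

    // Returns the extractor that a digital PDF would be routed to.
    static String extractorFor(String documentReadMode, String documentReadAction) {
        if ("FORCE_DOCUMENT_READ_ACTION".equals(documentReadMode)) {
            // Forced: the Textract operation named by the read action is used,
            // even though the PDF is digital.
            return documentReadAction;
        }
        // SERVICE_DEFAULT (and, by assumption, an unset mode): digital PDFs
        // stay with the Comprehend parser.
        return COMPREHEND_PARSER;
    }
}
```

 * With this helper, extractorFor("SERVICE_DEFAULT", "TEXTRACT_ANALYZE_DOCUMENT")
 * yields the parser label, while passing FORCE_DOCUMENT_READ_ACTION as the
 * mode yields TEXTRACT_ANALYZE_DOCUMENT.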
*/
private String documentReadMode;
/**
 * Specifies the type of Amazon Textract features to apply. If you chose
 * TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both
 * of the following values:
 *
 * TABLES - Returns information about any tables that are detected in the
 * input document.
 *
 * FORMS - Returns information and the data from any forms that are detected
 * in the input document.
 *
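 * The rule above can be sketched as a small client-side check. FeatureTypeCheck
 * is a hypothetical helper, not an SDK class, and the service may validate
 * differently. The sketch assumes that no feature types are required when the
 * read action is anything other than TEXTRACT_ANALYZE_DOCUMENT, since this
 * comment states no rule for that case.
 *

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not an SDK class.
final class FeatureTypeCheck {
    // The two feature types named in the comment above.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("TABLES", "FORMS"));

    // Returns true when featureTypes satisfies the documented rule.
    static boolean isValid(String documentReadAction, List<String> featureTypes) {
        if (!"TEXTRACT_ANALYZE_DOCUMENT".equals(documentReadAction)) {
            // Assumption: the rule only constrains the analyze action.
            return true;
        }
        if (featureTypes == null || featureTypes.isEmpty()) {
            // At least one of TABLES or FORMS must be specified.
            return false;
        }
        // Every requested value must be one of the listed feature types.
        return ALLOWED.containsAll(featureTypes);
    }
}
```

 * Under these assumptions, a TEXTRACT_ANALYZE_DOCUMENT action with an empty
 * list is rejected, and the same action with TABLES, FORMS, or both is
 * accepted.
 *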
 * This field defines the Amazon Textract API operation that Amazon
 * Comprehend uses to extract text from PDF files and image files. Enter one
 * of the following values:
 *
 * TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the
 * DetectDocumentText API operation.
 *
 * TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the
 * AnalyzeDocument API operation.
 *
 * Constraints:
 * Allowed Values: TEXTRACT_DETECT_DOCUMENT_TEXT, TEXTRACT_ANALYZE_DOCUMENT
 *
 * @return This field defines the Amazon Textract API operation that Amazon
 *         Comprehend uses to extract text from PDF files and image files.
 *         Enter one of the following values:
 *
 *         TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses
 *         the DetectDocumentText API operation.
 *
 *         TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the
 *         AnalyzeDocument API operation.
 *
 * This field defines the Amazon Textract API operation that Amazon
 * Comprehend uses to extract text from PDF files and image files. Enter one
 * of the following values:
 *
 * TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the
 * DetectDocumentText API operation.
 *
 * TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the
 * AnalyzeDocument API operation.
 *
 * Constraints:
 * Allowed Values: TEXTRACT_DETECT_DOCUMENT_TEXT, TEXTRACT_ANALYZE_DOCUMENT
 *
 * @param documentReadAction
 *            This field defines the Amazon Textract API operation that
 *            Amazon Comprehend uses to extract text from PDF files and
 *            image files. Enter one of the following values:
 *
 *            TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service
 *            uses the DetectDocumentText API operation.
 *
 *            TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses
 *            the AnalyzeDocument API operation.
 *
 * This field defines the Amazon Textract API operation that Amazon
 * Comprehend uses to extract text from PDF files and image files. Enter one
 * of the following values:
 *
 * TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the
 * DetectDocumentText API operation.
 *
 * TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the
 * AnalyzeDocument API operation.
 *
 * Returns a reference to this object so that method calls can be chained
 * together.
 *
 * Constraints:
 * Allowed Values: TEXTRACT_DETECT_DOCUMENT_TEXT, TEXTRACT_ANALYZE_DOCUMENT
 *
 * @param documentReadAction
 *            This field defines the Amazon Textract API operation that
 *            Amazon Comprehend uses to extract text from PDF files and
 *            image files. Enter one of the following values:
 *
 *            TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service
 *            uses the DetectDocumentText API operation.
 *
 *            TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses
 *            the AnalyzeDocument API operation.
 *
 * This field defines the Amazon Textract API operation that Amazon
 * Comprehend uses to extract text from PDF files and image files. Enter one
 * of the following values:
 *
 * TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the
 * DetectDocumentText API operation.
 *
 * TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the
 * AnalyzeDocument API operation.
 *
 * Constraints:
 * Allowed Values: TEXTRACT_DETECT_DOCUMENT_TEXT, TEXTRACT_ANALYZE_DOCUMENT
 *
 * @param documentReadAction
 *            This field defines the Amazon Textract API operation that
 *            Amazon Comprehend uses to extract text from PDF files and
 *            image files. Enter one of the following values:
 *
 *            TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service
 *            uses the DetectDocumentText API operation.
 *
 *            TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses
 *            the AnalyzeDocument API operation.
 *
 * This field defines the Amazon Textract API operation that Amazon
 * Comprehend uses to extract text from PDF files and image files. Enter one
 * of the following values:
 *
 * TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the
 * DetectDocumentText API operation.
 *
 * TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the
 * AnalyzeDocument API operation.
 *
 * Returns a reference to this object so that method calls can be chained
 * together.
 *
 * Constraints:
 * Allowed Values: TEXTRACT_DETECT_DOCUMENT_TEXT, TEXTRACT_ANALYZE_DOCUMENT
 *
 * @param documentReadAction
 *            This field defines the Amazon Textract API operation that
 *            Amazon Comprehend uses to extract text from PDF files and
 *            image files. Enter one of the following values:
 *
 *            TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service
 *            uses the DetectDocumentText API operation.
 *
 *            TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses
 *            the AnalyzeDocument API operation.
 *
 * Determines the text extraction actions for PDF files. Enter one of the
 * following values:
 *
 * SERVICE_DEFAULT - use the Amazon Comprehend service defaults for PDF files.
 *
 * FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API
 * specified by DocumentReadAction for all PDF files, including digital PDF
 * files.
 *
 * Constraints:
 * Allowed Values: SERVICE_DEFAULT, FORCE_DOCUMENT_READ_ACTION
 *
 * @return Determines the text extraction actions for PDF files. Enter one
 *         of the following values:
 *
 *         SERVICE_DEFAULT - use the Amazon Comprehend service defaults for
 *         PDF files.
 *
 *         FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract
 *         API specified by DocumentReadAction for all PDF files, including
 *         digital PDF files.
 *
 * Determines the text extraction actions for PDF files. Enter one of the
 * following values:
 *
 * SERVICE_DEFAULT - use the Amazon Comprehend service defaults for PDF files.
 *
 * FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API
 * specified by DocumentReadAction for all PDF files, including digital PDF
 * files.
 *
 * Constraints:
 * Allowed Values: SERVICE_DEFAULT, FORCE_DOCUMENT_READ_ACTION
 *
 * @param documentReadMode
 *            Determines the text extraction actions for PDF files. Enter
 *            one of the following values:
 *
 *            SERVICE_DEFAULT - use the Amazon Comprehend service defaults
 *            for PDF files.
 *
 *            FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the
 *            Textract API specified by DocumentReadAction for all PDF
 *            files, including digital PDF files.
 *
 * Determines the text extraction actions for PDF files. Enter one of the
 * following values:
 *
 * SERVICE_DEFAULT - use the Amazon Comprehend service defaults for PDF files.
 *
 * FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API
 * specified by DocumentReadAction for all PDF files, including digital PDF
 * files.
 *
 * Returns a reference to this object so that method calls can be chained
 * together.
 *
 * Constraints:
 * Allowed Values: SERVICE_DEFAULT, FORCE_DOCUMENT_READ_ACTION
 *
 * @param documentReadMode
 *            Determines the text extraction actions for PDF files. Enter
 *            one of the following values:
 *
 *            SERVICE_DEFAULT - use the Amazon Comprehend service defaults
 *            for PDF files.
 *
 *            FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the
 *            Textract API specified by DocumentReadAction for all PDF
 *            files, including digital PDF files.
 *
 * Determines the text extraction actions for PDF files. Enter one of the
 * following values:
 *
 * SERVICE_DEFAULT - use the Amazon Comprehend service defaults for PDF files.
 *
 * FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API
 * specified by DocumentReadAction for all PDF files, including digital PDF
 * files.
 *
 * Constraints:
 * Allowed Values: SERVICE_DEFAULT, FORCE_DOCUMENT_READ_ACTION
 *
 * @param documentReadMode
 *            Determines the text extraction actions for PDF files. Enter
 *            one of the following values:
 *
 *            SERVICE_DEFAULT - use the Amazon Comprehend service defaults
 *            for PDF files.
 *
 *            FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the
 *            Textract API specified by DocumentReadAction for all PDF
 *            files, including digital PDF files.
 *
 * Determines the text extraction actions for PDF files. Enter one of the
 * following values:
 *
 * SERVICE_DEFAULT - use the Amazon Comprehend service defaults for PDF files.
 *
 * FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API
 * specified by DocumentReadAction for all PDF files, including digital PDF
 * files.
 *
 * Returns a reference to this object so that method calls can be chained
 * together.
 *
 * Constraints:
 * Allowed Values: SERVICE_DEFAULT, FORCE_DOCUMENT_READ_ACTION
 *
 * @param documentReadMode
 *            Determines the text extraction actions for PDF files. Enter
 *            one of the following values:
 *
 *            SERVICE_DEFAULT - use the Amazon Comprehend service defaults
 *            for PDF files.
 *
 *            FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the
 *            Textract API specified by DocumentReadAction for all PDF
 *            files, including digital PDF files.
 *
 * Specifies the type of Amazon Textract features to apply. If you chose
 * TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both
 * of the following values:
 *
 * TABLES - Returns information about any tables that are detected in the
 * input document.
 *
 * FORMS - Returns information and the data from any forms that are detected
 * in the input document.
 *
 * @return Specifies the type of Amazon Textract features to apply. If you
 *         chose TEXTRACT_ANALYZE_DOCUMENT as the read action, you must
 *         specify one or both of the following values:
 *
 *         TABLES - Returns information about any tables that are detected
 *         in the input document.
 *
 *         FORMS - Returns information and the data from any forms that are
 *         detected in the input document.
 *
 * Specifies the type of Amazon Textract features to apply. If you chose
 * TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both
 * of the following values:
 *
 * TABLES - Returns information about any tables that are detected in the
 * input document.
 *
 * FORMS - Returns information and the data from any forms that are detected
 * in the input document.
 *
 * @param featureTypes
 *            Specifies the type of Amazon Textract features to apply. If
 *            you chose TEXTRACT_ANALYZE_DOCUMENT as the read action, you
 *            must specify one or both of the following values:
 *
 *            TABLES - Returns information about any tables that are
 *            detected in the input document.
 *
 *            FORMS - Returns information and the data from any forms that
 *            are detected in the input document.
 *
 * Specifies the type of Amazon Textract features to apply. If you chose
 * TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both
 * of the following values:
 *
 * TABLES - Returns information about any tables that are detected in the
 * input document.
 *
 * FORMS - Returns information and the data from any forms that are detected
 * in the input document.
 *
 * Returns a reference to this object so that method calls can be chained
 * together.
 *
 * @param featureTypes
 *            Specifies the type of Amazon Textract features to apply. If
 *            you chose TEXTRACT_ANALYZE_DOCUMENT as the read action, you
 *            must specify one or both of the following values:
 *
 *            TABLES - Returns information about any tables that are
 *            detected in the input document.
 *
 *            FORMS - Returns information and the data from any forms that
 *            are detected in the input document.
 *
 * Specifies the type of Amazon Textract features to apply. If you chose
 * TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both
 * of the following values:
 *
 * TABLES - Returns information about any tables that are detected in the
 * input document.
 *
 * FORMS - Returns information and the data from any forms that are detected
 * in the input document.
 *
 * Returns a reference to this object so that method calls can be chained
 * together.
 *
 * @param featureTypes
 *            Specifies the type of Amazon Textract features to apply. If
 *            you chose TEXTRACT_ANALYZE_DOCUMENT as the read action, you
 *            must specify one or both of the following values:
 *
 *            TABLES - Returns information about any tables that are
 *            detected in the input document.
 *
 *            FORMS - Returns information and the data from any forms that
 *            are detected in the input document.
 *