---
title: "Comprehend custom inference"
date: 2020-03-11T14:54:49-04:00
draft: false
algo: [comprehend]
---

Before you start, work through the [training guide](../../training/comprehend).

After training, your custom classifier is ready and can be used to categorize unlabeled documents asynchronously.

## Data prep for inference

All documents must be UTF-8-formatted text files. You can only train a custom classification model using the one-document-per-line format, but for inference you can submit your documents either in that format or as one document per file.

### One document per line

The input file should be formatted as follows:

```text
Text of document 1 \n
Text of document 2 \n
Text of document 3 \n
Text of document 4 \n
```

After preparing the documents file, place it in the S3 bucket that you're using for input data.

### One document per file

Use the URI `S3://bucketName/prefix`. If the prefix is a single file, Amazon Comprehend uses that file as input. If more than one file begins with the prefix, Amazon Comprehend uses all of them as input.

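
For the one-document-per-line format described above, each document has to fit on a single line. The following is a minimal sketch of one way to build such a file in Python. The helper names `to_one_doc_per_line` and `write_input_file` are hypothetical and not part of any AWS library, and collapsing all internal whitespace to single spaces is an assumption about what is acceptable for your text.

```python
def to_one_doc_per_line(documents):
    """Return the documents as a single string, one document per line.

    Any run of whitespace inside a document (including embedded line
    breaks) is collapsed to one space so that a document can never span
    several lines. Documents that are empty after this step are skipped.
    """
    lines = []
    for doc in documents:
        flat = " ".join(doc.split())
        if flat:
            lines.append(flat)
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


def write_input_file(documents, path):
    """Write the documents to `path` as UTF-8 text, one document per line."""
    # newline="\n" keeps plain LF line endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(to_one_doc_per_line(documents))
```

The resulting file can then be uploaded to the input S3 bucket, for example with `aws s3 cp`.
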
## Prediction

To launch a new job, run the following command, replacing the `<>` placeholders with your bucket locations and ARNs:

```bash
aws comprehend start-document-classification-job \
    --document-classifier-arn <> \
    --input-data-config S3Uri=<>,InputFormat=ONE_DOC_PER_LINE \
    --output-data-config S3Uri=<> \
    --data-access-role-arn <>
```

You should see something like this:

```json
{
    "DocumentClassificationJobProperties": {
        "JobId": "4*********************8aab",
        "JobStatus": "IN_PROGRESS",
        "SubmitTime": 1561679679.036,
        "DocumentClassifierArn": "YourClassifierArn",
        "InputDataConfig": {
            "S3Uri": "YourS3Uri",
            "InputFormat": "ONE_DOC_PER_LINE"
        },
        "OutputDataConfig": {
            "S3Uri": "S3OutputLocation"
        },
        "DataAccessRoleArn": "YourAccessRole"
    }
}
```

To check the status of the newly launched job:

```bash
aws comprehend describe-document-classification-job --job-id <>
```

Once the job has completed, download the results from the location given in `OutputDataConfig.S3Uri`.

To run the job from the console instead, see [the AWS documentation](https://docs.aws.amazon.com/comprehend/latest/dg/how-class-run.html).

To create a model-specific endpoint for synchronous inference with a previously trained custom model, using Python:

```python
import boto3

# Uses the credentials and region from your default AWS configuration.
client = boto3.client('comprehend')

# Replace the placeholder values ('string', 123) with your own.
response = client.create_endpoint(
    EndpointName='string',
    ModelArn='string',
    DesiredInferenceUnits=123,
    ClientRequestToken='string',
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
```
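
For the asynchronous job described above, the status check can also be scripted rather than run by hand. The sketch below polls `describe_document_classification_job` until the job reaches a terminal status. The helper name `wait_for_classification_job`, the polling interval, and the retry limit are assumptions chosen for illustration; `client` is expected to behave like `boto3.client("comprehend")`.

```python
import time

# JobStatus values after which the job will not change any further.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_classification_job(client, job_id, poll_seconds=30,
                                max_polls=120, sleep=time.sleep):
    """Poll a document classification job until it finishes.

    Returns the job's DocumentClassificationJobProperties dict once the
    status is terminal. Raises TimeoutError if the job is still running
    after `max_polls` status checks.
    """
    for _ in range(max_polls):
        response = client.describe_document_classification_job(JobId=job_id)
        props = response["DocumentClassificationJobProperties"]
        if props["JobStatus"] in TERMINAL_STATUSES:
            return props
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")


# Example usage (needs boto3 and AWS credentials; the job ID is a placeholder):
#
#   import boto3
#   props = wait_for_classification_job(boto3.client("comprehend"), "your-job-id")
#   print(props["JobStatus"], props["OutputDataConfig"]["S3Uri"])
```

Passing the client in as an argument keeps the helper free of any AWS dependency, so it can be exercised with a stub object before pointing it at a real job.
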