Amazon Textract returns "Request has unsupported document format" when parsing a PDF file from Amazon S3
I am using Amazon Textract with boto3. When I try to parse a PDF file stored in Amazon S3, I get the error "Request has unsupported document format". I am fairly new to this, but the Textract documentation says that PDF files are supported.
This is the code I am using:
import boto3
textractClient = boto3.client('textract',region_name='us-east-1')
response = textractClient.detect_document_text(
    Document={'S3Object': {'Bucket': 'bucketName', 'Name': 'filename.pdf'}}
)
blocks = response['Blocks']
This gives me the error: Request has unsupported document format.
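detect_document_text() is a synchronous API that supports only PNG and JPEG images.
If you'd like to process PDF files, you should use the asynchronous API, start_document_text_detection(). It returns a JobId rather than the blocks themselves; you then retrieve the results with get_document_text_detection().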
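A minimal sketch of the asynchronous flow, assuming the same bucket and file name as in the question. The helper names start_pdf_text_detection and wait_for_blocks, the polling interval, and the poll limit are made up for illustration and are not part of boto3. For simplicity it polls for completion and treats any final status other than SUCCEEDED as a failure.

```python
import time


def start_pdf_text_detection(client, bucket, key):
    """Start an asynchronous Textract job for a document in S3 and return its JobId."""
    response = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return response["JobId"]


def wait_for_blocks(client, job_id, poll_seconds=5, max_polls=120):
    """Poll until the job leaves IN_PROGRESS, then collect Blocks from all result pages."""
    for _ in range(max_polls):
        response = client.get_document_text_detection(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} is still running")

    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results can span several pages; follow NextToken until it is absent.
    blocks = list(response.get("Blocks", []))
    next_token = response.get("NextToken")
    while next_token:
        response = client.get_document_text_detection(
            JobId=job_id, NextToken=next_token
        )
        blocks.extend(response.get("Blocks", []))
        next_token = response.get("NextToken")
    return blocks


# Usage (requires AWS credentials and a real bucket, so it is left commented out):
# import boto3
# textractClient = boto3.client('textract', region_name='us-east-1')
# job_id = start_pdf_text_detection(textractClient, 'bucketName', 'filename.pdf')
# blocks = wait_for_blocks(textractClient, job_id)
```

The helpers take the client as a parameter so that the same textractClient from the question can be passed in unchanged.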