Unsupported document format when using Amazon Textract

Question

I am using Amazon Textract with boto3. When I try to parse a PDF file accessed through Amazon S3, it gives me the error "Request has unsupported document format". I am new to this, and the Textract documentation says that PDF files are supported.

Here is the code I am using.

import boto3
textractClient = boto3.client('textract',region_name='us-east-1')
response = textractClient.detect_document_text(
        Document={'S3Object': {'Bucket': 'bucketName', 'Name': 'filename.pdf'}})
blocks = response['Blocks']

This gives me the error "Request has unsupported document format".

Answer 1

detect_document_text() is a synchronous API and only supports PNG or JPG images.

If you want to process PDF files, you should use the asynchronous API called start_document_text_detection().

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/textract.html#Textract.Client.start_document_text_detection
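
As a rough sketch of how the asynchronous call might replace the code in the question: start_document_text_detection() returns a JobId, and the results are then fetched with get_document_text_detection(), which is paginated through NextToken. The helper name detect_pdf_text, the polling interval, and the poll limit are assumptions made for illustration, not part of the original answer.

```python
import time


def detect_pdf_text(client, bucket, name, poll_seconds=5, max_polls=120):
    """Run asynchronous text detection on a PDF in S3 and return its lines of text.

    `client` is expected to be a boto3 Textract client. This helper and its
    polling defaults are hypothetical, chosen only for this example.
    """
    # Start the asynchronous job; the call returns immediately with a JobId.
    job = client.start_document_text_detection(
        DocumentLocation={'S3Object': {'Bucket': bucket, 'Name': name}})
    job_id = job['JobId']

    # Poll until the job is no longer IN_PROGRESS, or give up after max_polls.
    for _ in range(max_polls):
        response = client.get_document_text_detection(JobId=job_id)
        if response['JobStatus'] != 'IN_PROGRESS':
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError('Textract job %s did not finish in time' % job_id)

    # Anything other than SUCCEEDED is treated as a failure in this sketch.
    if response['JobStatus'] != 'SUCCEEDED':
        raise RuntimeError('Textract job %s ended with status %s'
                           % (job_id, response['JobStatus']))

    # Results are paginated; keep following NextToken until it is absent.
    blocks = list(response.get('Blocks', []))
    while response.get('NextToken'):
        response = client.get_document_text_detection(
            JobId=job_id, NextToken=response['NextToken'])
        blocks.extend(response.get('Blocks', []))

    # Keep only LINE blocks, which carry one line of detected text each.
    return [block['Text'] for block in blocks if block['BlockType'] == 'LINE']


# Example usage with the names from the question (requires AWS credentials):
#   import boto3
#   textractClient = boto3.client('textract', region_name='us-east-1')
#   lines = detect_pdf_text(textractClient, 'bucketName', 'filename.pdf')
```

Polling keeps the example short. For longer documents, passing a NotificationChannel to start_document_text_detection() so that completion is published to an SNS topic may be a better fit than a sleep loop.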

Solution 1
17 2019-07-19 00:02:13
