How can I use AWS Textract with Python?

I have tested almost every example I can find on the Internet for Amazon Textract and I can't get any of them to work. I can upload and download a file to S3 from my Python client, so the credentials should be OK. Many of the errors point to some kind of region failure, but I have tried every possible combination.

Here is one of my last test calls:

import boto3


def test_parse_3():
    # Document
    s3BucketName = "xx-xxxx-xx"
    documentName = "xxxx.jpg"

    # Amazon Textract client (uses the default region from the local AWS configuration)
    textract = boto3.client('textract')

    # Call Amazon Textract
    response = textract.detect_document_text(
        Document={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': documentName
            }
        })

    print(response)

This seems pretty simple, but it raises the following error:

botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the DetectDocumentText operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.

Any ideas what's wrong, and does someone have a working example? (The indentation in the pasted code may not be exact.)

I have also tested a lot of permission settings in AWS. The credentials are in a hidden file created by the AWS SDK.

I am sure you already know this, but the bucket name and the object key have to match exactly, including case. If you have verified that both the bucket and the object name are correct, just make sure the appropriate region is set alongside your credentials.
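To check whether a region is actually configured, something like the following might help. It is only a sketch: the helper names `region_from_config` and `effective_region` are made up for illustration, and it assumes the default file location `~/.aws/config` and the `AWS_DEFAULT_REGION` environment variable. Other ways of supplying a region are not covered.

```python
import configparser
import os
from pathlib import Path


def region_from_config(config_text, profile="default"):
    """Return the region set for `profile` in AWS config-file text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, named profiles use a "profile " prefix; "default" does not.
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_section(section):
        return parser.get(section, "region", fallback=None)
    return None


def effective_region(profile="default"):
    """Best-effort guess at the region that would be picked up for `profile`."""
    # The environment variable takes precedence over the config file.
    env_region = os.environ.get("AWS_DEFAULT_REGION")
    if env_region:
        return env_region
    # Assumption: the config file lives in the default location.
    config_path = Path.home() / ".aws" / "config"
    if config_path.exists():
        return region_from_config(config_path.read_text(), profile)
    return None
```

If `effective_region()` returns `None`, no region was found in either place, which would be consistent with the error in the question.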

I tested just reading from S3 without including a region in the credentials, and I was able to list the objects in the bucket with no issues. I think this worked because S3 is supposed to be region agnostic. Textract, however, is region specific, so you must define the region in your credentials when using Textract to read data from an S3 bucket, and it should be the region the bucket is in.
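As an alternative to relying on the configuration file, the region can be passed directly when creating the client. Here is a rough sketch of that idea, not a tested fix for the exact setup in the question. The function names `region_from_location_constraint` and `detect_text` are hypothetical, and it assumes the caller is allowed to call `get_bucket_location` on the bucket and that Textract is available in the bucket's region.

```python
def region_from_location_constraint(constraint):
    """Map an S3 LocationConstraint value to a region name."""
    # S3 reports buckets in us-east-1 with an empty constraint,
    # and some older eu-west-1 buckets with the legacy value "EU".
    if not constraint:
        return "us-east-1"
    if constraint == "EU":
        return "eu-west-1"
    return constraint


def detect_text(bucket, key):
    """Call DetectDocumentText with a client in the bucket's own region."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client("s3")
    location = s3.get_bucket_location(Bucket=bucket)
    region = region_from_location_constraint(location.get("LocationConstraint"))

    # Passing region_name explicitly avoids depending on the local config file.
    textract = boto3.client("textract", region_name=region)
    return textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
```

With the placeholder names from the question, the call would be `detect_text("xx-xxxx-xx", "xxxx.jpg")`. The detected lines are the entries of `response["Blocks"]` whose `BlockType` is `"LINE"`, with the text itself under `"Text"`.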

I realize this was asked a few months ago, but I hope this sheds some light for others who face this issue in the future.
