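Using Textract locally for OCR

Question

I want to extract text from images with Python. (The Tesseract library doesn't work for me because it requires installation.)

I found the boto3 library and Textract, but I'm having trouble using them. I'm still new to this. Can you tell me what I need to do to get my script to run correctly?

Here is my code: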

import cv2
import boto3
import textract


#img = cv2.imread('slika2.jpg') #this is jpg file
with open('slika2.pdf', 'rb') as document:
    img = bytearray(document.read())

textract = boto3.client('textract',region_name='us-west-2')

response = textract.detect_document_text(Document={'Bytes': img})  # gives me error
print(response)

When I run this code, I get:

botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

I also tried this:

# Document
documentName = "slika2.jpg"

# Read document content
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())

# Amazon Textract client
textract = boto3.client('textract',region_name='us-west-2')

# Call Amazon Textract
response = textract.detect_document_text(Document={'Bytes': imageBytes}) #ERROR

#print(response)

# Print detected text
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print ('\033[94m' +  item["Text"] + '\033[0m')

But I get this error:

botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

I'm a beginner, so any help would be appreciated. How can I read text from my image or PDF file?

I also added this code, but the error is still "Unable to locate credentials".

session = boto3.Session(
    aws_access_key_id='xxxxxxxxxxxx',
    aws_secret_access_key='yyyyyyyyyyyyyyyyyyyyy'
)

Answer 1

The problem is in how the credentials are passed to boto3. You have to pass the credentials when you create the boto3 client.

import boto3

# boto3 client
client = boto3.client(
    'textract', 
    region_name='us-west-2', 
    aws_access_key_id='xxxxxxx', 
    aws_secret_access_key='xxxxxxx'
)

# Read image
with open('slika2.png', 'rb') as document:
    img = bytearray(document.read())

# Call Amazon Textract
response = client.detect_document_text(
    Document={'Bytes': img}
)

# Print detected text
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print ('\033[94m' +  item["Text"] + '\033[0m')

Note that hardcoding AWS keys in your code is not recommended. Please refer to the following documentation:

https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/configuration.html
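
As a rough sketch of what that guide describes: boto3 can also read credentials from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or from the shared credentials file (~/.aws/credentials, which `aws configure` writes), so the keys do not need to appear in the script. Also note that a boto3.Session only has an effect if the client is created from it with session.client(...). If the session is built and the client is still created through boto3.client(...), the keys given to the session are never used. The helper names make_textract_client, extract_lines and ocr_image below are made up for illustration, and the optional profile name is an assumption about your local setup.

```python
def make_textract_client(profile_name=None, region_name="us-west-2"):
    """Create a Textract client from a Session, without keys in the code.

    profile_name is a hypothetical profile from ~/.aws/credentials; with
    None, boto3 falls back to its default lookup (environment variables,
    then the default profile in the shared credentials file, and so on).
    """
    # Imported here so that extract_lines() below can be tried out on a
    # saved response even on a machine where boto3 is not installed.
    import boto3

    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    # The client must come from the session, otherwise the session's
    # settings are ignored.
    return session.client("textract")


def extract_lines(response):
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_image(path, client):
    """Send one local image file to Textract and return its lines of text."""
    with open(path, "rb") as document:
        image_bytes = document.read()
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```

With credentials configured outside the script, something like ocr_image('slika2.png', make_textract_client()) should return the detected lines as a list of strings. Keeping the response parsing in its own function also makes it easy to check against a saved response without calling AWS.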

Solution 1
2 Accepted 2020-10-08 03:22:59
