
Using Textract for OCR locally

I want to extract text from images with Python. (The Tesseract library doesn't work for me because it requires installation.)

I found the boto3 library and Textract, but I'm having trouble using them. I'm still a beginner. Can you tell me what I need to do to get my script running correctly?

Here is my code:

import cv2
import boto3
import textract


#img = cv2.imread('slika2.jpg') #this is jpg file
with open('slika2.pdf', 'rb') as document:
    img = bytearray(document.read())

textract = boto3.client('textract',region_name='us-west-2')

response = textract.detect_document_text(Document={'Bytes': img})  # gives me error
print(response)

When I run this code, I get:

botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

I also tried this:

# Document
documentName = "slika2.jpg"

# Read document content
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())

# Amazon Textract client
textract = boto3.client('textract',region_name='us-west-2')

# Call Amazon Textract
response = textract.detect_document_text(Document={'Bytes': imageBytes}) #ERROR

#print(response)

# Print detected text
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print ('\033[94m' +  item["Text"] + '\033[0m')

But I get this error:

botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

I'm a beginner, so any help would be appreciated. How can I read the text from my image or PDF file?

I also added this code, but I still get the error "Unable to locate credentials":

session = boto3.Session(
    aws_access_key_id='xxxxxxxxxxxx',
    aws_secret_access_key='yyyyyyyyyyyyyyyyyyyyy'
)

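There is a problem with how the credentials are passed to boto3. If the client is still created with boto3.client(...), the Session from the question is never used, because boto3.client(...) builds its client from boto3's default session. You must pass the credentials when creating the boto3 client: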

import boto3

# boto3 client
client = boto3.client(
    'textract', 
    region_name='us-west-2', 
    aws_access_key_id='xxxxxxx', 
    aws_secret_access_key='xxxxxxx'
)

# Read image
with open('slika2.png', 'rb') as document:
    img = bytearray(document.read())

# Call Amazon Textract
response = client.detect_document_text(
    Document={'Bytes': img}
)

# Print detected text
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print ('\033[94m' +  item["Text"] + '\033[0m')
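The detection loop above can also be wrapped in a small reusable function. This is only a sketch: the helper names read_document_bytes, line_texts and detect_lines are hypothetical, and the client is passed in as a parameter so the same code works however you choose to supply credentials. It relies only on the Blocks, BlockType and Text fields already used in the code above.

```python
from typing import Any, Dict, List


def read_document_bytes(path: str) -> bytes:
    # Read the whole file as raw bytes; this is the payload for Document={'Bytes': ...}.
    with open(path, "rb") as document:
        return document.read()


def line_texts(response: Dict[str, Any]) -> List[str]:
    # Keep only LINE blocks; DetectDocumentText also returns PAGE and WORD blocks.
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(client: Any, path: str) -> List[str]:
    # Hypothetical helper: client is expected to be a boto3 Textract client
    # created by the caller, so this function does not care how the
    # credentials were supplied.
    response = client.detect_document_text(
        Document={"Bytes": read_document_bytes(path)}
    )
    return line_texts(response)
```

For example, detect_lines(client, 'slika2.png') with the client built in the answer returns the detected lines as a list of strings, which you can print or join as you like. Passing the client in also makes the helpers easy to try out with a stand-in object before touching AWS.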

Note that hardcoding AWS keys in your code is not recommended. See the following documentation:

https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/configuration.html


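Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM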