
How to extract words from left to right in an image with OpenCV Pytesseract?

I'm working with OpenCV and pytesseract. I want to extract the words from this image:

[Image: report]

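I'm trying to use getStructuringElement, but my code jumps to the next line in the middle of the image. I want to extract the words from the left side of the image first, and only move on to the right side after all the strings on the left have been extracted.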

The code is:

import cv2
import pytesseract
from PIL import Image

image = cv2.imread("report_name-1.jpg")

# preprocessing
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # grayscale
thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)[1]  # threshold; cv2.threshold returns (retval, image)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(thresh, kernel, iterations=13)  # dilate so nearby white text merges into blocks

# get contours (OpenCV 4.x returns two values)
contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

for contour in contours:
    # get rectangle bounding contour
    [x, y, w, h] = cv2.boundingRect(contour)

    # discard areas that are too large
    if h > 300 and w > 300:
        continue

    # discard areas that are too small
    if h < 40 or w < 40:
        continue

    # draw rectangle around contour on original image
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 255), 2)
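The contour loop above finds the boxes but processes them in whatever order findContours returns them. To read the whole left column before the right one, the bounding rectangles can be collected first and then ordered column by column. This is a minimal pure-Python sketch; the function name `order_boxes_column_major` and the `column_gap` default of 50 px are hypothetical choices that would need tuning for the actual report.

```python
from typing import List, Optional, Tuple

# (x, y, w, h), the same layout cv2.boundingRect returns
Box = Tuple[int, int, int, int]


def order_boxes_column_major(boxes: List[Box], column_gap: int = 50) -> List[Box]:
    """Order boxes column by column (left to right), top to bottom inside each column.

    A box starts a new column when its left edge lies more than `column_gap`
    pixels to the right of the right-most edge seen so far in the current column.
    """
    columns: List[List[Box]] = []
    column_right: Optional[int] = None
    for box in sorted(boxes, key=lambda b: b[0]):  # scan from left to right
        x, _, w, _ = box
        if column_right is None or x > column_right + column_gap:
            columns.append([box])  # horizontal gap found: start a new column
            column_right = x + w
        else:
            columns[-1].append(box)
            column_right = max(column_right, x + w)

    ordered: List[Box] = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b[1]))  # top to bottom
    return ordered


if __name__ == "__main__":
    boxes = [(400, 10, 200, 40), (10, 10, 300, 40), (10, 80, 300, 40), (400, 80, 200, 40)]
    print(order_boxes_column_major(boxes))
```

In the question's loop, the surviving `(x, y, w, h)` tuples would be appended to a list; each ordered box could then be cropped with `image[y:y + h, x:x + w]` and passed to `pytesseract.image_to_string(crop, config='--psm 6')`, so the text comes out left column first.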

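You can use `--psm 6` to extract text from left to right and top to bottom; it tells Pytesseract to assume a single uniform block of text. Preprocessing is also important, so we threshold to get a binary image with the desired black foreground text on a white background. Check here for other Pytesseract configuration options. After thresholding, this is the image we feed into Pytesseract: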

[Image: thresholded image]

Here is the output:

Limit Balance
Sep 29, 2015 $17,750.0 Oct 01, 2018 $0.00 Oct 02, 2018
0
Account Condition: Paid account/zero Account #: Delinquency 30 Days = $0.00 | 60 Days =$0.00 90+ Days =$0.00 | Derog =00
balance 4636676005495602 Counter (Past
seven years)
Payment Status: This is an account in good Responsibility: Individual
standing
Account Type: Credit Card Account Term: REV
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2016 0 0 0
2017 0 0 0 0 0 0 0 0 0 0 0 0
2018 0 0 0 0 0 0 0 0 0 B
> BMW FINANCIAL SERVICES /
2602980
Open Date Original Amount Credit Status Date Chargeoff Amount Past Due Last Paid Date Balance Date Current
Limit Balance
Sep 19, 2015 $27,189.00 Jul01, 2017 $0.00 Jul 21, 2017 Jul 24, 2017
Account Condition: Paid account/zero Account #: 4002206279 Delinquency 30 Days = $0.00 | 60 Days =$0.00 90+ Days =$0.00 | Derog =00
balance Counter (Past
seven years)
Payment Status: This is an account in good Responsibility: Individual
standing
Account Type: Auto Lease Account Term: 036
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2015 Cc Cc Cc Cc
2016 Cc Cc Cc Cc Cc Cc Cc Cc Cc Cc Cc Cc
2017 Cc Cc Cc Cc Cc Cc B
> LEXUS FINANCIAL SERVIC /
1624210
Open Date Original Amount Credit Status Date Chargeoff Amount Past Due Last Paid Date Balance Date Current
Limit Balance
Mar 07, 2015 $40,342.00 Jul01, 2016 $0.00 Jul 05, 2016 Jul 31, 2016
Account Condition: Paid account/zero Account #: Delinquency 30 Days = $0.00 | 60 Days =$0.00 90+ Days =$0.00 | Derog =00
balance 70403662535410001 Counter (Past
seven years)
Payment Status: This is an account in good Responsibility: Individual
standing
Account Type: Auto Loan Account Term: 072
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2014
2015 Cc Cc Cc Cc Cc Cc Cc Cc Cc Cc
2016 Cc Cc Cc Cc Cc Cc B
> AES/SUNTRUST BANK / 9997195
Open Date Original Amount Credit Status Date Chargeoff Amount Past Due Last Paid Date Balance Date Current
Limit Balance
Sep 19, 2008 $12,500.00 Apr 01, 2016 $0.00 Apr 21, 2016 Apr 30, 2016
Account Condition: Paid account/zero Account #: Delinquency 30 Days = $0.00 | 60 Days =$0.00 90+ Days =$0.00 | Derog =00
balance 5046237209PA00001 Counter (Past
seven years)
Payment Status: This is an account in good Responsibility: Signer
standing
Account Type: Education Loan Account Term: 300
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2014 Cc Cc Cc Cc Cc Cc Cc Cc Cc
2015 Cc Cc Cc Cc Cc Cc Cc Cc Cc Cc Cc Cc
2016 Cc Cc Cc B
> BARCLAYS BANK DELAWARE /
1223850
Open Date Original Amount Credit Status Date Chargeoff Amount Past Due Last Paid Date Balance Date Current
Limit Balance
Apr 04, 2013 $3,500.00 Apr 01, 2016 $0.00 Oct 06, 2014 Apr 05, 2016
Account Condition: Paid account/zero Account #: 000176863399109 Delinquency 30 Days = $0.00 | 60 Days =$0.00 90+ Days =$0.00 | Derog =00
balance Counter (Past
seven years)
Payment Status: This is an account in good Responsibility: Individual
standing
Account Type: Credit Card Account Term: REV
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2014 Cc Cc Cc Cc Cc Cc Cc Cc 0
2015 0 0 0 0 0 0 0 0 0 0 0 0
2016 0 0 0 B
> AMERICAN HONDA FINANCE /
1605190
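Code:

import cv2
import pytesseract

# Path to the Tesseract executable (default Windows install location)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu's threshold: black text on a white background
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# --psm 6: assume a single uniform block of text
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')

print(data)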
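The output above shows that `--psm 6` reads each text line across the full page width, so the two columns end up on one line (for example "Payment Status: This is an account in good Responsibility: Individual"). One way to get the left column first is to ask for word positions with `pytesseract.image_to_data(thresh, config='--psm 6', output_type=pytesseract.Output.DICT)` and regroup the words. This is a minimal sketch, assuming a single vertical split position `split_x` (a hypothetical value you would read off the image) separates the two columns; `reorder_by_column` is a hypothetical helper, not part of pytesseract.

```python
from collections import defaultdict
from typing import Dict, List


def reorder_by_column(data: Dict[str, list], split_x: int) -> str:
    """Rebuild OCR text with every line left of split_x first, then the rest.

    `data` is assumed to have the layout of the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    # (side, block, paragraph, line) -> list of (left, top, word)
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        if not str(word).strip():
            continue  # skip empty entries (page/block/paragraph/line levels)
        side = 0 if data["left"][i] < split_x else 1  # 0 = left column, 1 = right
        key = (side, data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], data["top"][i], str(word)))

    def sort_key(item):
        (side, _, _, _), words = item
        # all left-column lines first, each column ordered top to bottom
        return (side, min(top for _, top, _ in words))

    out: List[str] = []
    for _, words in sorted(lines.items(), key=sort_key):
        out.append(" ".join(w for _, _, w in sorted(words)))  # words left to right
    return "\n".join(out)
```

Splitting on a fixed `split_x` only works when the column boundary is at the same x position down the whole page; if it varies per box, the regrouping would have to be done per box instead.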

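I'm using OpenCV 4.1.1. Sorry, I've now uploaded the image of the boxes, please check it. You can see that the boxes are separated from each other along the horizontal axis. [Image: the detected boxes]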
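Since the boxes are separated along the horizontal axis, another option is to look for fully blank vertical strips in the binary image (a vertical projection profile), crop each column, and OCR each crop separately. This is a minimal NumPy sketch under the assumption that at least one blank strip runs down the entire page; `find_column_ranges` and the `min_gap` default of 20 px are hypothetical. It expects ink to be non-zero, which matches the question's `THRESH_BINARY_INV` image; the answer's black-on-white Otsu image would need to be inverted first (for example with `cv2.bitwise_not`).

```python
from typing import List, Tuple

import numpy as np


def find_column_ranges(binary: np.ndarray, min_gap: int = 20) -> List[Tuple[int, int]]:
    """Return [start, end) x-ranges of regions separated by blank vertical strips.

    `binary` is a 2-D array in which ink (text, box borders) is non-zero and
    background is zero. A strip of at least `min_gap` fully blank pixel
    columns ends a region.
    """
    has_ink = (binary > 0).any(axis=0)  # one flag per x coordinate
    ranges: List[Tuple[int, int]] = []
    start = None
    last_ink = -1
    for x, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = x  # first inked column of a new region
            last_ink = x
        elif start is not None and x - last_ink >= min_gap:
            ranges.append((start, last_ink + 1))  # gap is wide enough: close region
            start = None
    if start is not None:
        ranges.append((start, last_ink + 1))  # region touching the right edge
    return ranges
```

Each range could then be read with `pytesseract.image_to_string(gray[:, x0:x1], config='--psm 6')`, iterating the ranges in order so the left column is processed before the right one. If a header or border line spans the full width, no blank strip exists and the function returns a single range; in that case it would have to be applied to each horizontal band of boxes separately.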


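Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.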
