How can I get the text from this image using Tesseract?

Question

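I am currently using the code below to get text from images, and it works fine in general, but it does not work well on these two images; Tesseract seems unable to read this kind of image. Please tell me how to fix it.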

https://i.ibb.co/zNkbhKG/Untitled1.jpg

https://i.ibb.co/XVbjc3s/Untitled3.jpg

def read_screen():
        spinner = Halo(text='Reading screen', spinner='bouncingBar')
        spinner.start()
        screenshot_file="Screens/to_ocr.png"
        screen_grab(screenshot_file)

        #prepare argparse
        ap = argparse.ArgumentParser(description='HQ_Bot')
        ap.add_argument("-i", "--image", required=False,default=screenshot_file,help="path to input image to be OCR'd")
        ap.add_argument("-p", "--preprocess", type=str, default="thresh", help="type of preprocessing to be done")
        args = vars(ap.parse_args())

        # load the image 
        image = cv2.imread(args["image"])
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        if args["preprocess"] == "thresh":
                gray = cv2.threshold(gray, 177, 177,
                        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        elif args["preprocess"] == "blur":
                gray = cv2.medianBlur(gray, 3)

        # store grayscale image as a temp file to apply OCR
        filename = "Screens/{}.png".format(os.getpid())
        cv2.imwrite(filename, gray)

        # load the image as a PIL/Pillow image, apply OCR, and then delete the temporary file
        pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
        #ENG
        #text = pytesseract.image_to_string(Image.open(filename))

        #VIET
        text = pytesseract.image_to_string(Image.open(filename), lang='vie')

        os.remove(filename)
        os.remove(screenshot_file)

        # show the output images

        '''cv2.imshow("Image", image)
        cv2.imshow("Output", gray)
        os.remove(screenshot_file)
        if cv2.waitKey(0):
                cv2.destroyAllWindows()
        print(text)
        '''
        spinner.succeed()
        spinner.stop()
        return text

Answer 1

You should try a different psm (page segmentation) mode instead of the default, like this:

target = pytesseract.image_to_string(im,config='--psm 4',lang='vie')

From the documentation:

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.

So, for example, for /Untitled3.jpg you could try --psm 4, and if that fails, try --psm 11.
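The fallback described above (try --psm 4 first, then --psm 11) can be sketched as a small helper. This is only an illustration, not part of the original answer: `ocr_with_fallback` is a hypothetical name, and the OCR function is passed in as a parameter so the snippet runs even without Tesseract installed. In practice you would pass `pytesseract.image_to_string`.

```python
def ocr_with_fallback(ocr, image, psm_modes=(4, 11), lang="vie"):
    """Try each page segmentation mode in order.

    `ocr` is any callable that accepts (image, lang=..., config=...),
    for example pytesseract.image_to_string.
    Returns (psm, text) for the first mode that yields non-blank text,
    or (None, "") if every mode comes back empty.
    """
    for psm in psm_modes:
        text = ocr(image, lang=lang, config="--psm {}".format(psm))
        if text.strip():
            return psm, text
    # No mode produced any text.
    return None, ""
```

With pytesseract installed, a call might look like `psm, text = ocr_with_fallback(pytesseract.image_to_string, image)`. An empty result is only a rough signal of failure; a mode can also return wrong but non-empty text, so it is still worth inspecting the output by eye.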

Depending on your Tesseract version, you can also try different oem (OCR engine) modes:

Use --oem 1 for LSTM and --oem 0 for Legacy Tesseract. Note that the Legacy Tesseract models are only included in the traineddata files from the tessdata repo.
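Both flags go into the same `config` string that pytesseract hands to Tesseract. Here is a small sketch of building that string. `tesseract_config` is a hypothetical helper, not a pytesseract function. The accepted ranges follow the psm list above (0 to 13) and the engine modes listed by `tesseract --help-oem` (0 to 3, where 2 is Legacy + LSTM and 3 is the default).

```python
def tesseract_config(psm=None, oem=None):
    """Build a Tesseract config string such as '--oem 1 --psm 4'."""
    parts = []
    if oem is not None:
        if oem not in range(4):
            raise ValueError("oem must be between 0 and 3")
        parts.append("--oem {}".format(oem))
    if psm is not None:
        if psm not in range(14):
            raise ValueError("psm must be between 0 and 13")
        parts.append("--psm {}".format(psm))
    return " ".join(parts)


print(tesseract_config(psm=4, oem=1))  # --oem 1 --psm 4
```

The result can be passed directly, e.g. `pytesseract.image_to_string(im, config=tesseract_config(psm=4, oem=1), lang='vie')`.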

Edit

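Also, as the image shows, it contains two languages. If you want to use the lang parameter, you need to split the image into two parts manually and use a different lang value for each part, so that the Tesseract engine is not confused.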
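One way to do the manual split is to crop the image into two regions and run OCR on each with its own language. The sketch below rests on several assumptions that are not taken from the images in the question: the horizontal split position `split_y`, the language order (`vie` on top, `eng` below), and the helper name `ocr_two_languages` are all hypothetical. It expects a Pillow-style image (with `.size` and `.crop(box)`) and an OCR callable such as `pytesseract.image_to_string`.

```python
def ocr_two_languages(ocr, image, split_y, top_lang="vie", bottom_lang="eng"):
    """Split the image at row `split_y` and OCR each half with its own language.

    Returns a tuple (top_text, bottom_text).
    """
    width, height = image.size
    if not 0 < split_y < height:
        raise ValueError("split_y must lie strictly inside the image")
    # Pillow crop boxes are (left, upper, right, lower).
    top = image.crop((0, 0, width, split_y))
    bottom = image.crop((0, split_y, width, height))
    return ocr(top, lang=top_lang), ocr(bottom, lang=bottom_lang)
```

The right value for `split_y` depends on the layout of your screenshot, so you would need to find it by inspecting the image.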

Edit 2

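Below is a complete working example for Untitled3. I noticed that you are using the threshold incorrectly: maxval should be set higher than the threshold value. In my example I set thresh to 177 and maxval to 255, so every pixel above 177 becomes white (255) and the rest become black (0). I did not even need any further binarization.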

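import cv2
import pytesseract
# Import from the top-level cv2 package; the cv2.cv2 submodule is not available in recent opencv-python releases.
from cv2 import imread, cvtColor, COLOR_BGR2GRAY, threshold, THRESH_BINARY

# Load the image and convert it to grayscale.
image = imread("./Untitled3.jpg")
image = cvtColor(image, COLOR_BGR2GRAY)
# Fixed threshold: pixels above 177 become 255 (white), the rest become 0 (black).
_, image = threshold(image, 177, 255, THRESH_BINARY)
# Show the thresholded image; press any key to continue.
cv2.namedWindow("TEST")
cv2.imshow("TEST", image)
cv2.waitKey()
text = pytesseract.image_to_string(image, lang='eng')
print(text)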

Output:

New York, New York

Salzburg, Austria

Hollywood, California
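
To see why maxval matters, here is a pure-Python sketch of the rule that `cv2.THRESH_BINARY` applies to each pixel: values greater than thresh become maxval, and everything else becomes 0. `binary_threshold` is a hypothetical helper for illustration only, not an OpenCV function, and the sample pixel values are made up.

```python
def binary_threshold(pixels, thresh, maxval):
    """Mimic cv2.THRESH_BINARY on a 2D list of grayscale values."""
    return [[maxval if p > thresh else 0 for p in row] for row in pixels]


row = [[0, 100, 177, 178, 255]]
print(binary_threshold(row, 177, 255))  # [[0, 0, 0, 255, 255]]
print(binary_threshold(row, 177, 177))  # [[0, 0, 0, 177, 177]]
```

With maxval set to 177, as in the question's code, the "white" pixels come out as mid-gray (177) instead of pure white, so the saved image has lower contrast than a 0/255 image.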
