Improving image pre-processing for tesseract (video game screenshots)

Question

I am trying to read price text in a video game and am having difficulty pre-processing the image.

The rest of my code is "complete": once the text is extracted, I format it and write it to CSV for later use.

This is what I have come up with so far for the following images. I would like input on other thresholds or pre-processing tools that would make the OCR more accurate.

Raw Image Screenshot

After gamma, denoise on left - binary threshold on right

The text detected

As you can see, it is very close but not perfect. I would like to make it more accurate, as I will eventually be processing many frames.

Here is my current code:

import cv2
import pytesseract
import pandas as pd
import numpy as np

# Tells pytesseract where the tesseract environment is installed on local computer
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

img = cv2.imread("./image_frames/frame0.png")

# gamma to darken text to be same opacity?
def adjust_gamma(crop_img, gamma=1.0):
    # build a lookup table mapping the pixel values [0, 255] to
    # their adjusted gamma values
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
        for i in np.arange(0, 256)]).astype("uint8")
    # apply gamma correction using the lookup table
    return cv2.LUT(crop_img, table)

adjusted = adjust_gamma(img, gamma=0.15)

# grayscale the image
gray = cv2.cvtColor(adjusted, cv2.COLOR_BGR2GRAY)
# denoising image
dst = cv2.fastNlMeansDenoising(gray, None, 10, 10, 10)


# binary threshold
thresh = cv2.threshold(gray, 35, 255, cv2.THRESH_BINARY_INV)[1]


# OCR configurations (3 is default)
config = "--psm 3"

# Just show the image
cv2.imshow("gray", gray)
cv2.imshow("denoised", dst)
cv2.imshow("thresh", thresh)
cv2.waitKey(0)

# Reads text from the image and prints to console
text = pytesseract.image_to_string(thresh, config=config)
# remove double lines
text = text.replace('\n\n','\n')
# remove unicode character
text = text.replace('', '')
print(text)

Any help is appreciated, as I am very new to this!

Answer 1

Step#1: Scale the image

Step#2: Apply adaptive-threshold

Step#3: Set page-segmentation-mode (psm) to 6 (Assume a single uniform block of text.)

1 Scaling the image:

The original image is really small, so scaling it up makes the text easier to see.

img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

2 Apply adaptive-threshold

Normally a plain threshold is applied, but on your image it has no effect on the result.
For different images you may need to set different C and block size values.
For instance, for the 1st image:

gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 15, 22)

Result:
For instance, for the 2nd image:

gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 51, 4)

Result:

3 Set psm to 6, which treats the image as a single uniform block of text.

txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)

Result for the 1st image:

 Dragon Claymore 1,388,888,888 mesos. Maple Pyrope Spear 288,888,888 mesos. Element Pierce 488,888,888 mesos. Purple Adventurer Cape 97,777,777 mesos.

Result for the 2nd image:

 Ring of Alchemist 749,999,995 mesos. Dragon Slash Claw 499,999,995 mesos. "Stormcaster Gloves 149,999,995 mesos. Elemental Wand 6 749,999,995 mesos. Big Money Chalr 1 tor 249,999,985 mesos.|

Code for the 1st image:

import pytesseract
import cv2

img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 15, 22)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)

Code for the 2nd image:

import pytesseract
import cv2

img = cv2.imread("7Y2yx.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 51, 4)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
