Need help finding the right Pytesseract configuration to recognize this text

Question

I'm writing a script that needs to recognize text in pictures like these [1][2], namely the text "Curse of Binding" and "Looting I". I'm new to Pytesseract and OpenCV (cv2), so I don't really know how I should preprocess the pictures. The following code doesn't give me the result I want.

import pytesseract
import cv2

pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\guilh\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"

image = cv2.imread('trades/1.png')

text = pytesseract.image_to_string(image, lang='mc')
print(text)

Thanks in advance!

[1] - https://i.stack.imgur.com/0fNhM.png

[2] - https://i.stack.imgur.com/J71v6.png

Edit: the lang "mc" is custom trained data for the font in the pictures, which comes from the game Minecraft.

Answer 1

This may be a far-fetched question, but did you download the mc trained data from this link?

If so, that training data has problems with certain characters and only really works well for numbers. Another important step is to cut out the background around the text.

I'm doing a similar project here, with a few differences: I use tesserocr, as it's faster for video or large numbers of images, and I read the F3 debug menu, which guarantees white text.

If you have a look at process_image, it takes the image, blacks out all non-gray pixels, then applies cv2.threshold(im_arr,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU).

After applying all effects, I also crop the image so that it starts at the first column that actually contains text, which removes large amounts of empty space at the start.
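The crop described above can be sketched with NumPy alone. This is a minimal illustration, not the code from the linked project: the function name crop_to_first_text_column and the margin parameter are made up here, and the input is assumed to be a 2-D binarized array in which text pixels are non-zero.

```python
import numpy as np


def crop_to_first_text_column(binary, margin=3):
    """Drop the empty columns to the left of the first non-zero pixel."""
    # Indices of all columns that contain at least one non-zero pixel.
    cols_with_text = np.flatnonzero(binary.any(axis=0))
    if cols_with_text.size == 0:
        return binary  # no text found, nothing to crop
    # Keep a small margin so the first glyph is not cut off.
    start = max(0, int(cols_with_text[0]) - margin)
    return binary[:, start:]


# Synthetic 4x12 image with a white block in columns 7..9.
demo = np.zeros((4, 12), dtype=np.uint8)
demo[1:3, 7:10] = 255
cropped = crop_to_first_text_column(demo)
print(cropped.shape)
```

Using any(axis=0) avoids looping over every pixel in Python, which matters when many frames have to be processed.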

Instead of checking for gray pixels, you could try using

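# Check out HSV masking/filtering in the OpenCV documentation.
# h_min ... v_max are placeholders for the color range of your text.
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
image = cv2.inRange(hsv, (h_min, s_min, v_min), (h_max, s_max, v_max))
ret3, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)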

process_image, for reference:

import cv2
import numpy as np


def process_image(im, crop_to_activity=False):
    """
    Converts the image to a numpy array, then applies preprocessing.
    """
    im_arr = np.array(im)
        
    height, width, depth = im_arr.shape

    for i in range(height):
        for j in range(width):
            # Cast to Python ints so the sums below cannot wrap around in uint8.
            r, g, b = (int(c) for c in im_arr[i][j])

            r = (r + 150) / 2
            g = (g + 150) / 2
            b = (b + 150) / 2

            mean = (r + g + b) / 3
            diffr = abs(mean - r)
            diffg = abs(mean - g)
            diffb = abs(mean - b)

            maxdev = 2

            if (diffr + diffg + diffb) > maxdev:
                im_arr[i][j][0] = 0
                im_arr[i][j][1] = 0
                im_arr[i][j][2] = 0
            


    im_arr = cv2.cvtColor(im_arr, cv2.COLOR_BGR2GRAY)

    # Disabled alternatives:
    # cap_arr = cv2.threshold(cap_arr, 127, 255, cv2.THRESH_BINARY)
    # blur = cv2.GaussianBlur(cap_arr, (3, 3), 0)

    # Otsu's thresholding picks the binarization threshold automatically.
    ret3, im_arr = cv2.threshold(im_arr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)


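    if crop_to_activity:
        # Find the first column (from the left) that contains a non-black pixel.
        first_column = -1
        for j in range(width):
            for i in range(height):
                v = im_arr[i][j]

                if v != 0:
                    first_column = j
                    break
            if first_column != -1:
                break

        # Keep a 3-pixel margin to the left of the text.
        first_column = max(0, first_column - 3)
        im_arr = im_arr[:, first_column:]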

    return im_arr
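The nested per-pixel loop in process_image is slow in pure Python. As a rough sketch, the same "black out non-gray pixels" test can be written with NumPy array operations. The function name keep_gray_pixels is made up for this example, and the input is assumed to be an H x W x 3 uint8 array.

```python
import numpy as np


def keep_gray_pixels(rgb, maxdev=2):
    """Black out pixels whose channels deviate too much from their mean."""
    # Work in float so that adding 150 cannot overflow uint8.
    arr = (rgb.astype(np.float32) + 150) / 2  # same rescaling as the loop
    mean = arr.mean(axis=2, keepdims=True)
    # Sum of absolute channel deviations from the per-pixel mean.
    deviation = np.abs(arr - mean).sum(axis=2)
    out = rgb.copy()
    out[deviation > maxdev] = 0  # boolean mask zeroes all three channels
    return out


# One gray pixel, one strongly colored pixel, one almost-gray pixel.
demo = np.array([[[200, 200, 200], [255, 85, 85], [100, 101, 100]]], dtype=np.uint8)
filtered = keep_gray_pixels(demo)
print(filtered[0].tolist())
```

The result could then be passed to cv2.cvtColor and cv2.threshold exactly as in process_image.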

Answer 2

I would like to suggest inRange thresholding.

If you apply it:

[Images: 1st image and 2nd image after inRange thresholding]

If you set the page segmentation mode (--psm) to 6, the OCR results are:

1st Image           2nd Image
Enchanted Book      Enchanted Book
Curse of Binding

In the 2nd image, "Looting I" is missing, so we need to set different values:

2nd Image    Result
[image]      dese eee _, Looting I

Global thresholding won't work on those images.

Adaptive thresholding is no more successful than inRange thresholding:

[Images: 1st image and 2nd image after adaptive thresholding]

So you can find the ideal result by adjusting the range values.

inRange threshold code:

import cv2
import pytesseract
from numpy import array

image_list = ["0fNhM.png", "J71v6.png"]

for image_name in image_list:
    bgr_image = cv2.imread(image_name)
    hsv_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_image, array([0, 158, 233]), array([46, 255, 255]))
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 3))
    dilate = cv2.dilate(mask, kernel, iterations=1)
    thresh = 255 - cv2.bitwise_and(dilate, mask)
    txt = pytesseract.image_to_string(thresh, config='--psm 6')
    print(txt)
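One possible way to pick the bounds passed to cv2.inRange is to sample the color of the text and convert it to OpenCV's 8-bit HSV scale (H in 0..179, S and V in 0..255). The sketch below uses only the standard library. The helper names, the tolerances, and the sample colors are assumptions made for illustration, not values measured from the screenshots.

```python
import colorsys


def to_opencv_hsv(r, g, b):
    """Convert an 8-bit RGB color to OpenCV's 8-bit HSV scale."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # OpenCV stores hue as degrees / 2, so it fits into 0..179.
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def hsv_bounds(rgb, h_tol=10, s_tol=60, v_tol=40):
    """Build (lower, upper) bounds around a sampled color.

    Note: hue wrap-around near red (H close to 0 or 179) is not handled.
    """
    h, s, v = to_opencv_hsv(*rgb)
    lower = (max(h - h_tol, 0), max(s - s_tol, 0), max(v - v_tol, 0))
    upper = (min(h + h_tol, 179), min(s + s_tol, 255), min(v + v_tol, 255))
    return lower, upper


# Hypothetical sample colors, chosen for illustration only.
samples = {"reddish": (255, 85, 85), "yellowish": (255, 255, 85), "gray": (170, 170, 170)}
for name, rgb in samples.items():
    print(name, to_opencv_hsv(*rgb), hsv_bounds(rgb))
```

With these sample colors, the gray one has a saturation of 0, so it falls outside a range whose lower saturation bound is 158. If the missing text is gray (an assumption about the screenshots), a second low-saturation range may be needed for it.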
