
Unable to recognize correct digit on Pytesseract

I am unable to recognize the digit I want using Pytesseract. Is there anything I did wrong?

My Code

import cv2
import pytesseract

img = cv2.imread('foo.png')
# Try page segmentation modes 6 through 13 and print what each one recognizes.
for psm in range(6, 14):
    text = pytesseract.image_to_string(img, config=f'--oem 3 --psm {psm} digits').replace('\n', '')
    print(f"psm {psm}: {text}")

My Input Image
Image 1

Result

psm 6: 
psm 7: 
psm 8: 
psm 9: 
psm 10: 
psm 11: 
psm 12: 4
psm 13: 

Image 2

Result

psm 6: .4
psm 7: 
psm 8: 
psm 9: 
psm 10: 
psm 11: 
psm 12: 
psm 13: 

Image 3

Result

psm 6: 4
psm 7: 4
psm 8: 4
psm 9: 4
psm 10: 4
psm 11: 4
psm 12: 
psm 13: 4

How can I get the result that I want? Thanks for helping.

All images have a height of 252 pixels and a minimum width of 240 pixels.

Here are some things to try.

You are using isolated digits, so there's no context to help the recognizer and no help from a dictionary. Start with English sentences, then go down to English words, to verify things are working. Then try the harder task of isolated letters / numbers.

Try running Gaussian blur over the image, threshold it to binary, and ask for recognition of that. Or, almost the same thing, reduce "bumpy" artifacts by simply downsizing from 252 px to something smaller. Remember that Tesseract was trained on 300 dpi and 600 dpi images of roughly 8 to 16 pt type. Super large images can paradoxically be bad for recognition.

A few of your images look like they might be skewed by some non-zero theta. Consider deskewing. Or better, consider generating ground truth images at various resolutions, which have zero skew. Ghostscript is one popular way to achieve that.

Please update the question to explain which OCR Engine Mode you're using. Maybe 3 is OEM_TESSERACT_LSTM_COMBINED? Are you sure you need to specify the option? That is, do we see worse performance when we let it default?
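One way to check whether the option matters is to run the same image with and without it. The sketch below assumes two hypothetical helpers, `build_config` and `compare_oem`; only `cv2.imread` and `pytesseract.image_to_string` come from the question's code. For what it's worth, `tesseract --help-oem` describes mode 3 as "Default, based on what is available", so the two runs may well give identical output.

```python
from typing import Optional


def build_config(psm: int, oem: Optional[int] = None) -> str:
    """Build a Tesseract config string, omitting --oem when oem is None (hypothetical helper)."""
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    parts.append(f"--psm {psm}")
    parts.append("digits")  # the same digits config the question uses
    return " ".join(parts)


def compare_oem(image_path: str, psm: int = 10) -> dict:
    """Run OCR once with the engine left at its default and once with --oem 3."""
    # Imported here so build_config stays usable without OpenCV or Tesseract installed.
    import cv2
    import pytesseract

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    results = {}
    for label, oem in (("default", None), ("oem 3", 3)):
        text = pytesseract.image_to_string(img, config=build_config(psm, oem))
        results[label] = text.strip()
    return results
```

Calling `compare_oem('foo.png')` returns a dictionary with one entry per variant, which makes it easy to see whether specifying the engine changes anything.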

Wow, there sure are a lot of Page Segmentation Modes. As mentioned above, you're not offering the engine much context for isolated digits. If you write "1 2 3" in an image, or even "123", the engine has a better chance to verify its estimate of font size than for your example single-digit image. So think about what particular PSMs are good at, and take care to offer an image which plays to such strengths. The estimate for descender and baseline becomes much better once we've seen a few adjacent characters.
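To give the engine a few adjacent characters to work with, several single-digit images can be joined into one line before recognition. This is a sketch under assumptions: the helper name `stitch_digits` and the 20 px gap are made up for illustration, and the images are assumed to be grayscale with equal height (the question says all images are 252 px tall).

```python
import numpy as np


def stitch_digits(images, gap: int = 20, background: int = 255) -> np.ndarray:
    """Place equal-height grayscale images side by side with blank gaps (hypothetical helper)."""
    height = images[0].shape[0]
    if any(im.ndim != 2 or im.shape[0] != height for im in images):
        raise ValueError("all images must be 2-D and share the same height")
    # A blank strip inserted between neighbouring digits.
    spacer = np.full((height, gap), background, dtype=images[0].dtype)
    pieces = []
    for im in images:
        pieces.extend([im, spacer])
    # Drop the trailing spacer so the line ends on the last digit.
    return np.hstack(pieces[:-1])
```

With images loaded through `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`, the stitched line could be passed to `pytesseract.image_to_string(line, config='--psm 7 digits')`, since PSM 7 treats the image as a single text line. The recognized string then has to be split back into individual digits, which only works if every digit is actually recognized.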

Sorry, there are no easy answers. Looks like you have some experimentation ahead of you. Please let us know what you discover!

I ran tesseract from the command line and got this output:

>tesseract 7.png - --psm 8
7
>tesseract 3.png - --psm 8
3
>tesseract 9.png - --psm 8
9

