How to increase Pytesseract's accuracy in extracting digits

I am testing Pytesseract and using it to extract digits from images like the one below.

[image: a scanned digit string reading 456-78-0000]

The image is of fairly decent quality (200 dpi). However, when I run pytesseract, it returns 456-/8-0000, with the digit 7 misrecognized as '/'. A '/' obviously bears some resemblance to a 7, but given the quality of the image, the error still surprises me.

I tried both

pytesseract.image_to_string(img)

and

pytesseract.image_to_string(img, lang='eng', config='--psm 13 --oem 2 -c tessedit_char_whitelist=0123456789-')

Both yielded the same result.

Any pointers on how to improve the recognition accuracy would be great. Thanks!

Which version of tesseract are you using, and which tessdata? With a recent tesseract and the eng model from tessdata-best, the result is perfect:

> tesseract 0mIe5.png - quiet
456-78-0000
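
For anyone who wants to try the same thing from Python, here is a rough sketch rather than a definitive recipe. It assumes pytesseract and Pillow are installed and that the tessdata-best models have been downloaded to a local directory. The helper names build_config and read_digits are made up for illustration, and so are the choices of --psm 7 (treat the image as a single text line) and --oem 1 (LSTM engine only); neither comes from the answer above. The whitelist is the one from the question, and whether it is honored may depend on the Tesseract version.

```python
import shlex


def build_config(tessdata_dir=None, psm=7, oem=1, whitelist="0123456789-"):
    """Assemble the option string passed to pytesseract's `config` argument.

    Hypothetical helper: the defaults are illustrative assumptions,
    not settings taken from the original answer.
    """
    parts = []
    if tessdata_dir:
        # Point Tesseract at a specific model directory, e.g. a local
        # copy of tessdata-best. Quote it in case the path has spaces.
        parts.append(f"--tessdata-dir {shlex.quote(tessdata_dir)}")
    parts.append(f"--psm {psm} --oem {oem}")
    parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_digits(image_path, tessdata_dir=None):
    """Run OCR on one image and return the stripped text."""
    # Imported here so build_config can be used without these packages.
    import pytesseract
    from PIL import Image

    # Print the installed Tesseract version, since the answer asks for it.
    print("Tesseract version:", pytesseract.get_tesseract_version())

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(
            img, lang="eng", config=build_config(tessdata_dir)
        )
    return text.strip()
```

A call would look like read_digits("0mIe5.png", tessdata_dir="/path/to/tessdata_best"), where the directory is a placeholder for wherever the models actually live. Printing the version first makes it easy to tell whether an old Tesseract install is the real culprit.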
