How to improve the accuracy of Pytesseract when extracting digits

Question

I am testing Pytesseract and using it to extract digits from images like the one below.

The image is of fairly decent quality (200 dpi). However, when I run pytesseract, it gives me the result 456-/8-0000, where the digit 7 is misrecognized as '/'. While '/' obviously bears some resemblance to the digit 7, I am still surprised by the error given the quality of the image.

I tried both

pytesseract.image_to_string(img)

and

pytesseract.image_to_string(img, lang='eng', config='--psm 13 --oem 2 -c tessedit_char_whitelist=0123456789-')

Both yielded the same result.

Any pointers on how to improve the recognition accuracy would be great. Thanks!

Answer 1

Which version of tesseract are you using? Which tessdata? With a recent tesseract and eng from tessdata-best, the result is perfect:

> tesseract 0mIe5.png  - quiet
456-78-0000
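
To try the same thing from pytesseract, a minimal sketch along these lines may help. It reports the tesseract version that pytesseract actually calls and points tesseract at a specific tessdata directory through the --tessdata-dir option. The helper names build_config and ocr_digits, the choice of --psm 7 (single text line), and the directory path are assumptions for illustration, not part of the original answer. No --oem is set, since as far as I know the tessdata-best files only contain the LSTM model, so the --oem 2 from the question may not work with them.

```python
import shlex


def build_config(tessdata_dir=None, psm=7, whitelist="0123456789-"):
    """Build the config string passed to pytesseract.image_to_string."""
    parts = []
    if tessdata_dir:
        # --tessdata-dir tells tesseract where to look for eng.traineddata.
        parts.append("--tessdata-dir " + shlex.quote(tessdata_dir))
    # --psm 7 treats the image as a single line of text (an assumption here).
    parts.append(f"--psm {psm}")
    if whitelist:
        # Restrict output to digits and the dash, as in the question.
        parts.append("-c tessedit_char_whitelist=" + whitelist)
    return " ".join(parts)


def ocr_digits(image_path, tessdata_dir=None):
    """Return (tesseract version, recognized text) for one image."""
    # Imported lazily so build_config works without tesseract installed.
    import pytesseract
    from PIL import Image

    version = str(pytesseract.get_tesseract_version())
    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(
            img, lang="eng", config=build_config(tessdata_dir)
        )
    return version, text.strip()


# Example call (the tessdata path is a placeholder, adjust it to your setup):
# version, text = ocr_digits("0mIe5.png", tessdata_dir="/path/to/tessdata_best")
```

If the reported version is old, or the text only comes out right with the tessdata-best directory, that would point to the installed tesseract or its language data rather than the image.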

Answer 1
0 2019-07-05 19:15:32
