Python OCR Tesseract cannot recognize Single Characters
I have two TIF images. The first image (a.tif) is:
and the second image (bcd.tif) is:
When I run "tesseract a.tif a.txt", it does not read the character, but the same command on the other image, "tesseract bcd.tif bcd.txt", works. I have seen some answers on Stack Overflow, but they did not explain how to run it. If I need to add any parameters, what are they?
As you said, you need to switch to single-character mode. You can do this in Python with the following call:
pytesseract.image_to_string(img_path, config="--psm 10")
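If the same script has to handle both multi-character images like bcd.tif and single-character images like a.tif, one option is to try the default mode first and retry in single-character mode only when nothing comes back. This is a rough sketch, not a tested recipe: the helper name read_text and its ocr parameter are made up for illustration, and it assumes a Tesseract version that accepts the --psm spelling.

```python
from typing import Callable, Optional


def read_text(image_path: str, ocr: Optional[Callable[..., str]] = None) -> str:
    """Run OCR with the default page segmentation, then retry as a single character.

    `ocr` is a hypothetical injection point (not part of pytesseract) so the
    fallback logic can be exercised without Tesseract installed.
    """
    if ocr is None:
        # Imported lazily; requires the pytesseract package and the tesseract binary.
        import pytesseract
        ocr = pytesseract.image_to_string

    # First attempt: default page segmentation mode.
    text = ocr(image_path).strip()
    if not text:
        # Nothing recognized: treat the whole image as one character (PSM 10).
        text = ocr(image_path, config="--psm 10").strip()
    return text
```

Calling read_text("a.tif") would then fall through to the --psm 10 attempt only if the default mode returns an empty string.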
The issue seems to be related to the image containing only a single character. For instance, I tried these two images:
This one works fine. Tesseract reports 95% confidence in the result:
This one doesn't work.
I also tried scanning that image with PageSegMode set to SingleChar, and then it was recognized correctly.
The command-line argument for that should be -psm 10 (Tesseract 4 and later spell it --psm 10). See this: https://stackoverflow.com/a/26418458/5894241
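For anyone driving the command-line tool from Python rather than through pytesseract, the invocation could be assembled roughly like this. It is only a sketch: tesseract_command, run_tesseract, and the legacy_flag parameter are hypothetical names, and whether your installation wants -psm or --psm depends on its version.

```python
import subprocess
from typing import List


def tesseract_command(image: str, output_base: str, psm: int = 10,
                      legacy_flag: bool = False) -> List[str]:
    """Build the argument list for the tesseract CLI.

    `legacy_flag` (an illustrative parameter) selects the older "-psm"
    spelling used by Tesseract 3.x instead of "--psm".
    """
    flag = "-psm" if legacy_flag else "--psm"
    # Tesseract appends ".txt" to the output base, so pass "a" to get "a.txt".
    return ["tesseract", image, output_base, flag, str(psm)]


def run_tesseract(image: str, output_base: str, psm: int = 10,
                  legacy_flag: bool = False) -> None:
    """Run tesseract and raise CalledProcessError if it exits with an error."""
    subprocess.run(tesseract_command(image, output_base, psm, legacy_flag), check=True)
```

With this, run_tesseract("a.tif", "a") corresponds to typing tesseract a.tif a --psm 10 in a shell.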