
Python OCR Tesseract cannot recognize Single Characters

I have two TIF images. The first image (a.tif) is:

[image: a single character]

and the second image (bcd.tif) is:

[image: multiple characters]

When I run "tesseract a.tif a.txt", it does not read the character, but the same command on the second image, "tesseract bcd.tif bcd.txt", works. I have seen some answers on Stack Overflow, but they did not explain how to actually run it. If extra parameters are needed, what are they?

As you said, you need to switch to single-character mode. In Python you can do that with the following call:

import pytesseract
text = pytesseract.image_to_string(img_path, config="--psm 10")  # PSM 10: treat the image as a single character
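
A slightly fuller sketch of the same idea: try Tesseract's default page segmentation first, and fall back to single-character mode only when nothing comes back. The helper name recognize_text and its ocr parameter are hypothetical, introduced here only for illustration. The ocr argument is assumed to behave like pytesseract.image_to_string, i.e. it takes an image (or a path to one) plus a config string and returns text.

```python
def recognize_text(image_path, ocr):
    """Return OCR text, retrying in single-character mode if the first pass is empty.

    `ocr` is any callable shaped like pytesseract.image_to_string:
        ocr(image, config="...") -> str
    (hypothetical helper, not part of pytesseract itself)
    """
    # First pass: Tesseract's default page segmentation.
    text = ocr(image_path, config="").strip()
    if text:
        return text
    # Second pass: --psm 10 treats the whole image as a single character.
    return ocr(image_path, config="--psm 10").strip()
```

With pytesseract and the tesseract binary installed, this would presumably be called as recognize_text("a.tif", pytesseract.image_to_string). Passing the OCR function in as an argument also makes the fallback logic easy to check without Tesseract present.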

It seems the issue is that the image contains only a single character. For instance, I tried these two images:

This one works fine. Tesseract reports 95% confidence in the result:

[image]

This one doesn't work:

[image]

I also tried scanning that image with PageSegMode set to SingleChar, and then it is recognized correctly.

The command line argument for that should be -psm 10 (newer Tesseract versions spell it --psm 10). See this: https://stackoverflow.com/a/26418458/5894241
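
For anyone driving the command-line tool from a script, here is a minimal sketch that builds and runs the command with the page segmentation flag. The function names build_tesseract_command and run_tesseract, and the legacy_flag switch, are my own assumptions for illustration. Which spelling of the flag (-psm or --psm) your install accepts depends on the Tesseract version, so treat the default here as a guess to verify locally.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, psm=10, legacy_flag=False):
    """Build the argument list for the tesseract CLI (hypothetical helper).

    legacy_flag=True uses the older single-dash spelling (-psm);
    otherwise the double-dash spelling (--psm) is used.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]


def run_tesseract(image_path, output_base, psm=10, legacy_flag=False):
    """Run tesseract and return the path of the text file it writes.

    The CLI appends ".txt" to the output base name itself.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, psm, legacy_flag)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return output_base + ".txt"
```

For the question's first image this would be run_tesseract("a.tif", "a"), which is equivalent to typing tesseract a.tif a --psm 10 and then reading a.txt.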
