
How to improve OCR quality using Tesseract?

I am extracting image matrices with OpenCV from a screenshot of a desktop application and using Tesseract to read the cropped images. For instance, for the images below, Tesseract reads the "Relationship" image as R'e'auunshwp and "member" as Mamba!

Is the quality of the attached images too low for Tesseract? What can I do to improve it?

[Two images: the cropped "Relationship" and "member" screenshots]

UPDATE

I used the following code to resize the image, which improved the OCR quality. But how do I calculate what size corresponds to 300 DPI, and how do I ensure that the aspect ratio of the image stays the same when scaling?

    Mat resizedMat = new Mat();
    // Size takes (width, height): cols() is the width, rows() is the height.
    Size sz = new Size(mat.cols() * 10, mat.rows() * 10);
    Imgproc.resize(mat, resizedMat, sz);

Resolution is too low. Try rescaling to 300 DPI.
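To make "rescale to 300 DPI" concrete, here is a minimal sketch of the arithmetic in plain Java. The class `DpiScale` and its methods are hypothetical names, and the 96 DPI source value is an assumption (a common default for screen captures); a screenshot does not carry a physical resolution, so substitute whatever your display actually reports. Applying one factor to both axes is what keeps the aspect ratio unchanged.

```java
public final class DpiScale {
    private DpiScale() {}

    /** Scale factor that maps an image captured at sourceDpi to targetDpi. */
    public static double factor(double sourceDpi, double targetDpi) {
        if (sourceDpi <= 0 || targetDpi <= 0) {
            throw new IllegalArgumentException("DPI values must be positive");
        }
        return targetDpi / sourceDpi;
    }

    /** Returns {width, height} after uniform scaling; the single factor preserves the aspect ratio. */
    public static int[] scaledSize(int width, int height, double factor) {
        return new int[] {
            (int) Math.round(width * factor),
            (int) Math.round(height * factor)
        };
    }

    public static void main(String[] args) {
        // 96 DPI is an assumed screen resolution, not a value from the question.
        double f = factor(96.0, 300.0);
        int[] size = scaledSize(120, 32, f);
        System.out.println(f + " -> " + size[0] + "x" + size[1]);
    }
}
```

With OpenCV, the resulting pair would go into `new Size(size[0], size[1])` before calling `Imgproc.resize`, since `Size` takes width first and height second.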

I finally solved it with this code, using OpenCV:

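    // Upscale with one factor on both axes so the aspect ratio is preserved.
    Mat resizedMat = new Mat();
    double width = mat.cols();   // cols() is the image width
    double height = mat.rows();  // rows() is the image height
    double aspect = width / height;
    // The same factor (aspect * 2) multiplies both width and height.
    Size sz = new Size(width * aspect * 2, height * aspect * 2);
    Imgproc.resize(mat, resizedMat, sz);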

As nguyenq said, you should rescale your image, because Tesseract struggles with low-quality images.

I answered a similar question here for another person; you should try the same approach. Increase your image size by 200-400%; if that alone does not help, apply some blurring and then thresholding.
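As a rough illustration of the blur-then-threshold step, here is a sketch in plain Java on a grayscale `BufferedImage`, so it runs without OpenCV. The class `BlurThreshold`, the 3x3 box kernel, and the fixed cutoff are illustrative assumptions, not the settings from the linked answer. In OpenCV the corresponding calls would be `Imgproc.GaussianBlur` followed by `Imgproc.threshold`.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public final class BlurThreshold {
    private BlurThreshold() {}

    /** 3x3 box blur; assumes a TYPE_BYTE_GRAY input. Neighbours outside the image are skipped. */
    public static BufferedImage boxBlur(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster src = gray.getRaster(), dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sum = 0, count = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int nx = x + dx, ny = y + dy;
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h) {
                            sum += src.getSample(nx, ny, 0);
                            count++;
                        }
                    }
                }
                dst.setSample(x, y, 0, sum / count); // integer mean of the in-bounds neighbourhood
            }
        }
        return out;
    }

    /** Global binary threshold: samples above the cutoff become 255, all others 0. */
    public static BufferedImage threshold(BufferedImage gray, int cutoff) {
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster src = gray.getRaster(), dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setSample(x, y, 0, src.getSample(x, y, 0) > cutoff ? 255 : 0);
            }
        }
        return out;
    }
}
```

A typical order would be to upscale first, then call `threshold(boxBlur(img), cutoff)`. The cutoff usually needs tuning per image, and an adaptive method such as Otsu's may be a better fit than a fixed value.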


 