Why can't pytesseract handle OSD mode?

I can't run OSD mode in pytesseract inside a Docker image based on Ubuntu. On Windows, this command works like a charm:

pytesseract.image_to_osd(image)

But inside the Docker image, it raises the following error. What I want to achieve is reading the rotation info using OSD.

File "/usr/local/lib/python3.9/site-packages/pytesseract/pytesseract.py", line 263, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v5.0.0-alpha-20210401 with Leptonica UZN file /tmp/tess__cujlspf loaded. Estimating resolution as 169 UZN file /tmp/tess__cujlspf loaded. Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.')

Tesseract is installed correctly, because all other methods such as image_to_string work properly. The surprising thing is that when I call OSD directly from the terminal, it works:

tesseract /images/1.jpg  output --psm 0
# cat output.osd
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 5.69
Script: Cyrillic
Script confidence: 0.10
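For anyone who falls back to the command line as above, here is a minimal sketch of turning the contents of output.osd into a Python dict so the rotation can be read programmatically. The helper name parse_osd is my own (hypothetical), and the sample text is simply the output shown above; it assumes every line has the "Key: value" shape.

```python
def parse_osd(text):
    """Parse Tesseract OSD text ("Key: value" per line) into a dict.

    Hypothetical helper, not part of pytesseract. Values are converted
    to int or float when possible and left as strings otherwise.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that do not look like "Key: value"
        key, value = key.strip(), value.strip()
        try:
            result[key] = int(value)
        except ValueError:
            try:
                result[key] = float(value)
            except ValueError:
                result[key] = value
    return result


# Sample text copied from the output.osd file shown above.
sample = """Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 5.69
Script: Cyrillic
Script confidence: 0.10
"""

osd = parse_osd(sample)
print(osd["Rotate"], osd["Script"])
```

In practice the text would come from reading the .osd file that the tesseract command writes next to the chosen output base name.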

Is there a bug in pytesseract, or is there a workaround? The rotation info is not available from any other Tesseract method, only from OSD. Many thanks.

I found a solution for this by adding config arguments to the method call:

pytesseract.image_to_osd(file_name, config='--psm 0 -c min_characters_to_try=5')

This resolves the error, and I could get the angle data.
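A slightly fuller sketch of the same workaround, in case it helps. The helper names build_osd_config and read_rotation are hypothetical, and the threshold of 5 is just the value used above, not a recommended setting. As far as I can tell, image_to_osd already passes --psm 0 on its own, so the min_characters_to_try setting is probably what gets past the "Too few characters" error. The sketch also assumes that output_type=Output.DICT returns a dict with a "rotate" key.

```python
def build_osd_config(min_chars=5):
    """Build the config string used in the workaround above.

    Hypothetical helper; min_chars=5 mirrors the value from the answer.
    """
    return f"--psm 0 -c min_characters_to_try={min_chars}"


def read_rotation(image, min_chars=5):
    """Return the rotation angle reported by OSD, or None if it is missing.

    `image` can be a file path or a PIL image. pytesseract is imported
    here so the config helper can be used without it being installed.
    """
    import pytesseract
    from pytesseract import Output

    osd = pytesseract.image_to_osd(
        image,
        config=build_osd_config(min_chars),
        output_type=Output.DICT,  # assumption: dict output with a "rotate" key
    )
    return osd.get("rotate")


print(build_osd_config())
```

Calling read_rotation("/images/1.jpg") inside the container should then give the angle as an integer, provided the image has enough recognizable characters for the lowered threshold.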
