
How to increase likeliness of image recognition with pytesseract

I'm trying to convert a list of images to text. The images are fairly small but very readable (15x160, with only grey text on a white background), yet I can't get pytesseract to read them properly. I tried increasing the size with .resize(), but it didn't seem to do much at all. Here's some of my code. Is there anything I can add to improve my chances? Like I said, I'm very surprised that pytesseract is failing me here; the text is small but super readable compared to some of the things I've seen it catch.

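import urllib.request
from PIL import Image
import pytesseract

for dImg in range(0, len(imgList)):
    url = imgList[dImg]
    local = "img" + str(dImg) + ".jpg"
    urllib.request.urlretrieve(url, local)
    imgOpen = Image.open(local)
    # resize() returns a new image; the result is discarded here, so imgOpen is unchanged
    imgOpen.resize((500,500))
    imgToString = pytesseract.image_to_string(imgOpen)
    newEmail.append(imgToString)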

Setting the page segmentation mode (psm) can probably help.

To list all the available page segmentation modes, run tesseract --help-psm in your terminal.

Then pick the psm that matches your need. Let's say you want to treat the image as a single text line; in that case your imgToString becomes:

imgToString = pytesseract.image_to_string(imgOpen, config = '--psm 7')
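
As a minimal sketch of the same idea, the psm can be wrapped in a small helper so it is easy to try different modes. The function names build_tesseract_config and ocr_single_line are hypothetical, and the 0 to 13 range check is an assumption based on recent Tesseract releases; confirm the valid modes for your install with tesseract --help-psm.

```python
def build_tesseract_config(psm=7):
    """Return a config string selecting one page segmentation mode.

    Assumption: modes run from 0 to 13, as in recent Tesseract releases.
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"unsupported psm: {psm}")
    return f"--psm {psm}"


def ocr_single_line(image, psm=7):
    """Run OCR on a PIL image, treating it as a single text line by default."""
    # Imported here so the helper above can be used without Tesseract installed.
    import pytesseract

    text = pytesseract.image_to_string(image, config=build_tesseract_config(psm))
    return text.strip()
```

In the loop from the question, this would be called as newEmail.append(ocr_single_line(imgOpen)).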

Hope this helps.

You can perform several pre-processing steps in your code.

1) Convert the image to greyscale: use from PIL import Image and call your_img.convert('L'). There are several other settings you can check.
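
A rough sketch of such a pre-processing step follows. Note that Image.resize() returns a new image rather than modifying the original, so the result has to be assigned; this is one likely reason the resize in the question had no effect. Scaling both sides by the same factor also keeps the aspect ratio, unlike (500,500). The helper names, the factor of 4, and the threshold of 160 are assumptions for illustration, not tuned values.

```python
def scaled_size(size, factor):
    """Scale a (width, height) pair by an integer factor, keeping the aspect ratio."""
    width, height = size
    return (width * factor, height * factor)


def preprocess_for_ocr(image, factor=4, threshold=160):
    """Return a greyscale, enlarged, black-and-white copy of a PIL image.

    factor and threshold are illustrative guesses; adjust them for your images.
    """
    grey = image.convert("L")  # 'L' is 8-bit greyscale
    # resize() returns a new image, so keep the return value
    enlarged = grey.resize(scaled_size(grey.size, factor))
    # Push light pixels to white and darker (text) pixels to black
    return enlarged.point(lambda p: 255 if p > threshold else 0)
```

It could be used as imgOpen = preprocess_for_ocr(Image.open(local)) before calling pytesseract.image_to_string(imgOpen).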

2) A slightly more advanced method: use a CNN. There are several pre-trained CNNs you can use. You can find somewhat more detailed information here: https://www.cs.princeton.edu/courses/archive/fall00/cs426/lectures/sampling/sampling.pdf

