
Extract only characters from an image with OpenCV or OCR

Given a block of text like the one in the attached image, I want to draw a bounding box around each individual character. However, I am unable to do so.

I tried EasyOCR with the following settings, but it only detects whole words:

import easyocr as eo

reader = eo.Reader(['en'], gpu=True)
result = reader.readtext(imgOriginal, y_ths=0.0000000001, x_ths=0.0000000001, paragraph=False)

I also tried setting psm/oem in tesserocr/pytesseract, but I still could not get the individual characters. Please help.

Have a look at the GetComponentImages example from tesserocr and adapt it:

from PIL import Image, ImageOps
from tesserocr import PyTessBaseAPI, PSM, RIL

# tessdata_path: directory holding your *.traineddata files (set this for your system)
image = ImageOps.grayscale(Image.open('test.png'))
with PyTessBaseAPI(path=tessdata_path, psm=PSM.SPARSE_TEXT) as api:
    api.SetImage(image)
    api.Recognize()
    # RIL.SYMBOL asks for one component per character; True limits results to text
    boxes = api.GetComponentImages(RIL.SYMBOL, True)
    print('Found {} symbol image components.'.format(len(boxes)))
    # Each entry is (component image, box dict, block id, paragraph id)
    for i, (im, box, _, _) in enumerate(boxes):
        print("Box[{0}]: x={x}, y={y}, w={w}, h={h}".format(i, **box))
        # display(im)

If the boxes are not accurate, try oem=tesserocr.OEM.TESSERACT_ONLY with the correct traineddata (the legacy engine needs a traineddata file that includes the legacy model).
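Since the goal is to draw a box around each character, here is a minimal sketch of how the box dictionaries (keys x, y, w, h) returned by GetComponentImages might be turned into rectangles on the image. The helper names box_to_corners and draw_symbol_boxes and the pad parameter are hypothetical, not part of tesserocr, and the drawing step assumes Pillow is installed.

```python
def box_to_corners(box, pad=0):
    """Convert a box dict with x, y, w, h keys to (left, top, right, bottom).

    pad (hypothetical parameter) grows the box on every side; the left and
    top edges are clamped at 0 so they never leave the image.
    """
    left = max(box["x"] - pad, 0)
    top = max(box["y"] - pad, 0)
    right = box["x"] + box["w"] + pad
    bottom = box["y"] + box["h"] + pad
    return left, top, right, bottom


def draw_symbol_boxes(image, boxes, color="red", pad=0):
    """Return an RGB copy of a PIL image with one rectangle per box dict."""
    from PIL import ImageDraw  # imported lazily; requires Pillow

    canvas = image.convert("RGB")  # convert() returns a new image
    draw = ImageDraw.Draw(canvas)
    for box in boxes:
        draw.rectangle(box_to_corners(box, pad), outline=color)
    return canvas
```

With the snippet above, something like draw_symbol_boxes(image, [box for _, box, _, _ in boxes]).save('boxes.png') should write an annotated copy, leaving the original image untouched.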
