How to pass an OpenCV image to Tesseract in Python?

Question

The Python code below calls Tesseract's C API through the ctypes library. In Option #1 the image is loaded by Tesseract itself, and it works fine! The problem is Option #2: when I try to pass an image loaded by OpenCV, Tesseract returns garbage:

from ctypes import *
import cv2

class API(Structure):
    _fields_ = []

lang = "eng"
ts = cdll.LoadLibrary("c:/Tesseract-OCR/libtesseract302.dll")
ts.TessBaseAPICreate.restype = POINTER(API)
api = ts.TessBaseAPICreate()
rc = ts.TessBaseAPIInit3(api, 'c:/Tesseract-OCR/', lang)

##### Option #1
out = ts.TessBaseAPIProcessPages(api, 'c:/Tesseract-OCR/doc/eurotext.tif', None, 0)
print 'Option #1 => ' + string_at(out)

##### Option #2
#TESS_API void  TESS_CALL TessBaseAPISetImage(TessBaseAPI* handle, const unsigned char* imagedata, int width, int height,
#                                             int bytes_per_pixel, int bytes_per_line);

im = cv2.imread('c:/Temp/Downloads/test-slim/eurotext.jpg', cv2.COLOR_BGR2GRAY)
c_ubyte_p = POINTER(c_ubyte)
##ts.TessBaseAPISetImage.argtypes = [POINTER(API), c_ubyte_p, c_int, c_int, c_int, c_int]
ts.TessBaseAPISetImage(api, im.ctypes.data_as(c_ubyte_p), 800, 1024, 3, 800 * 3)
out = ts.TessBaseAPIGetUTF8Text(api)
print 'Option #2 => ' + string_at(out)

The output is as follows:

Option #1 => The (quick) [brown] {fox} jumps! Over the $43,456.78 #90 dog & duck/goose, as 12.5% of E-mail from aspammer@website.com is spam. Der ,,schnelleâ€ braune Fuchs springt ï¬ ber den faulen Hund. Le renard brun Â«rapideÂ» saute par-dessus le chien paresseux. La volpe marrone rapida salta sopra il cane pigro. El zorro marrÃ©n rÃ©pido salta sobre el perro perezoso. A raposa marrom rzipida salta sobre o cï¬ o preguicoso.

Option #2 => 7?:5:*:>\\â€”â€˜- ;2â€”;i3E:?:;i3".i: iiâ€˜; 3;â€™ f-iÃ©% :::â€™::;?:=Â«â€™:: =Â£<:7â€˜iÂ§5.< :â€”'\\â€”;:=Ã©:â€™â€”..=.:a,';2â€™:3â€˜ :3_3:l.':â€”â€˜:â€”:Â£â‚¬:-_â€™:Â§3;;%Â§%ai5~Â«:Ã©::3%iaÂ»â‚¬E:

Remarks:

I tried the python-tesseract and tightocr libraries, which are good enough but lack documentation.
Here I use cv2.imread so that I can apply math algorithms to the matrix.

Any ideas on how to pass an OpenCV image (which is a numpy.ndarray) to Tesseract? Any help would be appreciated.
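Option #2 hard-codes width 800, height 1024, 3 bytes per pixel and 800 * 3 bytes per line. Unless those values match the array that cv2.imread actually returned, Tesseract reads the buffer with the wrong geometry, which would explain the garbage. Also, cv2.COLOR_BGR2GRAY is a colour-conversion code meant for cv2.cvtColor, not an imread flag; cv2.IMREAD_GRAYSCALE is the flag for loading a single-channel image. A minimal sketch, written for Python 3 and assuming the same `ts` library handle and `api` pointer created in the question, that derives the TessBaseAPISetImage arguments from the array's shape and strides (the helper names set_image_args and tess_set_image are hypothetical):

```python
import ctypes

import numpy as np


def set_image_args(im: np.ndarray):
    """Derive (buffer, width, height, bytes_per_pixel, bytes_per_line) for
    TessBaseAPISetImage from an 8-bit OpenCV image instead of hard-coding them.
    (Hypothetical helper for illustration.)"""
    if im.dtype != np.uint8:
        raise ValueError("expected 8-bit pixel data (dtype uint8)")
    # SetImage reads a packed, row-major buffer, so make sure we have one.
    im = np.ascontiguousarray(im)
    # numpy shape is (rows, cols) or (rows, cols, channels).
    height, width = im.shape[:2]
    bytes_per_pixel = 1 if im.ndim == 2 else im.shape[2]
    # Distance in bytes from the start of one row to the start of the next.
    bytes_per_line = im.strides[0]
    return im, width, height, bytes_per_pixel, bytes_per_line


def tess_set_image(ts: ctypes.CDLL, api, im: np.ndarray) -> None:
    """Hand a numpy image to an initialised TessBaseAPI handle.
    `ts` and `api` are assumed to come from the question's setup code.
    (Hypothetical helper for illustration.)"""
    buf, width, height, bpp, bpl = set_image_args(im)
    # `buf` stays referenced until the call returns, so the pointer is valid.
    ts.TessBaseAPISetImage(
        api,
        buf.ctypes.data_as(ctypes.POINTER(ctypes.c_ubyte)),
        width,
        height,
        bpp,
        bpl,
    )
```

With these helpers, Option #2 would load the file with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`, call `tess_set_image(ts, api, im)`, and then read the result with `ts.TessBaseAPIGetUTF8Text(api)` as before. Loading as grayscale keeps bytes_per_pixel at 1 and sidesteps any question of BGR versus RGB channel order.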

Answer 1

I use this with Python 3 (bw_img is a numpy.ndarray):

import numpy as np
import cv2
from PIL import Image
import pytesseract

...

(thresh, bw_img) = cv2.threshold(bw_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
...

img = Image.fromarray(bw_img)
txt = pytesseract.image_to_string(img)
print(txt)
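The snippet above passes a single-channel thresholded image, so channel order does not matter. If you pass a colour image straight from cv2.imread instead, keep in mind that OpenCV stores pixels as BGR while PIL's Image.fromarray treats a 3-channel uint8 array as RGB. A minimal sketch, assuming Pillow and pytesseract are installed (the function names bgr_to_rgb and ocr_opencv_image are hypothetical; cv2.cvtColor(img, cv2.COLOR_BGR2RGB) does the same reordering as bgr_to_rgb):

```python
import numpy as np


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    """Reorder OpenCV's BGR channels into the RGB order PIL assumes.
    2-D (grayscale or thresholded) images have no channel axis and are
    returned unchanged. (Hypothetical helper for illustration.)"""
    if img.ndim == 2:
        return img
    if img.ndim == 3 and img.shape[2] == 3:
        # Reverse the channel axis; copy so the result is C-contiguous.
        return np.ascontiguousarray(img[:, :, ::-1])
    raise ValueError(f"expected a 2-D or 3-channel image, got shape {img.shape}")


def ocr_opencv_image(img: np.ndarray) -> str:
    """Run pytesseract on an OpenCV image via PIL, as in the answer.
    (Hypothetical helper for illustration.)"""
    # Imported lazily so bgr_to_rgb can be used without these packages.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.fromarray(bgr_to_rgb(img)))
```

For the thresholded bw_img in the answer, ocr_opencv_image(bw_img) behaves the same as the original two lines; the conversion only kicks in for 3-channel input.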

Answer 1
13 2016-10-29 17:58:37
