How to pass an OpenCV image to Tesseract in Python?
The Python code below calls Tesseract's C API through the ctypes library. In Option #1 the image is loaded by Tesseract itself and it works fine. The problem is in Option #2: when I pass an image loaded by OpenCV, Tesseract returns garbage:
from ctypes import *
import cv2
class API(Structure):
    _fields_ = []
lang = "eng"
ts = cdll.LoadLibrary("c:/Tesseract-OCR/libtesseract302.dll")
ts.TessBaseAPICreate.restype = POINTER(API)
api = ts.TessBaseAPICreate()
rc = ts.TessBaseAPIInit3(api, 'c:/Tesseract-OCR/', lang)
##### Option #1
out = ts.TessBaseAPIProcessPages(api, 'c:/Tesseract-OCR/doc/eurotext.tif', None, 0)
print 'Option #1 => ' + string_at(out)
##### Option #2
#TESS_API void TESS_CALL TessBaseAPISetImage(TessBaseAPI* handle, const unsigned char* imagedata, int width, int height,
# int bytes_per_pixel, int bytes_per_line);
im = cv2.imread('c:/Temp/Downloads/test-slim/eurotext.jpg', cv2.COLOR_BGR2GRAY)
c_ubyte_p = POINTER(c_ubyte)
##ts.TessBaseAPISetImage.argtypes = [POINTER(API), c_ubyte_p, c_int, c_int, c_int, c_int]
ts.TessBaseAPISetImage(api, im.ctypes.data_as(c_ubyte_p), 800, 1024, 3, 800 * 3)
out = ts.TessBaseAPIGetUTF8Text(api)
print 'Option #2 => ' + string_at(out)
and the output is as follows:
Option #1 => The (quick) [brown] {fox} jumps! Over the $43,456.78 #90 dog & duck/goose, as 12.5% of E-mail from aspammer@website.com is spam. Der ,,schnelle” braune Fuchs springt ﬁber den faulen Hund. Le renard brun «rapide» saute par-dessus le chien paresseux. La volpe marrone rapida salta sopra il cane pigro. El zorro marrén répido salta sobre el perro perezoso. A raposa marrom rzipida salta sobre o cﬁo preguicoso.
Option #2 => 7?:5:*:>\\—‘- ;2—;i3E:?:;i3".i: ii‘; 3;’ f-ié% :::’::;?:=«’:: =£<:7‘i§5.< :—'\\—;:=é:’—..=.:a,';2’:3‘ :3_3:l.':—‘:—:£€:-_’:§3;;%§%ai5~«:é::3%ia»€E:
Remarks:
Any ideas how to pass an OpenCV image (which is a numpy.ndarray) to Tesseract? Any help would be useful.
I use this with Python 3 (bw_img is a numpy.ndarray):
import numpy as np
import cv2
from PIL import Image
import pytesseract
...
(thresh, bw_img) = cv2.threshold(bw_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
...
img = Image.fromarray(bw_img)
txt = pytesseract.image_to_string(img)
print(txt)
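For anyone who wants to stay with the ctypes route from the question, here is a minimal sketch of how the TessBaseAPISetImage arguments could be derived from the array instead of being hard-coded. It is not a confirmed fix for the garbage output. It assumes the array is uint8 and C-contiguous, and `set_image_args` is a hypothetical helper name, not part of Tesseract or OpenCV. Two things in the question look suspicious and may be related: `cv2.COLOR_BGR2GRAY` is a conversion code for `cv2.cvtColor`, not a read flag (the `imread` flag for grayscale is `cv2.IMREAD_GRAYSCALE`), and NumPy image shapes are ordered (rows, columns), so width and height are easy to swap. The 1024x800 sizes below are illustrative only.

```python
def set_image_args(shape, strides):
    """Hypothetical helper: derive (width, height, bytes_per_pixel,
    bytes_per_line) for TessBaseAPISetImage from the shape and strides
    of a uint8, C-contiguous image array (for example im.shape, im.strides)."""
    if len(shape) == 2:
        # Grayscale image: shape is (rows, cols), one byte per pixel.
        height, width = shape
        bytes_per_pixel = 1
    elif len(shape) == 3:
        # Colour image: shape is (rows, cols, channels).
        height, width, bytes_per_pixel = shape
    else:
        raise ValueError("expected a 2-D or 3-D image array, got shape %r" % (shape,))

    # The first stride is the number of bytes between consecutive rows.
    bytes_per_line = strides[0]
    if bytes_per_line < width * bytes_per_pixel:
        raise ValueError("row stride is smaller than one row of pixels")
    return width, height, bytes_per_pixel, bytes_per_line


# Illustrative shapes for an image 1024 pixels wide and 800 pixels high.
print(set_image_args((800, 1024), (1024, 1)))        # (1024, 800, 1, 1024)
print(set_image_args((800, 1024, 3), (3072, 3, 1)))  # (1024, 800, 3, 3072)
```

With the names from the question, the call would then look like ts.TessBaseAPISetImage(api, im.ctypes.data_as(c_ubyte_p), *set_image_args(im.shape, im.strides)), so the dimensions always match what OpenCV actually loaded.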