TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract
I am using Python 3.5.2 and pytesseract. When I run my code, I get the error TypeError: a bytes-like object is required, not 'str' (details below):
Code (file "D:/test.py"):
# -*- coding: utf-8 -*-
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif')))
Error:
Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string
    errors = get_errors(error_string)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr>
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
TypeError: a bytes-like object is required, not 'str'
What should I do?
Edit:
I have downloaded the training data into C:\Program Files (x86)\Tesseract-OCR\tessdata.
After inserting the line error_string = error_string.decode("utf-8") into get_errors(), the error becomes:
Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')
This is a known bug in pytesseract; see issue #32:
Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'
and
There actually is an error in tesseract. But on the Python end the error occurs because error_string is returning a byte-literal, and the geterrors call appears to have trouble with it
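To illustrate the mechanism described in the issue, here is a minimal sketch that does not call pytesseract at all. The sample error_string value is an assumption made up for the demonstration; it only stands in for the bytes that a subprocess returns on stderr in Python 3.

```python
# Assumed sample of subprocess stderr; in Python 3 this arrives as bytes.
error_string = b"Error opening data file chi_sim.traineddata\n"

lines = error_string.splitlines()  # a list of bytes objects

try:
    # bytes.find() needs a bytes-like argument, so passing a str fails.
    [line for line in lines if line.find('Error') >= 0]
    raised = False
except TypeError:
    raised = True

# Decoding first yields str lines, and the same search then works.
decoded_lines = error_string.decode("utf-8").splitlines()
matches = [line for line in decoded_lines if line.find('Error') >= 0]

print(raised)
print(matches)
```

The exact wording of the TypeError message varies between Python versions, so the sketch only records that the exception was raised.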
The workaround is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\pytesseract\pytesseract.py and insert an extra line at the top of the get_errors() function (at line 109):
error_string = error_string.decode("utf-8")
The function then reads:
def get_errors(error_string):
    '''
    returns all lines in the error_string that start with the string "error"
    '''
    error_string = error_string.decode("utf-8")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()
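If you would rather not assume that the input is always bytes, a slightly more defensive variant could look like the sketch below. The name get_errors_safe is hypothetical and not part of pytesseract, and using errors="replace" is my own choice so that non-UTF-8 output does not raise a second exception.

```python
def get_errors_safe(error_string):
    """Return the lines containing 'Error'; accepts bytes or str.

    Hypothetical helper for illustration, not part of pytesseract.
    """
    if isinstance(error_string, bytes):
        # errors="replace" avoids a UnicodeDecodeError on non-UTF-8 output.
        error_string = error_string.decode("utf-8", errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    return error_string.strip()


# Works the same for bytes and for str input.
print(get_errors_safe(b"Error opening data file x\nother output"))
print(get_errors_safe("no problems\n"))
```

Note that this only makes the real Tesseract error readable. The underlying problem (for example, a missing chi_sim.traineddata file) still has to be fixed separately.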