TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract

Question

I am using Python 3.5.2 and pytesseract. When I run my code, I get the error TypeError: a bytes-like object is required, not 'str' (details below).

Code (file "D:/test.py"):

# -*- coding: utf-8 -*-

try:
    import Image
except ImportError:
    from PIL import Image

import pytesseract


print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif')))

Error:

Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string
    errors = get_errors(error_string)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr>
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
TypeError: a bytes-like object is required, not 'str'

What should I do?

Edit:

I have downloaded the training data into C:\\Program Files (x86)\\Tesseract-OCR\\tessdata, and I inserted the line error_string = error_string.decode("utf-8") into get_errors(). The error now looks like this:

Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')

Answer 1

This is a known bug in pytesseract, see issue #32:

Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'

and

There actually is an error in tesseract. But on the Python end the error occurs because error_string is returning a byte-literal, and the geterrors call appears to have trouble with it
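The Python-side failure described in that issue can be reproduced without Tesseract at all. The following is a minimal sketch, assuming the error output reaches get_errors() as bytes (which is what a subprocess pipe yields in Python 3); the sample message is illustrative only and was not captured from a real run:

```python
# Illustrative stderr content; in Python 3, subprocess pipes yield bytes.
error_string = b"Error opening data file chi_sim.traineddata\n"

lines = error_string.splitlines()  # a list of bytes objects

try:
    # Same test as in get_errors(): a str argument passed to bytes.find().
    [line for line in lines if line.find('Error') >= 0]
    outcome = "no error"
except TypeError as exc:
    # The exact wording of the message depends on the Python version.
    outcome = str(exc)

print(outcome)

# Decoding first turns the lines into str, so the search works.
decoded_lines = error_string.decode("utf-8").splitlines()
matches = [line for line in decoded_lines if line.find('Error') >= 0]
print(matches)
```

The first search fails because bytes.find() does not accept a str argument; the second succeeds because both operands are str after decoding.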

The workaround is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\\pytesseract\\pytesseract.py and insert an extra line at the top of the get_errors() function (at line 109):

error_string = error_string.decode("utf-8")

The function then reads:

def get_errors(error_string):
    '''
    returns all lines in the error_string that start with the string "error"
    '''

    error_string = error_string.decode("utf-8")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()
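Note that a file edited inside site-packages is overwritten when the package is reinstalled or upgraded. If you want the same logic to tolerate both bytes and str input, a slightly more defensive variant might look like the sketch below. The name get_errors_compat is hypothetical and is not part of pytesseract; the choice of errors="replace" is likewise an assumption, made so that non-UTF-8 output does not raise a UnicodeDecodeError.

```python
def get_errors_compat(error_string):
    """Hypothetical variant of get_errors() that accepts bytes or str.

    Returns the lines containing "Error", or the stripped input
    if no such line exists.
    """
    if isinstance(error_string, bytes):
        # Assumption: treat the output as UTF-8 and replace undecodable bytes.
        error_string = error_string.decode("utf-8", errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    return error_string.strip()


# Illustrative inputs only; not captured from a real Tesseract run.
from_bytes = get_errors_compat(b"Some banner line\nError opening data file x\n")
from_str = get_errors_compat("  no problems reported  ")
print(from_bytes)
print(from_str)
```

With this shape, the function behaves the same whether the caller passes raw pipe output or an already decoded string.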

Answer 1
0 2016-12-28 19:51:22
