TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract

Question

I am using Python 3.5.2 and pytesseract, and I get the error TypeError: a bytes-like object is required, not 'str' when I run my code (details below):

Code: File "D:/test.py"

# -*- coding: utf-8 -*-

try:
    import Image
except ImportError:
    from PIL import Image

import pytesseract


print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif')))

Error:

Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string
    errors = get_errors(error_string)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr>
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
TypeError: a bytes-like object is required, not 'str'

What should I do?

Edit:

I have already downloaded the training data to C:\Program Files (x86)\Tesseract-OCR\tessdata.

After inserting the line error_string = error_string.decode("utf-8") into get_errors(), the error becomes:

Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')
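
The second traceback means Tesseract itself could not open chi_sim.traineddata. Tesseract language files are named <lang>.traineddata, so one quick sanity check is to confirm that the file really exists in the tessdata directory. A minimal sketch follows; the helper names traineddata_path and has_traineddata are hypothetical and not part of pytesseract:

```python
import os


def traineddata_path(tessdata_dir, lang):
    """Build the expected path of a language file, e.g. chi_sim.traineddata."""
    return os.path.join(tessdata_dir, lang + ".traineddata")


def has_traineddata(tessdata_dir, lang):
    """Return True if the language file exists in tessdata_dir."""
    return os.path.isfile(traineddata_path(tessdata_dir, lang))
```

For the setup in the question, this would be called as has_traineddata(r"C:\Program Files (x86)\Tesseract-OCR\tessdata", "chi_sim"). If it returns True and Tesseract still fails, the directory Tesseract searches is probably not the one that was checked.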

Answer 1

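This is a known bug in pytesseract; see issue #32:

Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'

and

There actually is an error in tesseract. But on the Python end the error occurs because error_string returns a byte literal, and the geterrors call appears to have trouble with that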
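
The Python-side failure can be reproduced without Tesseract. In Python 3, calling find() on a bytes object with a str argument raises a TypeError, and that is what happens to each line of the undecoded stderr output. A minimal sketch, in which the sample stderr bytes are made up for illustration:

```python
# Sample stderr output as bytes; the content is illustrative only.
error_string = b"Error opening data file chi_sim.traineddata\nsome other line"

lines = error_string.splitlines()  # a list of bytes objects

try:
    # Same expression as in get_errors(): a str argument on bytes lines.
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Decoding first turns every line into str, so the search works.
decoded_lines = error_string.decode("utf-8").splitlines()
error_lines = tuple(line for line in decoded_lines if line.find('Error') >= 0)
print(error_lines)
```

The exact wording of the TypeError message differs between Python versions, but the cause is the same: bytes and str cannot be mixed in find().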

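The workaround is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\pytesseract\pytesseract.py and insert one extra line at the top of the get_errors() function (at line 109):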

error_string = error_string.decode("utf-8")

The function then reads:

def get_errors(error_string):
    '''
    returns all lines in the error_string that start with the string "error"
    '''

    error_string = error_string.decode("utf-8")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()
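
If the same function may receive either bytes or str, an unconditional decode() would fail on str input. A slightly more defensive variant is sketched below. This is an illustration, not pytesseract's actual code, and the errors="replace" argument is an assumption made so that stderr output that is not valid UTF-8 does not raise a UnicodeDecodeError:

```python
def get_errors(error_string):
    """Return the lines of error_string that contain 'Error'.

    Accepts bytes or str. Sketch only, not the library's implementation.
    """
    if isinstance(error_string, bytes):
        # Assumption: replace undecodable bytes instead of raising.
        error_string = error_string.decode("utf-8", errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    # No line mentions 'Error': fall back to the whole message.
    return error_string.strip()
```

Either way, the patch only makes the real Tesseract error visible. The underlying problem, such as a missing chi_sim.traineddata file, still has to be fixed separately.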
