TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract
I am using Python 3.5.2 and pytesseract, and running my code raises TypeError: a bytes-like object is required, not 'str' (details below):
Code: File "D:/test.py"
# -*- coding: utf-8 -*-
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif')))
Error:
Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string
    errors = get_errors(error_string)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr>
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
TypeError: a bytes-like object is required, not 'str'
What should I do?
Edit:
I have already downloaded the training data into C:\Program Files (x86)\Tesseract-OCR\tessdata.
I inserted the line error_string = error_string.decode("utf-8") into get_errors(), and the error now looks like this:
Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')
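The message above names the file Tesseract failed to open. As a rough check, the sketch below tests whether a language file is present in a given tessdata directory. The helper names `traineddata_path` and `has_traineddata` are hypothetical and not part of pytesseract; the directory is the one mentioned in the question.

```python
import os


def traineddata_path(tessdata_dir, lang):
    """Build the expected path of a language file (hypothetical helper)."""
    return os.path.join(tessdata_dir, "{}.traineddata".format(lang))


def has_traineddata(tessdata_dir, lang):
    """Return True if the language file exists in tessdata_dir."""
    return os.path.isfile(traineddata_path(tessdata_dir, lang))


if __name__ == "__main__":
    # Directory taken from the question; adjust it to the local installation.
    tessdata_dir = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
    print("chi_sim", has_traineddata(tessdata_dir, "chi_sim"))
```

If this prints True while Tesseract still cannot open the file, Tesseract may be looking somewhere else. The path in the error message has no drive letter, so the TESSDATA_PREFIX environment variable is one thing worth checking.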
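This is a known bug in pytesseract, see issue #32:
"Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'"
and
"There is actually an error in tesseract. But on the Python end the error occurs because error_string returns a byte-literal and the geterrors call appears to have trouble with it."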
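The type mismatch described above can be reproduced without Tesseract. A minimal sketch, assuming the error text arrives as bytes (the sample message is made up for illustration):

```python
# Hypothetical sample of what Tesseract might write to stderr, as raw bytes.
error_string = b"Error opening data file chi_sim.traineddata\n"

caught = None
try:
    # Searching a bytes object for a str pattern mixes the two types and fails.
    error_string.find('Error')
except TypeError as exc:
    caught = exc

# Option 1: decode first, so both the text and the pattern are str.
decoded = error_string.decode("utf-8")
position = decoded.find('Error')

# Option 2: keep the bytes and search with a bytes pattern instead.
bytes_position = error_string.find(b'Error')

print(type(caught).__name__)
print(position, bytes_position)
```

The exact wording of the TypeError message varies between Python versions, so the sketch prints only the exception type.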
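The workaround is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\pytesseract\pytesseract.py and insert an extra line at the top of the get_errors() function (at line 109):
error_string = error_string.decode("utf-8")
The function then reads: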
def get_errors(error_string):
    '''
    returns all lines in the error_string that start with the string "error"
    '''
    error_string = error_string.decode("utf-8")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()
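The patched function calls .decode() unconditionally, so it would fail if it were ever handed a str. A slightly more defensive variant could look like the sketch below. The name `get_errors_tolerant` is hypothetical, and this is not pytesseract's own code, only an illustration of the same idea:

```python
def get_errors_tolerant(error_string):
    """Return the lines containing 'Error', accepting bytes or str input."""
    if isinstance(error_string, bytes):
        # Replace undecodable bytes rather than raising a second exception
        # while an error is already being reported.
        error_string = error_string.decode("utf-8", errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    return error_string.strip()


if __name__ == "__main__":
    sample = b"Tesseract starting\nError opening data file chi_sim.traineddata\n"
    print(get_errors_tolerant(sample))
```

Editing a file inside site-packages is lost on the next reinstall or upgrade of the package, so this kind of patch is best treated as a stopgap.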