
Tesseract error in image_to_string() conversion: pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

PLEASE NOTE: I know there are many posts about Tesseract, but I have not yet found a solution that works without producing errors.

I am simply trying to run OCR on an image with Tesseract. I have tried numerous solutions from various forums without success. I converted a PDF to an image, saved that image, and then loaded it with cv2; I was able to display the image as well. Now I am trying to apply Tesseract's image_to_string() function to it.

I have tried adjusting pytesseract.pytesseract.tesseract_cmd and made sure that both the wrapper (pytesseract) and the actual Tesseract engine are installed. Here is the code:

from wand.image import Image
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:/Users/Afton/anaconda3/Scripts/pytesseract.exe'


# Convert from pdf and save as image
pdf = 'C:/path/example.pdf'
outputFilename = 'C:/path/example.jpg'

with Image(filename=pdf) as img:
    img.save(filename=outputFilename)

# Read image
imagePath = outputFilename
image = cv2.imread(imagePath)    

# Configure OCR with pytesseract
config = r'-l deu --oem 1 --psm 3'
text = pytesseract.image_to_string(image, config=config)

# Print text output
text = text.split('\n')
print(text)

This is the current error:

pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

Before this, the error was related to the value of pytesseract.pytesseract.tesseract_cmd.

Any help is appreciated.

Update: the image is in German, which I have specified in the configuration with -l deu.

Update 2: I tried an alternative path from this resource (adjusted to my file location):

pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe' 

I now get this error:

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Program Files\\Tesseract-OCR/tessdata/deu.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'deu\' Tesseract couldn\'t load any languages! Could not initialize tesseract.') 

Note for others with this problem: because I am reading a German document, I downloaded the German language data (deu.traineddata) from https://github.com/tesseract-ocr/tessdata, where the files for all languages are available. The remaining issue was the missing language data, which belongs in the tessdata directory named in the error message.
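The second error means Tesseract could not find deu.traineddata in its tessdata directory. Below is a minimal sketch for checking this before running OCR. `missing_languages` is a hypothetical helper (not part of pytesseract), and it assumes, as the error message suggests, that each language is a file named `<code>.traineddata` directly inside the tessdata directory, and that TESSDATA_PREFIX, when set, names that directory itself. Recent pytesseract versions also offer `pytesseract.get_languages()` for asking the engine directly.

```python
import os
from pathlib import Path
from typing import List, Optional


def missing_languages(langs: str, tessdata_dir: Optional[str] = None) -> List[str]:
    """Hypothetical helper: return the codes in a Tesseract language spec
    such as 'deu' or 'deu+eng' that have no <code>.traineddata file.

    If tessdata_dir is None, fall back to the TESSDATA_PREFIX environment
    variable (assumed here to name the tessdata directory itself)."""
    if tessdata_dir is None:
        tessdata_dir = os.environ.get("TESSDATA_PREFIX")
    if not tessdata_dir:
        raise ValueError("no tessdata directory given and TESSDATA_PREFIX is not set")
    base = Path(tessdata_dir)
    # Tesseract joins several languages with '+', e.g. '-l deu+eng'.
    return [code for code in langs.split("+")
            if not (base / f"{code}.traineddata").is_file()]


if __name__ == "__main__":
    # Directory taken from the error message above; adjust to your install.
    print(missing_languages("deu", r"C:/Program Files/Tesseract-OCR/tessdata"))
```

If `deu` is reported as missing, copying deu.traineddata from the tessdata repository into that directory should clear the "Failed loading language 'deu'" error.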

This line is wrong:

pytesseract.pytesseract.tesseract_cmd = r'C:/Users/Afton/anaconda3/Scripts/pytesseract.exe'

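tesseract_cmd must point to the Tesseract engine itself (e.g. tesseract.exe in the Tesseract-OCR installation folder), not to the pytesseract.exe script that the Python wrapper installs into anaconda3/Scripts. With the wrong path, pytesseract ends up invoking its own command-line script, which is why the error contains pytesseract's usage message. Please read the pytesseract documentation.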
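A minimal sketch of a guard against this mistake, run before assigning tesseract_cmd. `check_tesseract_cmd` is a hypothetical helper, not part of pytesseract, and the name check is only a heuristic: it assumes the engine executable is named `tesseract` (or `tesseract.exe`), which holds for the standard installs but may not for custom builds.

```python
from pathlib import PureWindowsPath


def check_tesseract_cmd(cmd: str) -> str:
    """Hypothetical helper: reject a tesseract_cmd that points at the
    pytesseract wrapper script instead of the Tesseract engine."""
    # PureWindowsPath accepts both '/' and '\\' as separators on any OS.
    name = PureWindowsPath(cmd).stem.lower()
    if name == "pytesseract":
        raise ValueError(
            f"{cmd!r} is the pytesseract wrapper script, not the Tesseract "
            "engine; point tesseract_cmd at tesseract.exe instead"
        )
    if name != "tesseract":
        # Heuristic only: assumes the engine executable is named 'tesseract'.
        raise ValueError(f"{cmd!r} does not look like a tesseract executable")
    return cmd


if __name__ == "__main__":
    good = r"C:/Program Files/Tesseract-OCR/tesseract.exe"
    bad = r"C:/Users/Afton/anaconda3/Scripts/pytesseract.exe"
    print(check_tesseract_cmd(good))
    try:
        check_tesseract_cmd(bad)
    except ValueError as exc:
        print(exc)
```

In the question's script this would be used as `pytesseract.pytesseract.tesseract_cmd = check_tesseract_cmd(r'C:/Program Files/Tesseract-OCR/tesseract.exe')`.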
