
How to use tessdata_fast in pytesseract (python)?

I am using the Tesseract OCR engine from Python on macOS to detect the orientation of text (using image_to_osd).

Detecting the orientation currently takes a long time (300 ms), so my aim is to reduce it. I am trying to use the tessdata_fast models, as I believe they would help reduce the time and I am not too concerned about accuracy.

I downloaded eng.traineddata and osd.traineddata from https://github.com/tesseract-ocr/tessdata_fast into a tessdata_fast folder and added that folder to the tesseract folder. I then set the configuration to custom_config = r'--oem 1 --tessdata-dir /usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast --psm 0'. However, the time taken does not seem to decrease, so I am unsure whether my configuration is using tessdata_fast or the tessdata I downloaded previously.

I ran the command tesseract --list-langs, and it appears to be reading the original tessdata folder:

"/usr/local/share/tessdata/" (2):
eng
osd

I tried deleting the previously downloaded tessdata and running the command again, but the result was "/usr/local/share/tessdata/" (0):

Does anyone know where I am going wrong? Or what steps should I be taking to run pytesseract with tessdata_fast?

Thank you!

According to the pytesseract documentation, you can use Tesseract's --tessdata-dir argument to specify the path to your data. Add it to the pytesseract config as follows:

import pytesseract
from PIL import Image

# Example config: r'--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata"'
# It's important to add double quotes around the dir path.
tessdata_dir_config = r'--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
image = Image.open('test.png')  # hypothetical image path, replace with your own
pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)
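
The same config can be passed to image_to_osd, which is what the question uses. The following is only a rough sketch: the helper names build_tessdata_config and average_osd_seconds are made up for illustration, and the number of runs is an arbitrary assumption. Timing the call once with each data folder is one way to see whether switching folders changes anything.

```python
import time


def build_tessdata_config(tessdata_dir):
    """Return a pytesseract config string pointing at tessdata_dir (hypothetical helper)."""
    # The double quotes keep a path containing spaces together as one argument.
    return f'--tessdata-dir "{tessdata_dir}"'


def average_osd_seconds(image, tessdata_dir, runs=5):
    """Average wall-clock seconds of image_to_osd over `runs` calls (hypothetical helper)."""
    # Imported here so build_tessdata_config can be used without pytesseract installed.
    import pytesseract

    config = build_tessdata_config(tessdata_dir)
    start = time.perf_counter()
    for _ in range(runs):
        pytesseract.image_to_osd(image, config=config)
    return (time.perf_counter() - start) / runs
```

For example, calling average_osd_seconds(image, '/usr/local/Cellar/tesseract/5.0.1/share/tessdata_fast') and then again with '/usr/local/share/tessdata' would give two numbers to compare. If they are nearly identical, the model files may not be the bottleneck.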

For more details see https://pypi.org/project/pytesseract/ .
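
Before timing anything, it may also be worth confirming that the folder you pass really contains the files you expect. The sketch below uses only the standard library; list_traineddata is a hypothetical helper name, not part of pytesseract.

```python
from pathlib import Path


def list_traineddata(tessdata_dir):
    """Return the sorted language names that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        # A missing folder is reported as "no languages" rather than raising.
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))
```

For the folder in the question, this should return eng and osd if the download landed where the config points. An empty list would suggest the path in --tessdata-dir is wrong. Running tesseract --tessdata-dir <your_path> --list-langs from the shell should show the same thing from Tesseract's side.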
