How can I run tesseract with multiple languages at once?

Question

I have to analyze an image that contains both English and Japanese text. When I run tesseract with the default language ( -l eng ), some Japanese characters are lost. If I run it with Japanese instead ( -l jpn ), some English text is lost (e.g. "Email").

How can I run a single process that recognizes both English and Japanese characters?

Answer 1

Since tesseract 3.02, it is possible to specify multiple languages for the -l parameter:

-l lang
    The language to use. If none is specified, English is assumed. Multiple languages may be specified, separated by plus characters. Tesseract uses 3-character ISO 639-2 language codes.

An example:

tesseract myscan.png out -l deu+eng
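For the English and Japanese case in the question, the same command line can be built from a script. The following is only a minimal sketch: it assumes the tesseract binary is on PATH and that the eng and jpn language data are installed, and the helper names build_command and run_ocr are made up for illustration.

```python
import subprocess


def build_command(image_path, output_base, languages):
    """Return the tesseract argument list for the given language codes."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Multiple languages are joined with '+', e.g. eng+jpn
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, output_base, languages):
    """Run tesseract and return the text it writes to <output_base>.txt."""
    subprocess.run(build_command(image_path, output_base, languages), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()
```

Calling run_ocr("myscan.png", "out", ["eng", "jpn"]) would run the equivalent of tesseract myscan.png out -l eng+jpn and read back out.txt.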

Answer 2

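Try this with pytesseract:

import pytesseract
from PIL import Image
from langdetect import detect_langs

# Load the image to analyze (the file name is a placeholder)
img = Image.open('image.png')

# Recognize English and Japanese in one pass; --psm 6 treats the image as a single uniform block of text
custom_config = r'-l eng+jpn --psm 6'
txt = pytesseract.image_to_string(img, config=custom_config)

# Optional: estimate which languages appear in the recognized text
print(detect_langs(txt))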

Note: you have to install langdetect by using:

 pip install langdetect
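If one of the requested languages is not installed, the call above fails, so it can help to check first. This is a rough sketch, not part of the original answer: it assumes a pytesseract release that provides get_languages(), and the names missing_languages and ocr_multilang are hypothetical. It also passes the languages through the lang argument of image_to_string rather than through -l in the config string.

```python
def missing_languages(requested, installed):
    """Return the codes in a '+'-separated string that are not installed."""
    return [code for code in requested.split("+") if code not in installed]


def ocr_multilang(image_path, languages="eng+jpn"):
    """OCR an image with several languages after checking they are available."""
    # Imported here so missing_languages() can be used without pytesseract installed
    import pytesseract
    from PIL import Image

    missing = missing_languages(languages, pytesseract.get_languages())
    if missing:
        raise RuntimeError("missing language data for: " + ", ".join(missing))
    # lang= plays the same role as -l on the command line
    return pytesseract.image_to_string(Image.open(image_path), lang=languages)
```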

2 answers

solution1
44 ACCEPTED 2014-12-22 12:36:53

solution2
2 2020-10-15 07:34:12