How can I run tesseract with multiple languages at once?
I have to analyze an image that contains both English and Japanese text. When I run tesseract with the default language (-l eng), some Japanese characters are lost. If I instead run tesseract with Japanese (-l jpn), some English characters are lost (e.g. an email address).
How can I run a single process that recognizes both English and Japanese characters?
Since tesseract 3.02 it is possible to specify multiple languages for the -l parameter. From the manual:
-l lang
The language to use. If none is specified, English is assumed. Multiple languages may be specified, separated by plus characters. Tesseract uses 3-character ISO 639-2 language codes.
An example:
tesseract myscan.png out -l deu+eng
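The same command can be assembled from a script. The following is a minimal sketch, not part of the original answer: the helper names join_languages and build_command are hypothetical, and it assumes the tesseract binary is on PATH and that a file named myscan.png exists before anything is actually run.

```python
import os
import shutil
import subprocess


def join_languages(langs):
    """Join language codes with '+', the separator the -l option expects."""
    cleaned = [code.strip() for code in langs if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def build_command(image_path, output_base, langs):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l a+b."""
    return ["tesseract", image_path, output_base, "-l", join_languages(langs)]


if __name__ == "__main__":
    # Hypothetical input file and output base, mirroring the example above.
    cmd = build_command("myscan.png", "out", ["eng", "jpn"])
    print(" ".join(cmd))
    # Only invoke tesseract when the binary and the image are both present.
    if shutil.which("tesseract") and os.path.exists("myscan.png"):
        subprocess.run(cmd, check=True)
```

For the question's case the language list would be eng and jpn, which yields -l eng+jpn.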
Try this:
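import pytesseract
from PIL import Image
from langdetect import detect_langs
img = Image.open('myscan.png')  # assumed input file; replace with your own image
custom_config = r'-l eng+jpn --psm 6'  # both languages; treat the page as one block of text
txt = pytesseract.image_to_string(img, config=custom_config)
print(detect_langs(txt))  # languages langdetect finds in the OCR output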
Note: you have to install langdetect first:
pip install langdetect
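Combining languages only helps if the trained data for every requested language is installed. As a rough sketch under that assumption, the hypothetical helpers below check an eng+jpn style specification against the installed languages before running OCR. It assumes a recent pytesseract that provides get_languages; the names missing_languages and ocr_with_check are illustrative only.

```python
def missing_languages(lang_spec, installed):
    """Return the codes in a 'a+b' language spec that are not installed."""
    requested = [code for code in lang_spec.split("+") if code]
    available = set(installed)
    return [code for code in requested if code not in available]


def ocr_with_check(image, lang_spec="eng+jpn"):
    """Run OCR only if every requested language is available."""
    # Imported lazily so missing_languages can be used without pytesseract.
    import pytesseract

    # Assumption: get_languages exists in the installed pytesseract version.
    missing = missing_languages(lang_spec, pytesseract.get_languages(config=""))
    if missing:
        raise RuntimeError("missing tesseract language data: " + ", ".join(missing))
    return pytesseract.image_to_string(image, lang=lang_spec)
```

Calling ocr_with_check(img) with the image from the snippet above would then fail early with a clear message if, for example, the Japanese data is not installed.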