
How to print Tesseract result in Chinese characters

I am trying to get my program to recognize Chinese using Tesseract, and it works. The only problem I am running into is that instead of printing the result as Chinese characters, the result is being printed in Pinyin (the romanized spelling you would type on an English keyboard).

# Import libraries
from PIL import Image
import pytesseract
from unidecode import unidecode

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image_counter = 2

filelimit = image_counter - 1

outfile = "out_text.txt"

f = open(outfile, "a")

for i in range(1, filelimit + 1):
    print("ran")
    filename = "page_" + str(i) + ".png"

    # Recognize the text in the image as a string using pytesseract
    text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))

    print(text)

This is the image I ran it on:


This is what I got:

ran Qing Ming Shi Jie Yu Fen Fen , Lu Shang Xing Ren Yu Duan Que Xin Wen Jiu Jia He Chu You , Mu Yi Tong Zhi Qiang Hua Cun .

The result should be in Chinese characters, as shown in the image.

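Never mind, I realized my problem. unidecode transliterates Unicode text to ASCII, so it was converting the recognized Chinese characters into romanized text. This line: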

text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))

should be

text = pytesseract.image_to_string(Image.open(filename), lang = "chi_tra")
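For anyone who also wants the recognized text in out_text.txt: the original script opens the file without an encoding, and on Windows the default is often not UTF-8, so writing Chinese characters can fail. Below is a minimal sketch of one way to handle that, not the asker's actual code. The helper name save_ocr_text and its recognize parameter are made up for illustration; recognize is any function that takes a filename and returns a string, so the pytesseract call is passed in rather than hard-coded. Note that chi_sim and chi_tra only select simplified or traditional recognition data; it is dropping unidecode that keeps the characters.

```python
from typing import Callable, Iterable


def save_ocr_text(
    filenames: Iterable[str],
    recognize: Callable[[str], str],
    outfile: str,
) -> int:
    """Append the recognized text of each image to outfile as UTF-8.

    `recognize` is a hypothetical hook: any callable mapping a filename
    to the recognized string. Returns the number of pages written.
    """
    count = 0
    # An explicit encoding avoids relying on the platform default,
    # which may not be able to represent Chinese characters.
    with open(outfile, "a", encoding="utf-8") as f:
        for name in filenames:
            text = recognize(name)  # no unidecode: keep the characters
            f.write(text + "\n")
            count += 1
    return count


# Possible usage with pytesseract (assumes Tesseract and the chi_sim
# language data are installed, as in the question):
#
#   from PIL import Image
#   import pytesseract
#
#   def ocr(name):
#       return pytesseract.image_to_string(Image.open(name), lang="chi_sim")
#
#   save_ocr_text(["page_1.png"], ocr, "out_text.txt")
```

Passing the OCR step in as a function is only a convenience here: it lets the file-writing part be tried with a stand-in before involving Tesseract.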
