
Pytesseract really slow

So I'm trying to read out text from MS Teams and use that text to make keyboard inputs. Right now I use the threading module, with one thread for the input and one thread for image_to_string. The following is the function for image_to_string.

def imToString():
    global message
    print("Image getting read")
    # Point pytesseract at the Tesseract executable (only needs to be set once).
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\gornicec\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
    while True:
        print("preIMGgrab")

        # Grab a fixed screen region; ImageGrab returns an RGB image, so convert from RGB.
        cap = ImageGrab.grab(bbox=(177, 850, 283, 881))
        grayCap = cv2.cvtColor(np.array(cap), cv2.COLOR_RGB2GRAY)

        print("postIMGgrab")
        t = time.perf_counter()
        print("preMSG " + str(t))

        # OCR the grayscale capture as a single block of German text.
        message = pytesseract.image_to_string(
            grayCap,
            lang='deu', config='--psm 6')

        print(str(message) + " was read in " + str(time.perf_counter() - t) + " s")

I don't know why, but it takes about 8 seconds to read an image that's 1000 pixels big. I need this to take at most 2 seconds. I'll add the whole code at the end. If there is any way to make it faster or to do it differently, please tell me.
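Before changing libraries, it may help to confirm which stage actually eats the 8 seconds. The following is a minimal sketch, not part of the original post: `timed` and `timings` are hypothetical helper names, and the `time.sleep` calls are stand-ins for the real `ImageGrab.grab` and `pytesseract.image_to_string` calls.

```python
import time
from contextlib import contextmanager

# Collected durations per stage label (hypothetical helper state).
timings = {}

@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)

# Stand-ins for the real work: replace the sleeps with the screen grab
# and the OCR call to see where the time goes.
with timed("grab"):
    time.sleep(0.01)
with timed("ocr"):
    time.sleep(0.02)

for label, samples in timings.items():
    avg = sum(samples) / len(samples)
    print(f"{label}: {avg:.3f} s average over {len(samples)} run(s)")
```

If the "ocr" stage dominates even for a tiny image, the per-call startup cost of the OCR engine is a more likely culprit than the image size.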

Whole code:

import numpy as np
import time
import pytesseract
from win32gui import GetWindowText, GetForegroundWindow
import win32api
import cv2
import pyautogui
from PIL import ImageGrab
import threading
from ahk import AHK
import keyboard

message = ""
ahk = AHK(executable_path='C:\\Program Files\\AutoHotkey\\AutoHotkey.exe')

def Controls():
    global message
    while True:
        booleanVal = True
        if booleanVal:
            #imToString()
            print("message")
            #print("rechts" in message.lower())
            #print(f'LÄNGE: {len(message)}')
            if "vorne" in message.lower():
                # Control(message, 'w')
                ahk.key_press('w')
                #message = ""

            if "hinten" in message.lower():
                # Control(message, 's')
                ahk.key_press('s')
                #message = ""

            if "links" in message.lower():
                # Control(message, 'a')
                ahk.key_press('a')
                #message = ""

            if "rechts" in message.lower():
                # Control(message, 'd')
                #print("HAHAHA")
                ahk.key_press('d')
                #message = ""

            if "greif" in message.lower():
                ahk.key_press('space')
                #message = ""
            time.sleep(0.5)

# imToString() from above is defined here

controls = threading.Thread(target=Controls)
controls.start()
grab = threading.Thread(target=imToString)
grab.start()
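As a side note, the repeated keyword checks in Controls() could be written as a lookup table. This is only a sketch of one possible restructuring: `KEYWORD_TO_KEY`, `keys_for` and `press_all` are hypothetical names, and `press` is assumed to be a callable such as `ahk.key_press` from the code above.

```python
# Keyword -> key table taken from the Controls() function above.
KEYWORD_TO_KEY = {
    "vorne": "w",
    "hinten": "s",
    "links": "a",
    "rechts": "d",
    "greif": "space",
}

def keys_for(message):
    """Return the keys whose keyword occurs in the OCR text, in table order."""
    text = message.lower()
    return [key for word, key in KEYWORD_TO_KEY.items() if word in text]

def press_all(message, press):
    """Call press(key) for every matching keyword, e.g. press=ahk.key_press."""
    for key in keys_for(message):
        press(key)
```

Keeping the matching in a pure function like `keys_for` also makes it possible to check the keyword logic without a screen capture or AutoHotkey running.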

pytesseract is not well suited to a large number of images or to images that are already in memory: it writes each image to a file and then passes the file path to the tesseract CLI. If you want to improve the performance of your script, try using a library that works directly with the Tesseract API.

Like this: https://pypi.org/project/tess-py-api/
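To illustrate the idea of talking to the Tesseract API directly, here is a hedged sketch. It does not use tess-py-api (whose interface is not shown here) but swaps in tesserocr, a different binding, and assumes that tesserocr and the German ('deu') language data are installed. `make_reader` is a hypothetical helper name. The engine is created once and reused, so no temporary file or new process is needed per image.

```python
def make_reader(lang="deu"):
    """Return a function that OCRs a PIL image with one reused Tesseract engine."""
    # Imported lazily so this module loads even where tesserocr is missing.
    import tesserocr  # third-party; assumed installed together with the 'deu' data

    # PSM.SINGLE_BLOCK is the API equivalent of the CLI option '--psm 6'.
    api = tesserocr.PyTessBaseAPI(lang=lang, psm=tesserocr.PSM.SINGLE_BLOCK)

    def read(pil_image):
        # Hand the in-memory image straight to the engine and return its text.
        api.SetImage(pil_image)
        return api.GetUTF8Text()

    return read

# Usage inside the capture loop (hypothetical):
#   read = make_reader()
#   message = read(ImageGrab.grab(bbox=(177, 850, 283, 881)))
```

Whether this brings the read time under 2 seconds depends on the machine and the installation, so it is worth timing both variants.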
