
Real time OCR in python

The problem

I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and set it as a variable. For example, if I was playing a game and had the capture frame over a resource amount, I want it to print that value and use it. A perfect example of this is a video by Michael Reeves where, whenever he loses health in a game, the program detects it and sends a signal to his Bluetooth-enabled airsoft gun to shoot him. So far I have this:

# imports
from PIL import ImageGrab
from PIL import Image
import numpy as np
import pytesseract
import argparse
import cv2
import os

fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (1366, 768))

while(True):
        x = 760
        y = 968

        ox = 50
        oy = 22

        # screen capture
        img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
        img_np = np.array(img)
        frame = cv2.cvtColor(img_np, cv2.COLOR_RGB2BGR)  # PIL gives RGB; OpenCV expects BGR
        cv2.imshow("Screen", frame)
        out.write(frame)

        if cv2.waitKey(1) == 27:  # waitKey returns -1 when no key is pressed; break on Esc
                break

out.release()
cv2.destroyAllWindows()

It captures in real time and displays it in a window, but I have no clue how to make it recognise the text every frame and output it.

Any help?

It's fairly simple to grab the screen and pass it to tesseract for OCRing.

The PIL (Pillow) library can grab the frames easily on macOS and Windows. However, this feature has only recently been added for Linux, so the code below works around it not existing. (I'm on Ubuntu 19.10 and my Pillow does not support it.)

Essentially the user starts the program with screen-region rectangle co-ordinates. The main loop continually grabs this area of the screen, feeding it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.

Note that this is not a proper real-time system. There is no guarantee of timeliness; each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle you ask it to monitor.
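To get a feel for the frame rate you can actually achieve, it is worth timing the OCR call itself. A minimal sketch (the helper name is my own; pass in `pytesseract.image_to_string` and a grabbed image):

```python
import time

def time_per_call(fn, arg, repeats=5):
    """Average wall-clock seconds per call of fn(arg) over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    return (time.perf_counter() - start) / repeats

# e.g. seconds = time_per_call(pytesseract.image_to_string, image)
# Achievable OCR frame rate is then roughly 1 / seconds.
```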

#! /usr/bin/env python3

import sys
import pytesseract
from PIL import Image

# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of pillow don't support ImageGrab on Linux
    # In which case we will use XLib 
    if ( sys.platform == 'linux' ):
        from Xlib import display, X   
        use_grab = False
    else:
        raise ex


def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
        Handles a Linux installation with an older Pillow by falling back
        to using XLib """
    global use_grab
    x, y, width, height = rect

    if ( use_grab ):
        image = ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
    else:
        # ImageGrab can be missing under Linux
        dsp  = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image


### Do some rudimentary command line argument handling
### So the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )

    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )

    # TODO - add error checking
    x      = int( sys.argv[0] )
    y      = int( sys.argv[1] )
    width  = int( sys.argv[2] )
    height = int( sys.argv[3] )

    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]  
    print( EXE + ": watching " + str( screen_rect ) )

    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ): 
        image = screenGrab( screen_rect )              # Grab the area of the screen
        text  = pytesseract.image_to_string( image )   # OCR the image

        # IF the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )

This answer was cobbled together from various other answers on SO.

If you use this answer for anything regularly, it would be worth adding a rate-limiter to save some CPU. It could probably sleep for half a second every loop.
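A minimal sketch of such a rate limiter, assuming `grab` and `ocr` are callables standing in for `screenGrab` and `pytesseract.image_to_string` above (the function name and parameters are my own):

```python
import time

def monitor(grab, ocr, interval=0.5, max_loops=None):
    """Grab-and-OCR loop that sleeps `interval` seconds per iteration
    to cap CPU usage. Returns the number of iterations performed."""
    loops = 0
    while max_loops is None or loops < max_loops:
        text = ocr(grab()).strip()
        if text:
            print(text)
        loops += 1
        time.sleep(interval)
    return loops
```

With `max_loops=None` (the default) this runs forever, like the loop above.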

Tesseract is a single-use command-line application using files for input and output, meaning every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. Its suitability as a real-time OCR engine will depend on the exact use case (more pixels require more time) and which parameters are provided to tune the OCR engine. Some experimentation may ultimately be required to tune the engine to the exact scenario, but also expect that the time required to OCR a frame may exceed the frame time, so a reduction in the frequency of OCR execution may be required, i.e. performing OCR at 10-20 FPS rather than the 60+ FPS the game may be running at.
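Reducing OCR frequency relative to the capture rate can be sketched as only passing every Nth frame to Tesseract (the `ocr` callable is again a placeholder for `pytesseract.image_to_string`):

```python
def ocr_every_nth(frames, ocr, n=6):
    """Run `ocr` only on every n-th frame, e.g. OCR at ~10 FPS while
    the capture loop itself runs at ~60 FPS. Returns the OCR results."""
    results = []
    for i, frame in enumerate(frames):
        if i % n == 0:
            results.append(ocr(frame))
    return results
```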

In my experience, a reasonably complex document in a 2200x1700px image can take anywhere from 0.5s to 2s using the English fast model with 4 cores (the default) on an aging CPU; however, this "complex document" represents the worst-case scenario and makes no assumptions about the structure of the text being recognized. For many scenarios, such as extracting data from a game screen, assumptions can be made to implement a few optimizations and speed up OCR:

  • Reduce the size of the input image. When extracting specific information from the screen, crop the grabbed screen image as much as possible to only that information. If you're trying to extract a value like health, crop the image to just the health value.
  • Use the "fast" trained models to improve speed at the cost of accuracy. You can use the -l option to specify different models and the --tessdata-dir option to specify the directory containing your model files. You can download multiple models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc.
  • Use the --psm parameter to prevent page segmentation that is not required for your scenario. --psm 7 may be the best option for singular pieces of information, but play around with different values and find which works best.
  • Restrict the allowed character set if you know which characters will be used, such as if you're only looking for numerics, by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.
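The options above can be combined into a single Tesseract config string. A small helper sketch (the function name and defaults are my own, not part of pytesseract):

```python
def build_tesseract_config(psm=7, whitelist=None, tessdata_dir=None):
    """Assemble the options discussed above into one string suitable
    for pytesseract's `config=` parameter."""
    parts = [f"--psm {psm}"]
    if tessdata_dir is not None:
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    if whitelist is not None:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)

# e.g. pytesseract.image_to_string(image, lang="eng_fast",
#          config=build_tesseract_config(whitelist="1234567890"))
```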

pytesseract is the best way to get started with implementing Tesseract, and the library can handle image input directly (although it saves the image to a file before passing it to Tesseract) and pass the resulting text back using image_to_string(...).

import pytesseract

# Capture frame...

# If the frame requires cropping:
frame = frame[y:y + h, x:x + w] 

# Perform OCR
text = pytesseract.image_to_string(frame, lang="eng_fast", config="--psm 7")

# Process the result (strip whitespace before parsing the number)
health = int(text.strip())

I know the program he uses is the open OpenCV repo for the OCR and ImageGrab for the screen capture. The two processes of OCR and capture are done in a while-true loop, and I guess that it's pretty fast.

Michael also has a git repo for the project here: https://github.com/michaelreeves08/footnot-health-detection but it figures he forgot to upload it...

I am here because I'm looking for the same thing; let me know if you ever made headway on this. I got this info from reading Michael's code in screenshots from the video.

Alright, I was having the same issue as you, so I did some research into it and I'm sure that I found the solution! First, you will need these libraries:

  • cv2
  • pytesseract
  • Pillow (PIL)
  • numpy

Installation:

  • To install cv2, simply use this in a command line/command prompt: pip install opencv-python

  • Installing pytesseract is a little bit harder, as you also need to pre-install Tesseract, which is the program that actually does the OCR reading. First, follow this tutorial on how to install Tesseract. After that, in a command line/command prompt, just use the command pip install pytesseract. If you don't install this correctly, you will get an error when using the OCR.

  • To install Pillow, use the following command in a command-line/command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. The one that uses python works for me.

  • To install NumPy, use the following command in a command-line/command prompt: pip install numpy. Though it's usually already installed with most Python distributions.

Code: This code was made by me, and as of right now it works how I want it to, similar to the effect that Michael had. It will take the top left of your screen, take a recorded image of it, and show a window display of the image it's currently using OCR to read. Then, in the console, it prints out the text that it read on the screen.

# OCR Screen Scanner
# By Dornu Inene
# Libraries that you should have installed
import cv2
import numpy as np
import pytesseract

# We only need the ImageGrab class from PIL
from PIL import ImageGrab

# Run forever unless you press Esc
while True:
    # This instance will generate an image from
    # the point of (115, 143) and (569, 283) in format of (x, y)
    cap = ImageGrab.grab(bbox=(115, 143, 569, 283))

    # For us to use cv2.imshow we need to convert the image into a numpy array
    cap_arr = np.array(cap)

    # This isn't really needed for getting the text from a window but
    # It will show the image that it is reading it from

    # cv2.imshow() shows a window display and it is using the image that we got
    # use array as input to image
    cv2.imshow("", cap_arr)

    # Read the image that was grabbed from ImageGrab.grab using pytesseract.image_to_string
    # This is the main thing that will collect the text information from that specific area of the window
    text = pytesseract.image_to_string(cap)

    # This just removes spaces from the beginning and ends of text
    # and makes the text it reads cleaner
    text = text.strip()

    # If any text was translated from the image, print it
    if len(text) > 0:
        print(text)

    # This line will break the while loop when you press Esc
    if cv2.waitKey(1) == 27:
        break

# This will make sure all windows created from cv2 is destroyed
cv2.destroyAllWindows()

I hope this helped you with what you were looking for; it sure did help me!
