
Read numbers on an image using OCR in Python

I am trying to extract numbers from images using OpenCV and Tesseract in Python. Here's my attempt, but the code doesn't return the expected numbers.

import os, re
import cv2
import pytesseract

sTemp = "Number.png"
# Raw string so the backslash is not read as an escape sequence
directory = r'.\MyFolder'

def useMagick(img):
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
    command = 'magick convert {} -resize 1024x640 -density 300 -quality 100 {}'.format(img, sTemp)
    os.system(command)

def readNumber(img):
    img = cv2.imread(img)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    txt = pytesseract.image_to_string(gry)
    print(txt)
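    try:
        return re.findall(r'\d+\s?\/\s?(\d+)', txt)[0]
    except IndexError:  # no "N / M" pattern found in the OCR text
        # Retry once on a lightly blurred copy of the image
        blur = cv2.GaussianBlur(gry, (3,3), 0)
        txt = pytesseract.image_to_string(blur)
        try:
            return re.findall(r'\d+\s?\/\s?(\d+)', txt)[0]
        except IndexError:
            return 'REVIEW'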

sPath = os.path.join(directory, sTemp)
useMagick(sPath)
x = readNumber(sPath)
print(x)

Here's a sample of the images: [image]

The code doesn't return any digits. How can I improve the quality of such an image so that the numbers can be extracted?
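A side note on the parsing step in the code above: the pattern `\d+\s?\/\s?(\d+)` only matches text shaped like `N / M` and returns the part after the slash, so any OCR output without a slash ends up as 'REVIEW' even if digits were recognized. Here is a minimal sketch of that behaviour; the sample strings are made up for illustration and are not real OCR output from the images, and `number_after_slash` is a hypothetical helper name.

```python
import re

# Same pattern as in the question: digits, optional space, slash,
# optional space, then the captured digits.
PATTERN = re.compile(r'\d+\s?/\s?(\d+)')

def number_after_slash(text):
    """Return the digits after the first 'N / M' match, or 'REVIEW' if none."""
    match = PATTERN.search(text)
    return match.group(1) if match else 'REVIEW'

# Hypothetical OCR outputs (assumptions, not taken from the original images).
samples = ["Page 3 / 12", "3/12", "312", ""]
for s in samples:
    print(repr(s), '->', number_after_slash(s))
```

If the images contain plain numbers without a slash, this pattern can never match, regardless of how well the OCR itself works.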

After much searching, I finally solved the problem:

import cv2
import numpy as np
import pytesseract
import os, re

sImagesPath = r'MyFolder/'
mylist = []

def replace_chars(text):
    list_of_numbers = re.findall(r'\d+', text)
    result_number = ''.join(list_of_numbers)
    return result_number

for root, dirs, file_names in os.walk(sImagesPath):
    for file_name in file_names:
        # Join with root so files in subfolders resolve correctly
        img = cv2.imread(os.path.join(root, file_name))
        if img is None:  # skip files OpenCV cannot read as images
            continue
        gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        thr = cv2.adaptiveThreshold(gry, 181, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 13, 10)
        txt = pytesseract.image_to_string(thr, lang='eng',config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
        digits = replace_chars(txt)
        mylist.append(digits)
        print(digits)

with open('Output.txt', 'w') as f:
    for digits in mylist:
        # Each entry is already a string of digits
        f.write(digits + '\n')
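For reuse, the steps above can be wrapped into small functions. This is only a sketch under a few assumptions: the names `build_config`, `digits_only` and `read_digits` are hypothetical, and the threshold maximum is set to 255 (pure white) rather than the 181 used above. In Tesseract, `--psm 10` treats the image as a single character, while `--psm 7` treats it as a single text line, so the page segmentation mode is left as a parameter in case multi-digit numbers read better with another mode.

```python
import re

def build_config(psm=10, oem=3, whitelist="0123456789"):
    """Assemble the Tesseract option string used in the answer."""
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"

def digits_only(text):
    """Concatenate every run of digits found in the OCR text."""
    return "".join(re.findall(r"\d+", text))

def read_digits(image_path, psm=10):
    """Grayscale, adaptive-threshold, then OCR a single image file."""
    # Imported here so the helpers above work without OpenCV/Tesseract installed.
    import cv2
    import pytesseract

    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Assumption: 255 as the max value; block size 13 and constant 10 as in the answer.
    thr = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 13, 10)
    text = pytesseract.image_to_string(thr, lang="eng", config=build_config(psm=psm))
    return digits_only(text)
```

Something like `read_digits('MyFolder/Number.png', psm=7)` would then try the single-line mode on one file; whether that improves results depends on the images.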
