
How to extract numbers or digits using OCR in Python

I am trying to extract numbers from an image using OCR.

My development environment is PyCharm (Python 3).

My problem is how to extract the numbers using OCR.

The image looks like this:

[image: input image]

From the picture above, I want to get the following numeric text:

1 2   3
4 5 6 7
8 9   0

How can I get the results I want?

There is a range of libraries that can do this. Here is an example using one of them, pytesseract: https://pypi.org/project/pytesseract/ https://github.com/madmaze/pytesseract

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'
# Example tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

# Simple image to string
print(pytesseract.image_to_string(Image.open('test.png')))
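Since only digits are wanted, it may help to restrict what Tesseract is allowed to return. The following is a minimal sketch, not part of the original answer: it assumes Tesseract's `tessedit_char_whitelist` variable and page segmentation mode 6 (a single uniform block of text) suit the image, and the helper names `digit_config`, `keep_digits` and `read_digits` are made up for illustration. Some Tesseract versions are reported to ignore the whitelist with the LSTM engine, so the sketch also filters the returned text.

```python
import re


def digit_config(psm=6):
    """Build a Tesseract config string that limits output to digits."""
    return "--psm {} -c tessedit_char_whitelist=0123456789".format(psm)


def keep_digits(text):
    """Keep only digits from OCR output, one space-separated row per line."""
    rows = []
    for line in text.splitlines():
        digits = re.findall(r"\d", line)
        if digits:  # skip lines that contain no digits at all
            rows.append(" ".join(digits))
    return rows


def read_digits(path, psm=6):
    """OCR the image at `path` and return its digit rows (needs Tesseract)."""
    # Imported lazily so the helpers above work without OCR dependencies.
    from PIL import Image
    import pytesseract

    raw = pytesseract.image_to_string(Image.open(path), config=digit_config(psm))
    return keep_digits(raw)
```

Calling `read_digits('test.png')` would then return a list with one string per text row, provided Tesseract recognises the digits; the page segmentation mode is worth experimenting with if it does not.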

You can use Otsu's threshold to obtain a binary image, then extract each number. After thresholding we get this:

[image: binary image after thresholding]
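To make the thresholding step less of a black box, here is a rough pure-Python sketch of the idea behind Otsu's method: try every grey level as a threshold and keep the one that maximises the between-class variance of the two resulting pixel groups. The function names `otsu_threshold` and `binarize_inv` are hypothetical, the input is assumed to be a flat list of 8-bit grey values, and OpenCV's own implementation may differ in details such as tie-breaking.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is in the dark class; nothing left to split
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize_inv(pixels, t):
    """Inverted binary threshold: pixels <= t become 255, the rest 0."""
    return [255 if p <= t else 0 for p in pixels]
```

On an image with dark digits on a light background, the chosen threshold falls between the two intensity clusters, so the inverted binarisation turns the digits into white foreground, which is the polarity contour detection expects.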

Now we iterate through the contours and extract and save each ROI (region of interest):

[image: extracted ROIs]

Now you can apply your desired OCR tool to read the text in each ROI.

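import cv2

# Load as grayscale, then apply an inverted Otsu threshold
# (dark pixels become white foreground)
image = cv2.imread('1.jpg', cv2.IMREAD_GRAYSCALE)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# findContours returns 2 values in some OpenCV versions and 3 in others
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for num, c in enumerate(cnts):
    x, y, w, h = cv2.boundingRect(c)
    # Invert back so each saved ROI has the original polarity
    ROI = 255 - thresh[y:y+h, x:x+w]
    cv2.imwrite('ROI_{}.png'.format(num), ROI)

cv2.imshow('thresh', 255 - thresh)
cv2.waitKey()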
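The contours are not guaranteed to come back in reading order, so the saved ROIs may not follow the 1, 2, 3, ... sequence. One possible way to handle this, sketched below under the assumption that digits on the same text row have roughly the same vertical centre, is to group the bounding boxes into rows and sort each row left to right before running OCR. The names `sort_reading_order`, `ocr_rows` and the `row_tol` parameter are hypothetical; `--psm 10` is Tesseract's single-character mode.

```python
def sort_reading_order(boxes, row_tol=10):
    """Sort (x, y, w, h) boxes top-to-bottom, then left-to-right within a row.

    A box joins the current row when its vertical centre is within
    `row_tol` pixels of the centre of the first box in that row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        cy = box[1] + box[3] / 2
        if rows and abs(cy - rows[-1]["cy"]) <= row_tol:
            rows[-1]["boxes"].append(box)
        else:
            rows.append({"cy": cy, "boxes": [box]})
    return [sorted(r["boxes"], key=lambda b: b[0]) for r in rows]


def ocr_rows(thresh, rows):
    """Run single-character OCR on each box (needs pytesseract and Tesseract)."""
    import pytesseract  # lazy import: only needed when OCR is actually run

    config = "--psm 10 -c tessedit_char_whitelist=0123456789"
    lines = []
    for row in rows:
        chars = []
        for x, y, w, h in row:
            roi = 255 - thresh[y:y + h, x:x + w]  # restore original polarity
            chars.append(pytesseract.image_to_string(roi, config=config).strip())
        lines.append(" ".join(chars))
    return lines
```

With the `cnts` and `thresh` variables from the code above, usage would look like `rows = sort_reading_order([cv2.boundingRect(c) for c in cnts])` followed by `ocr_rows(thresh, rows)`. Tightly cropped characters are sometimes hard for Tesseract to read, so adding a white border around each ROI may be worth trying if results are poor.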

