How to extract numbers from image using OpenCV and pytesseract image_to_string()?

I'm trying to extract the numbers from an image using OpenCV and the image_to_string() method from pytesseract, but the output is inaccurate.

[image]

I tried some preprocessing methods such as resizing and noise filters, but I still can't get accurate results. How can I handle this?

Here's a simple preprocessing approach to clean up the image before passing it to pytesseract:

  • Convert image to grayscale
  • Sharpen the image
  • Perform morphological transformations to enhance text

Since your input image looks blurry, we can sharpen it using cv2.filter2D() and a generic sharpening kernel. Other sharpening kernels can be used as well.

import cv2
import numpy as np

image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# 3x3 sharpening kernel: center weight 9, all neighbors -1
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
sharpen = cv2.filter2D(gray, -1, sharpen_kernel)
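
To see why this kernel sharpens, it can help to apply it by hand to two tiny patches. The following is only a minimal sketch using NumPy: filter_pixel is a hypothetical helper (not an OpenCV function) that computes the weighted sum for the center pixel of a 3x3 patch and clips it to 0..255, which is how cv2.filter2D behaves for 8-bit output. The patch values are made up for illustration.

```python
import numpy as np

# Same kernel as in the answer: center weight 9, eight neighbors at -1.
sharpen_kernel = np.array([[-1, -1, -1],
                           [-1,  9, -1],
                           [-1, -1, -1]])

def filter_pixel(patch, kernel):
    """Hypothetical helper: weighted sum of a 3x3 patch, clipped to 0..255."""
    value = int(np.sum(patch.astype(np.int64) * kernel))
    return max(0, min(255, value))

# A uniform region: every pixel has the same intensity.
flat = np.full((3, 3), 100, dtype=np.uint8)

# The center pixel sits right next to a slightly brighter column.
edge = np.array([[100, 100, 120],
                 [100, 100, 120],
                 [100, 100, 120]], dtype=np.uint8)

kernel_sum = int(sharpen_kernel.sum())          # weights sum to 1
flat_out = filter_pixel(flat, sharpen_kernel)   # flat regions keep their value
edge_out = filter_pixel(edge, sharpen_kernel)   # the darker side of the edge gets darker

print(kernel_sum, flat_out, edge_out)
```

Because the weights sum to 1, flat areas are left unchanged, while a pixel next to an intensity step is pushed away from its neighbors, which increases local contrast at edges.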

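The text has small holes. Since the text is dark on a light background, we invert the image so the text becomes white, use cv2.dilate() to close the small holes and thicken the strokes, and then invert the image back: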

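# Invert so the text is white on a black background
sharpen = 255 - sharpen
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
dilate = cv2.dilate(sharpen, kernel, iterations=1)
# Invert back to dark text on a light background
result = 255 - dilate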
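
The effect of the invert, dilate, invert sequence can be illustrated without OpenCV. This is a rough sketch, not the library's implementation: dilate_2x2 is a hypothetical NumPy stand-in that takes the maximum over each pixel and its left, upper, and upper-left neighbors, which is an assumption about how a 2x2 rectangular element is anchored. The toy image is a light background with a dark stroke one pixel wide.

```python
import numpy as np

def dilate_2x2(img):
    """Hypothetical stand-in for a 2x2 dilation: max over each pixel and
    its left, upper, and upper-left neighbors (zero padding at the border)."""
    padded = np.pad(img, ((1, 0), (1, 0)), mode="constant", constant_values=0)
    return np.maximum.reduce([
        padded[1:, 1:],    # the pixel itself
        padded[1:, :-1],   # left neighbor
        padded[:-1, 1:],   # upper neighbor
        padded[:-1, :-1],  # upper-left neighbor
    ])

# Light background (255) with a dark vertical stroke (0) one pixel wide.
img = np.full((3, 5), 255, dtype=np.uint8)
img[:, 2] = 0

inverted = 255 - img              # stroke becomes white
dilated = dilate_2x2(inverted)    # white stroke grows
result = 255 - dilated            # back to dark text on a light background

dark_before = int((img[0] == 0).sum())
dark_after = int((result[0] == 0).sum())
print(dark_before, dark_after)
```

In this toy case the dark stroke grows from one pixel to two, which is why the step fills small gaps inside the characters. It also means that thin digits can start to merge if the kernel or the number of iterations is too large.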

Here's the full code. It saves both the sharpened image (sharpen.png) and the enhanced image (result.png), so you can try either one with pytesseract:

import cv2
import numpy as np

# Load the image and convert it to grayscale
image = cv2.imread('1.png')
if image is None:
    raise FileNotFoundError('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Sharpen with a generic 3x3 sharpening kernel
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
sharpen = cv2.filter2D(gray, -1, sharpen_kernel)
cv2.imwrite('sharpen.png', sharpen)

# Invert so the text is white, dilate to close small holes, then invert back
inverted = 255 - sharpen
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
dilate = cv2.dilate(inverted, kernel, iterations=1)
result = 255 - dilate
cv2.imwrite('result.png', result)
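
To feed one of the saved images to pytesseract, something along these lines may work. It is a sketch under assumptions rather than a tested recipe for this particular image: build_digit_config and extract_digits are hypothetical helper names, page segmentation mode 7 (treat the image as a single text line) is a guess about the layout, and the tessedit_char_whitelist option may not be honored by every Tesseract version or engine mode.

```python
def build_digit_config(psm=7):
    """Build a Tesseract config string that restricts recognition to digits.

    psm is the page segmentation mode; 7 (single text line) is an assumption.
    """
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"

def extract_digits(image_path, psm=7):
    """Run Tesseract on an image file and return only the digits it reports."""
    # Imports are deferred so build_digit_config() can be used on its own.
    import cv2
    import pytesseract

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(image_path)
    text = pytesseract.image_to_string(image, config=build_digit_config(psm))
    # Drop whitespace and any stray non-digit characters from the OCR output.
    return "".join(ch for ch in text if ch.isdigit())

config = build_digit_config()
print(config)
```

For example, extract_digits('sharpen.png') and extract_digits('result.png') could be compared to see which preprocessing variant reads better. If the numbers span several lines, psm=6 (a single uniform block of text) may be worth trying instead.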

I tried sharpening the image, but I didn't notice any improvement in number extraction with Tesseract. My advice is to first enhance the image with a deep learning-based super-resolution method and then use Tesseract for number extraction.
