
Text extraction from the images

I need to extract characters from images, but the images vary a lot because of the surrounding lighting. Because of this, I have not been able to settle on any one pre-processing method.

Image 1

Image 2

My pre-processing code looks like this:

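from skimage import io, img_as_ubyte
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
import cv2
import numpy as np

image = io.imread(imgg)  # imgg: path to the input image
dim = (700, 100)  # target size as (width, height)
resized_image = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)

# Otsu binarization of the grayscale image (pixels darker than the threshold become True)
gray = rgb2gray(resized_image)
threshold = threshold_otsu(gray)
bina_image = gray < threshold

binary = img_as_ubyte(bina_image)  # 2-D uint8 mask with values 0 / 255
image_copy = binary.copy()
kernel = np.ones((3, 3), np.uint8)

# CLAHE and the HSV step index colour channels, so they run on the resized colour image
img = resized_image.copy()
clahe = cv2.createCLAHE(clipLimit=5.0, tileGridSize=(1, 1))
img[:, :, 0] = clahe.apply(img[:, :, 0])

# io.imread returns RGB order, so convert with COLOR_RGB2HSV
imghsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
# Push the value channel apart: darken pixels below 190, brighten the rest
v = imghsv[:, :, 2].astype(np.int16)  # widen to avoid uint8 wrap-around
v = np.where(v < 190, v - 25, v + 25)
imghsv[:, :, 2] = np.clip(v, 0, 255).astype(np.uint8)
imghsv[imghsv < 170] = 0
imghsv[imghsv > 170] = 255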

I have tried OCR engines such as Tesseract, EasyOCR and KerasOCR, but none of them worked for this case. Can you please suggest how I can get all the characters from these kinds of images?

Binarizing text is a difficult task under varying illumination and noise. Variation in gray levels, brightness, and background all complicate the choice of a threshold. If you have the resources, I recommend the Google Cloud Vision API.

The code to implement it is easy to understand:

def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web.
    """
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = uri

    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        vertices = (['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices])

        print('bounds: {}'.format(','.join(vertices)))

    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))
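
If a cloud service is not an option, the point above about uneven illumination can be illustrated locally. A single global threshold tends to fail when the background brightness drifts across the image, whereas comparing each pixel with the mean of its own neighbourhood is often more robust. The following is only a minimal NumPy sketch of that idea on a synthetic image, not a tested fix for the images in the question. The function name `local_mean_threshold`, its parameters `block_size` and `offset`, and the synthetic test image are all my own assumptions for illustration.

```python
import numpy as np


def local_mean_threshold(gray, block_size=15, offset=10):
    """Mark pixels darker than their neighbourhood mean minus `offset`.

    gray: 2-D uint8 array. block_size: odd window width in pixels.
    Returns a boolean mask (True = dark foreground, e.g. text strokes).
    Hypothetical helper written for this illustration.
    """
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    pad = block_size // 2
    # Replicate the border so windows at the image edges are full-sized.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Integral image with a leading zero row/column gives O(1) window sums.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    k = block_size
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = window_sum / (k * k)
    return gray < local_mean - offset


# Synthetic example (an assumption, not the asker's data): the background
# brightens from left to right and the "text" is a set of thin dark strokes.
h, w = 40, 200
background = np.tile(np.linspace(60, 220, w), (h, 1))
strokes = np.zeros((h, w), dtype=bool)
for x in range(20, w - 10, 20):
    strokes[10:30, x:x + 3] = True
gray = np.where(strokes, background - 50, background).astype(np.uint8)

# One global cut-off marks the whole dark side of the image as "text".
global_mask = gray < gray.mean()
# The local rule follows the brightness drift and isolates the strokes.
local_mask = local_mean_threshold(gray, block_size=15, offset=10)

print("global threshold matches strokes:", np.array_equal(global_mask, strokes))
print("local threshold matches strokes: ", np.array_equal(local_mask, strokes))
```

On real photos the window size and offset would need tuning, and the window should be noticeably wider than the stroke width. OpenCV provides a comparable built-in, cv2.adaptiveThreshold, which may be a better starting point than a hand-written version.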
