
Python + OpenCV: OCR Image Segmentation

I am trying to do OCR on this toy example of a receipt, using Python 2.7 and OpenCV 3.1.

[image: sample receipt]

Grayscale + blur + external edge detection + segmentation of each area in the receipt (for example "Category", to see later which one is marked; in this case, cash).

I find it complicated, when the image is "skewed", to transform it properly and then "automatically" segment each section of the receipt.

Example:

[image: skewed receipt]

Any suggestions?

The code below is an example that gets as far as the edge detection, but only when the receipt is like the first image. My issue is not the image-to-text step; it is the pre-processing of the image.

Any help is more than appreciated! :)

import os
os.chdir("path/to/your/directory")  # put your own working directory here

import cv2
import numpy as np

image = cv2.imread("Rent-Receipt.jpg", cv2.IMREAD_GRAYSCALE)

blurred = cv2.GaussianBlur(image, (5, 5), 0)

# blurred = cv2.bilateralFilter(image, 9, 75, 75)  # alternative, edge-preserving blur

# apply Canny edge detection
edged = cv2.Canny(blurred, 0, 20)

# find external contours (OpenCV 3.x findContours returns three values)
(_, contours, _) = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

A great tutorial on the first step you described is available at pyimagesearch (and they have great tutorials in general).

In short, as described by Ella, you would have to use cv2.CHAIN_APPROX_SIMPLE. A slightly more robust method would be to use cv2.RETR_LIST instead of cv2.RETR_EXTERNAL and then sort the contours by area, as that should work decently even on white backgrounds, or when the page inscribes a bigger shape in the background, etc.
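A minimal sketch of that sorting idea, assuming edged comes from a Canny step like the one in the question (OpenCV 3.x findContours syntax):

(_, contours, _) = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# sort all retrieved contours by area, largest first; the page is usually the largest
contours = sorted(contours, key=cv2.contourArea, reverse=True)
page = contours[0]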

Coming to the second part of your question, a good way to segment the characters would be to use the maximally stable extremal regions (MSER) extractor available in OpenCV. A complete implementation in C++ is available here, in a project I was helping out with recently. The Python implementation would go along the following lines (the code below works for OpenCV 3.0+; for the OpenCV 2.x syntax, check it up online):

import cv2

img = cv2.imread('test.jpg')
mser = cv2.MSER_create()

# resize the image so that MSER can work better
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()

# detectRegions returns the regions and their bounding boxes
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0, 255, 0))

cv2.namedWindow('img', 0)
cv2.imshow('img', vis)
while cv2.waitKey() != ord('q'):
    continue
cv2.destroyAllWindows()

This gives the output:

[image: MSER convex hulls drawn over the receipt text]

Now, to eliminate the false positives, you can simply cycle through the points in hulls and calculate the perimeter (the sum of the distances between all adjacent points in hulls[i], where hulls[i] is the list of all points in one convex hull). If the perimeter is too large, classify it as not a character.
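For instance, a sketch of that filter; cv2.arcLength computes exactly this closed-curve perimeter, and the threshold below is an illustrative value to tune per image:

max_perimeter = 300  # illustrative threshold; tune for your image scale
char_hulls = [h for h in hulls if cv2.arcLength(h, True) <= max_perimeter]

# redraw using only the hulls that passed the perimeter test
vis2 = img.copy()
cv2.polylines(vis2, char_hulls, 1, (0, 255, 0))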

The diagonal lines across the image appear because the border of the image is black. They can be removed simply by adding the following line as soon as the image is read:

img = img[5:-5, 5:-5, :]  # crop 5 pixels from each side to drop the black border

which gives the output:

[image: MSER regions after cropping the border]

The option off the top of my head requires extracting the 4 corners of the skewed image. This is done by using cv2.CHAIN_APPROX_SIMPLE instead of cv2.CHAIN_APPROX_NONE when finding contours. Afterwards, you could use cv2.approxPolyDP and hopefully remain with the 4 corners of the receipt (if all your images are like this one, there is no reason why it shouldn't work).
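A minimal sketch of that corner extraction, assuming edged comes from a Canny step as in the question and that the receipt is the largest contour (both are assumptions, not part of the original answer):

(_, contours, _) = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
page = max(contours, key=cv2.contourArea)           # assume the receipt is the largest contour
peri = cv2.arcLength(page, True)
approx = cv2.approxPolyDP(page, 0.02 * peri, True)  # 0.02 is a common, tunable tolerance
if len(approx) == 4:
    corners = approx.reshape(4, 2)                  # the receipt's 4 corners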

Now use cv2.findHomography and cv2.warpPerspective to rectify the image, where the source points are the 4 points extracted from the skewed image and the destination points form a rectangle, for example the full image dimensions.
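A sketch of that rectification under the same assumptions; the output size below is illustrative, and corners must first be ordered to match dst (ordering logic omitted):

import numpy as np

w, h = 500, 700  # illustrative output dimensions
src = corners.astype(np.float32)  # ordered: top-left, top-right, bottom-right, bottom-left
dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)

H, _ = cv2.findHomography(src, dst)
rectified = cv2.warpPerspective(image, H, (w, h))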

Here you could find code samples and more information: OpenCV - Geometric Transformations of Images

Also, this answer may be useful - SO - Detect and fix text skew

EDIT: Corrected the second chain approx to cv2.CHAIN_APPROX_NONE.

Preprocessing the image by converting the desired foreground text to black while turning the unwanted background to white can help to improve OCR accuracy. In addition, removing the horizontal and vertical lines can improve results. Here's the preprocessed image after removing unwanted noise such as the horizontal/vertical lines. Note the removed border and table lines.

[image: preprocessed receipt with border and table lines removed]

import cv2

# Load in image, convert to grayscale, and threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find and remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (35,2))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)

# Find and remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,35))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)

# Mask out unwanted areas for result
result = cv2.bitwise_and(image,image,mask=thresh)
result[thresh==0] = (255,255,255)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
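From here the cleaned result can be fed straight to an OCR engine. A minimal sketch, assuming pytesseract and the Tesseract binary are installed (not part of the original answer):

import pytesseract

# --psm 6 treats the image as a single uniform block of text
text = pytesseract.image_to_string(result, config='--psm 6')
print(text)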

Try using the Stroke Width Transform. A Python 3 implementation of the algorithm is available at SWTloc.

Install the library:

pip install swtloc

Run the swttransform:

import numpy as np

from swtloc import SWTLocalizer
from swtloc.utils import imgshowN, imgshow

swtl = SWTLocalizer()
imgpath = ...  # path to your image
swtl.swttransform(imgpaths=imgpath, text_mode='lb_df', gs_blurr=False,
                  minrsw=3, maxrsw=10, max_angledev=np.pi/3)
imgshowN([swtl.orig_img, swtl.swt_mat, swtl.swt_labelled3C],
         ['Original Image', 'Stroke Width Transform', 'Connected Components'])

[image: original image, stroke width transform, and connected components]

Run the grouping of texts:

respacket = swtl.get_grouped(lookup_radii_multiplier=.8, sw_ratio=2,
                             cl_deviat=[13,13,13], ht_ratio=2,
                             ar_ratio=4, ang_deviat=30)

grouped_labels = respacket[0]
grouped_bubblebbox = respacket[1]
grouped_annot_bubble = respacket[2]

imgshowN([swtl.orig_img, grouped_annot_bubble],
         ['Original', 'Grouped Bubble BBox Annotation'])

[image: original image with grouped text bounding-box annotations]

There are multiple parameters in the swttransform and get_grouped functions that you can play around with to get the desired results.

Full disclosure: I am the author of this library.
