How to detect all boxes for inputting letters in forms for a particular field?
I need to recognize text from forms in which each character is written in its own box.
I tried using a bounding box for each input field and cropping that field, i.e. I can get the region containing all the boxes for the 'Name' field. But when I try to detect the individual boxes within that group, OpenCV returns only one contour for all the boxes.
The file referred to in the for loop contains the coordinates of the bounding boxes. cropped_img is the image that belongs to a single field's input (e.g. Name).
Full form image
This is the image of the form.
Cropped image for each field
It contains many boxes for inputting characters. Here the number of contours detected is always one.
Why am I not able to detect all the individual boxes? In short, I want all the individual boxes in cropped_img.
Also, any other ideas for approaching form OCR are really appreciated!
import cv2
import numpy as np

# file, panimg and get_contour_precedence are defined elsewhere (not shown here)
for line in file.read().split("\n"):
    if len(line) == 0:
        continue
    region = list(map(int, line.split(' ')[:-1]))
    index = line.split(' ')[-1]
    text = ''
    contentDict = {}
    # uzn in format left, up, width, height
    region[2] = region[0] + region[2]
    region[3] = region[1] + region[3]
    region = tuple(region)
    cropped_img = panimg[region[1]:region[3], region[0]:region[2]]
    index = index.replace('_', ' ')
    if index == 'sign' or index == 'picture' or index == 'Dec sign':
        continue
    kernel = np.ones((50, 50), np.uint8)
    gray = cv2.cvtColor(cropped_img, cv2.COLOR_BGR2GRAY)
    ret, threshold = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    threshold = cv2.bitwise_not(threshold)
    dilate = cv2.dilate(threshold, kernel, iterations=1)
    ret, threshold = cv2.threshold(dilate, 127, 255, cv2.THRESH_BINARY)
    dilate = cv2.dilate(threshold, kernel, iterations=1)
    contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # sorted() works whether findContours returns a list or a tuple
    contours = sorted(contours, key=lambda x: get_contour_precedence(x, panimg.shape[1]))
    print("Length of contours detected: ", len(contours))
    for j, ctr in enumerate(contours):
        # Get bounding box
        x, y, w, h = cv2.boundingRect(ctr)
        # Getting ROI
        roi = cropped_img[y:y+h, x:x+w]
        # show ROI
        cv2.imshow('segment no:' + str(j-1), roi)
        cv2.waitKey(0)
The content of the file 'file' is as follows:
462 545 468 39 AO_Office
450 785 775 39 Last_Name
452 836 770 37 First_Name
451 885 772 39 Middle_Name
241 963 973 87 Abbreviation_Name
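Each line above follows the uzn layout mentioned in the code comment (left, top, width, height, then a label). As a minimal sketch, a small helper could turn one such line into corner coordinates plus a readable label. The name parse_uzn_line is hypothetical and not part of the original code:

```python
def parse_uzn_line(line):
    """Parse 'left top width height label' into ((x0, y0, x1, y1), label)."""
    # Everything except the last token is numeric; the last token is the field name.
    *numbers, label = line.split()
    left, top, width, height = map(int, numbers)
    # Convert width/height into the bottom-right corner, as the loop in the question does.
    return (left, top, left + width, top + height), label.replace("_", " ")
```

The returned tuple can then be used for slicing, e.g. panimg[y0:y1, x0:x1].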
The expected output is the contours of the individual boxes, each for inputting a single letter, for every field.
I know I'm a bit late to the party :) but in case somebody is looking for a solution to this problem: I recently came up with a Python package that deals with this exact problem.
I called it BoxDetect. After installing it through:
pip install boxdetect
You can try something like this:
from boxdetect import config
config.min_w, config.max_w = (20,50)
config.min_h, config.max_h = (20,50)
config.scaling_factors = [0.4]
config.dilation_iterations = 0
config.wh_ratio_range = (0.5, 2.0)
config.group_size_range = (1, 100)
config.horizontal_max_distance_multiplier = 2
from boxdetect.pipelines import get_boxes
image_path = "dumpster/m1nda.jpg"
rects, grouped_rects, org_image, output_image = get_boxes(image_path, config, plot=False)
import matplotlib.pyplot as plt
print("======================")
print("Individual boxes (green): ", rects)
print("======================")
print("Grouped boxes (red): ", grouped_rects)
print("======================")
plt.figure(figsize=(25,25))
plt.imshow(output_image)
plt.show()
It returns the bounding-rectangle coordinates of all the rectangular boxes, the grouped boxes forming long entry fields, and a visualization on the form image:
Processing file: dumpster/m1nda.jpg
======================
Individual boxes (green): [[1153 1873 26 26]
[1125 1873 24 27]
[1098 1873 24 26]
...
[ 558 551 42 28]
[ 514 551 42 28]
[ 468 551 42 28]]
======================
Grouped boxes (red): [(468, 551, 457, 29), (424, 728, 47, 45), (608, 728, 31, 45), (698, 728, 33, 45), (864, 728, 31, 45), (1059, 728, 47, 45), (456, 792, 763, 29), (456, 842, 763, 28), (456, 891, 763, 29), (249, 969, 961, 28), (249, 1017, 962, 28), (700, 1064, 39, 32), (870, 1064, 41, 32), (376, 1124, 45, 45), (626, 1124, 29, 45), (750, 1124, 27, 45), (875, 1124, 41, 45), (1054, 1124, 28, 45), (507, 1188, 706, 29), (507, 1238, 706, 28), (507, 1287, 706, 29), (718, 1335, 36, 31), (856, 1335, 35, 31), (1008, 1335, 34, 32), (260, 1438, 51, 37), (344, 1438, 56, 37), (505, 1443, 98, 27), (371, 1530, 31, 31), (539, 1530, 31, 31), (486, 1636, 694, 28), (486, 1684, 694, 28), (486, 1731, 694, 29), (486, 1825, 694, 29), (486, 1873, 694, 28)]
======================
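The individual boxes above are printed right to left and bottom to top, so for OCR they probably need to be put into reading order first. A rough sketch, assuming each rect is (x, y, width, height) as the printed values suggest; the helper name and the y_tolerance default are hypothetical:

```python
def rows_in_reading_order(rects, y_tolerance=10):
    """Group (x, y, w, h) boxes into rows, top to bottom, each row left to right."""
    # Plain int tuples so the result does not depend on the input array type.
    boxes = [tuple(int(v) for v in r) for r in rects]
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        # Same row if the top edge is close to the top edge of the row's first box.
        if rows and abs(box[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [sorted(row, key=lambda b: b[0]) for row in rows]
```

Each box can then be cropped with something like org_image[y:y+h, x:x+w], assuming org_image is a NumPy image array, and passed to a character recognizer.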