How to detect all boxes for inputting letters in forms for a particular field?
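I need to recognise text from a form that provides a separate box for each character.
I have tried using a bounding box for each input and cropping that particular input, i.e. I can get all the boxes for the "Name" field. But when I try to detect the individual boxes within that group, I am unable to do so: OpenCV returns only one contour for all the boxes. The file referred to in the for loop contains the bounding box coordinates. cropped_img is the image belonging to a single field input (e.g. Name).
This is the picture of the form (image of the full form).
Cropped image for each field:
It contains many boxes for entering characters. The number of contours detected here is always 1. Why am I unable to detect all the individual boxes? In short, I want all the individual boxes in cropped_img.
Also, any other ideas for tackling the form OCR task are much appreciated!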
import cv2
import numpy as np

# `file`, `panimg` and `get_contour_precedence` are defined elsewhere.
for line in file.read().split("\n"):
    if len(line) == 0:
        continue
    region = list(map(int, line.split(' ')[:-1]))
    index = line.split(' ')[-1]
    text = ''
    contentDict = {}
    # uzn in format left, up, width, height
    region[2] = region[0] + region[2]
    region[3] = region[1] + region[3]
    region = tuple(region)
    cropped_img = panimg[region[1]:region[3], region[0]:region[2]]
    index = index.replace('_', ' ')
    if index == 'sign' or index == 'picture' or index == 'Dec sign':
        continue
    kernel = np.ones((50, 50), np.uint8)
    gray = cv2.cvtColor(cropped_img, cv2.COLOR_BGR2GRAY)
    ret, threshold = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    threshold = cv2.bitwise_not(threshold)
    dilate = cv2.dilate(threshold, kernel, iterations=1)
    ret, threshold = cv2.threshold(dilate, 127, 255, cv2.THRESH_BINARY)
    dilate = cv2.dilate(threshold, kernel, iterations=1)
    contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # findContours may return a tuple, so use sorted() rather than list.sort()
    contours = sorted(contours, key=lambda x: get_contour_precedence(x, panimg.shape[1]))
    print("Length of contours detected: ", len(contours))
    for j, ctr in enumerate(contours):
        # Get bounding box
        x, y, w, h = cv2.boundingRect(ctr)
        # Getting ROI
        roi = cropped_img[y:y+h, x:x+w]
        # show ROI
        cv2.imshow('segment no:' + str(j), roi)
        cv2.waitKey(0)
The contents of the file 'file' are as follows:
462 545 468 39 AO_Office
450 785 775 39 Last_Name
452 836 770 37 First_Name
451 885 772 39 Middle_Name
241 963 973 87 Abbreviation_Name
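As a small aside, the region format above (left, top, width, height, label) can be parsed on its own before any image work. The sketch below is only illustrative: the helper name `parse_regions` and the choice to skip lines that do not have exactly five fields are assumptions, not part of the original code.

```python
def parse_regions(text):
    """Parse 'left top width height label' lines into (label, (x0, y0, x1, y1))."""
    regions = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank lines and lines without exactly five fields
        left, top, width, height = map(int, parts[:4])
        label = parts[4].replace("_", " ")
        # Convert width/height into the bottom-right corner used for slicing.
        regions.append((label, (left, top, left + width, top + height)))
    return regions


sample = "462 545 468 39 AO_Office\n450 785 775 39 Last_Name"
for label, (x0, y0, x1, y1) in parse_regions(sample):
    # With an image array `img`, the crop would be img[y0:y1, x0:x1].
    print(label, (x0, y0, x1, y1))
```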
The expected output is one contour per individual box, i.e. per box used to enter a single letter in each field.
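I know I'm a bit late to the party :) but in case someone is looking for a solution to this problem: I recently came up with a Python package that deals with this exact problem.
I called it BoxDetect, and after installing it with: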
pip install boxdetect
you can try something like this:
from boxdetect import config
config.min_w, config.max_w = (20,50)
config.min_h, config.max_h = (20,50)
config.scaling_factors = [0.4]
config.dilation_iterations = 0
config.wh_ratio_range = (0.5, 2.0)
config.group_size_range = (1, 100)
config.horizontal_max_distance_multiplier = 2
from boxdetect.pipelines import get_boxes
image_path = "dumpster/m1nda.jpg"
rects, grouped_rects, org_image, output_image = get_boxes(image_path, config, plot=False)
import matplotlib.pyplot as plt
print("======================")
print("Individual boxes (green): ", rects)
print("======================")
print("Grouped boxes (red): ", grouped_rects)
print("======================")
plt.figure(figsize=(25,25))
plt.imshow(output_image)
plt.show()
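It returns the bounding rectangle coordinates of all rectangular boxes, the grouped boxes that form long input fields, and a visualization drawn on the form image: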
Processing file: dumpster/m1nda.jpg
======================
Individual boxes (green): [[1153 1873 26 26]
[1125 1873 24 27]
[1098 1873 24 26]
...
[ 558 551 42 28]
[ 514 551 42 28]
[ 468 551 42 28]]
======================
Grouped boxes (red): [(468, 551, 457, 29), (424, 728, 47, 45), (608, 728, 31, 45), (698, 728, 33, 45), (864, 728, 31, 45), (1059, 728, 47, 45), (456, 792, 763, 29), (456, 842, 763, 28), (456, 891, 763, 29), (249, 969, 961, 28), (249, 1017, 962, 28), (700, 1064, 39, 32), (870, 1064, 41, 32), (376, 1124, 45, 45), (626, 1124, 29, 45), (750, 1124, 27, 45), (875, 1124, 41, 45), (1054, 1124, 28, 45), (507, 1188, 706, 29), (507, 1238, 706, 28), (507, 1287, 706, 29), (718, 1335, 36, 31), (856, 1335, 35, 31), (1008, 1335, 34, 32), (260, 1438, 51, 37), (344, 1438, 56, 37), (505, 1443, 98, 27), (371, 1530, 31, 31), (539, 1530, 31, 31), (486, 1636, 694, 28), (486, 1684, 694, 28), (486, 1731, 694, 29), (486, 1825, 694, 29), (486, 1873, 694, 28)]
======================
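To get from this output to per-character crops, the individual boxes can be assigned to the grouped rectangle that contains them and ordered left to right. The following is a minimal sketch, assuming both lists use (x, y, width, height) order as the printed values suggest. The helper name `boxes_by_field` and the pixel tolerance `tol` are hypothetical and not part of BoxDetect.

```python
def boxes_by_field(rects, grouped_rects, tol=3):
    """Map each grouped rectangle to the boxes inside it, sorted left to right."""
    fields = {}
    for group in grouped_rects:
        gx, gy, gw, gh = (int(v) for v in group)
        members = []
        for rect in rects:
            x, y, w, h = (int(v) for v in rect)
            # A box belongs to the group if it lies inside it, within `tol` pixels.
            inside_x = gx - tol <= x and x + w <= gx + gw + tol
            inside_y = gy - tol <= y and y + h <= gy + gh + tol
            if inside_x and inside_y:
                members.append((x, y, w, h))
        fields[(gx, gy, gw, gh)] = sorted(members)
    return fields


# A few values copied from the output above.
rects = [[1153, 1873, 26, 26], [1125, 1873, 24, 27],
         [558, 551, 42, 28], [514, 551, 42, 28], [468, 551, 42, 28]]
grouped_rects = [(468, 551, 457, 29), (486, 1873, 694, 28)]

for field, boxes in boxes_by_field(rects, grouped_rects).items():
    print(field, "->", boxes)
```

Each (x, y, w, h) entry can then be used to slice the image as `image[y:y+h, x:x+w]` and pass the crop to a character recogniser.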