Digit recognition with OpenCV and Python
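I am trying to implement a digit recognition program for video capture in OpenCV. It works with normal (still) pictures as input, but when I add the video capture, it gets stuck while recording if I move the camera. My code is here: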
import numpy as np
import cv2
from sklearn.externals import joblib
from skimage.feature import hog

# Load the classifier
clf = joblib.load("digits_cls.pkl")

# Default camera has index 0 and externally(USB) connected cameras have
# indexes ranging from 1 to 3
cap = cv2.VideoCapture(0)

while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()

    # Convert to grayscale and apply Gaussian filtering
    im_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)

    # Threshold the image
    ret, im_th = cv2.threshold(im_gray.copy(), 120, 255, cv2.THRESH_BINARY_INV)

    # Find contours in the binary image 'im_th'
    _, contours0, hierarchy = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Draw contours in the original image 'im' with contours0 as input
    # cv2.drawContours(frame, contours0, -1, (0,0,255), 2, cv2.LINE_AA, hierarchy, abs(-1))

    # Rectangular bounding box around each number/contour
    rects = [cv2.boundingRect(ctr) for ctr in contours0]

    # Draw the bounding box around the numbers
    for rect in rects:
        cv2.rectangle(frame, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (0, 255, 0), 3)

        # Make the rectangular region around the digit
        leng = int(rect[3] * 1.6)
        pt1 = int(rect[1] + rect[3] // 2 - leng // 2)
        pt2 = int(rect[0] + rect[2] // 2 - leng // 2)
        roi = im_th[pt1:pt1+leng, pt2:pt2+leng]

        # Resize the image
        roi = cv2.resize(roi, (28, 28), im_th, interpolation=cv2.INTER_AREA)
        roi = cv2.dilate(roi, (3, 3))

        # Calculate the HOG features
        roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14), cells_per_block=(1, 1), visualise=False)
        nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
        cv2.putText(frame, str(int(nbr[0])), (rect[0], rect[1]), cv2.FONT_HERSHEY_DUPLEX, 2, (0, 255, 255), 3)

    # Display the resulting frame
    cv2.imshow('frame', frame)
    cv2.imshow('Threshold', im_th)

    # Press 'q' to exit the video stream
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
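The error I get says that the ROI (region of interest) being resized has no input. I find this strange, because it works as long as I do not move the camera too much. I have tried several different cameras, so I am sure the camera is not the problem. This is the specific error message: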
Traceback (most recent call last):
  File "C:\Users\marti\Desktop\Code\Python\digitRecognition\Video_cap.py", line 55, in <module>
    roi = cv2.resize(roi, (28, 28), im_th, interpolation=cv2.INTER_AREA)
cv2.error: D:\Build\OpenCV\opencv-3.2.0\modules\imgproc\src\imgwarp.cpp:3492: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize
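You use a fixed threshold for preprocessing before trying to find contours. Since cv2.resize() has to resize something, it expects the roi matrix to have non-zero width and height. My guess is that at some point while you move the camera, no digits are detected because the preprocessing is not adaptive.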
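One concrete way roi can end up empty, offered here as a guess rather than a confirmed diagnosis: the square crop is 1.6 times the box height, so for a contour near the top or left border pt1 or pt2 becomes negative. NumPy treats a negative slice start as counting from the end of the axis, and the slice then comes back with zero rows or columns. The sketch below reproduces the question's index arithmetic on a blank stand-in array; the helper name crop_square and the example rectangles are made up for illustration.

```python
import numpy as np

# Stand-in for the thresholded frame: 480 rows x 640 columns, single channel.
im_th = np.zeros((480, 640), dtype=np.uint8)


def crop_square(image, rect, scale=1.6):
    """Crop the square ROI with the question's arithmetic, without bounds checks."""
    x, y, w, h = rect
    leng = int(h * scale)
    pt1 = int(y + h // 2 - leng // 2)  # top row of the square
    pt2 = int(x + w // 2 - leng // 2)  # left column of the square
    return image[pt1:pt1 + leng, pt2:pt2 + leng]


# A bounding box well inside the frame gives a full square ROI.
inside = crop_square(im_th, (300, 200, 20, 40))
print("inside:", inside.shape)

# A bounding box touching the top edge makes pt1 negative (-12 here).
# The slice [-12:52] on a 480-row image selects nothing, so the ROI has 0 rows.
edge = crop_square(im_th, (300, 0, 20, 40))
print("edge:", edge.shape, "size:", edge.size)
```

An array with zero rows is exactly the kind of input that trips the ssize.width > 0 && ssize.height > 0 assertion in cv::resize.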
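I suggest displaying the thresholded image, and the frame with the contours overlaid on it, while you move the camera. That way you will be able to debug the algorithm. Also, make sure to print(len(rects)) to see whether any rectangles are detected at all.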
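Another trick is to save the frames as you go and then run the algorithm on the last frame saved before the crash, to find out what causes the error.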
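In summary, you really need to take control of your preprocessing if you expect the code to produce meaningful results. The solution, depending on your data, could be some kind of contrast enhancement before the threshold operation and/or Otsu's method or adaptive thresholding, together with some additional filtering.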
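To illustrate why a fixed threshold of 120 can break down when the lighting changes, here is a minimal NumPy sketch of Otsu's method on a synthetic frame. The function name otsu_threshold and the pixel values are assumptions chosen for the demonstration. In a real pipeline you would more likely ask OpenCV to do this via cv2.threshold with the cv2.THRESH_OTSU flag, or use cv2.adaptiveThreshold.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the gray level that maximises between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    omega = np.cumsum(prob)           # weight of the class with values <= t
    mu = np.cumsum(prob * levels)     # cumulative mean up to t
    mu_total = mu[-1]

    numerator = (mu_total * omega - mu) ** 2
    denominator = omega * (1.0 - omega)

    # Between-class variance; left at 0 where one class is empty.
    sigma_b = np.zeros(256)
    valid = denominator > 1e-12
    sigma_b[valid] = numerator[valid] / denominator[valid]
    return int(np.argmax(sigma_b))


# Synthetic dim frame: background at 110, a dark "digit" patch at 30.
gray = np.full((100, 100), 110, dtype=np.uint8)
gray[40:60, 45:55] = 30

# Inverted binary threshold: pixels at or below the threshold become foreground.
fixed_mask = gray <= 120            # the question's fixed threshold
t = otsu_threshold(gray)
otsu_mask = gray <= t               # threshold chosen from the histogram

print("fixed threshold foreground pixels:", int(fixed_mask.sum()))
print("Otsu threshold:", t, "foreground pixels:", int(otsu_mask.sum()))
```

With the fixed threshold the whole dim frame turns into foreground, so the only contour is the frame itself. The histogram-based threshold isolates just the dark patch.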
Try this:
# Skip the ROI when it is empty (or contains no foreground pixels)
if roi.any():
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
    roi = cv2.dilate(roi, (3, 3))
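A note on what the roi.any() guard actually tests, shown on three stand-in arrays (the array names are made up for this example). any() is False for an empty array, which is what protects cv2.resize, but it is also False for a non-empty ROI that is entirely black. If you only want to guard against the empty case, roi.size > 0 is the narrower check.

```python
import numpy as np

# An empty crop, such as one produced by a slice that selects no rows.
empty_roi = np.zeros((0, 64), dtype=np.uint8)

# A valid crop that happens to contain no foreground pixels.
black_roi = np.zeros((64, 64), dtype=np.uint8)

# A valid crop with some white (foreground) pixels in it.
digit_roi = black_roi.copy()
digit_roi[20:40, 30:34] = 255

for name, roi in [("empty", empty_roi), ("black", black_roi), ("digit", digit_roi)]:
    print(name, "any():", bool(roi.any()), "size > 0:", roi.size > 0)
```

For this program either check is enough to avoid the crash, since an all-black ROI carries no digit to classify anyway.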
I think this does what you want (I simplified your operations in the example):
import cv2

cap = cv2.VideoCapture(0)

while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()

    # Convert to grayscale and apply Gaussian filtering
    im_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)
    ret, im_th = cv2.threshold(im_gray.copy(), 120, 255, cv2.THRESH_BINARY_INV)

    # Find contours in the binary image 'im_th'
    # (OpenCV 3.x returns three values here; 4.x returns only contours and hierarchy)
    _, contours0, hierarchy = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Rectangular bounding box around each number/contour
    rects = [cv2.boundingRect(ctr) for ctr in contours0]

    # Draw the bounding box around the numbers
    for rect in rects:
        cv2.rectangle(frame, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (255, 0, 255), 3)

        # Make the rectangular region around the digit
        leng = int(rect[3] * 1.6)
        pt1 = int(rect[1] + rect[3] // 2 - leng // 2)
        pt2 = int(rect[0] + rect[2] // 2 - leng // 2)
        roi = im_th[pt1:pt1+leng, pt2:pt2+leng]

        # Resize the image only when the ROI is not empty
        if roi.any():
            roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
            roi = cv2.dilate(roi, (3, 3))

    # Display the resulting frame
    cv2.imshow('frame', frame)
    #cv2.imshow('Threshold', im_th)

    # Press 'q' to exit the video stream
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
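If the empty ROI comes from the square crop running past the frame border (an assumption, the thread does not confirm it), an alternative to skipping those digits is to clamp the crop to the image and zero-pad the rest, so that digits near the edge still yield a full square. This is only a sketch: safe_square_roi is a hypothetical helper, not part of OpenCV, and it uses NumPy alone.

```python
import numpy as np


def safe_square_roi(image, rect, scale=1.6):
    """Return a leng x leng crop centred on rect, zero-padded where it leaves the image.

    rect is (x, y, w, h) as returned by cv2.boundingRect. Returns None when the
    square does not overlap the image at all.
    """
    x, y, w, h = rect
    leng = int(h * scale)
    top = y + h // 2 - leng // 2
    left = x + w // 2 - leng // 2

    rows, cols = image.shape[:2]
    # Part of the square that actually lies inside the image.
    r0, r1 = max(top, 0), min(top + leng, rows)
    c0, c1 = max(left, 0), min(left + leng, cols)
    if r1 <= r0 or c1 <= c0:
        return None

    # Black canvas; copy the visible part to its position within the square.
    roi = np.zeros((leng, leng), dtype=image.dtype)
    roi[r0 - top:r1 - top, c0 - left:c1 - left] = image[r0:r1, c0:c1]
    return roi


# Stand-in thresholded frame with a white 20x40 "digit" touching the top edge.
image = np.zeros((480, 640), dtype=np.uint8)
image[0:40, 300:320] = 255

roi = safe_square_roi(image, (300, 0, 20, 40))
print(roi.shape, int(np.count_nonzero(roi)))
```

In the loop you would call this in place of the direct slice and continue to the next rect when it returns None. Padding with zeros matches the black background produced by THRESH_BINARY_INV.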