
Apply HOG+SVM Training to Webcam for Object Detection

I have trained my SVM classifier by extracting HOG features from a positive and a negative dataset:

from sklearn.svm import SVC
import cv2
import numpy as np

hog = cv2.HOGDescriptor()


def hoggify(x,z):

    data=[]

    for i in range(1,int(z)):
        image = cv2.imread("/Users/munirmalik/cvprojek/cod/"+x+"/"+"file"+str(i)+".jpg", 0)
        dim = 128
        img = cv2.resize(image, (dim,dim), interpolation = cv2.INTER_AREA)
        img = hog.compute(img)
        img = np.squeeze(img)
        data.append(img)

    return data

def svmClassify(features,labels):
    clf=SVC(C=10000,kernel="linear",gamma=0.000001)
    clf.fit(features,labels)

    return clf

def list_to_matrix(lst):
    return np.stack(lst) 

I want to apply that training so that the program will be able to detect my custom object (chairs).

I have added labels to each set already; what needs to be done next?

You already have the three most important pieces at your disposal. hoggify creates a list of HOG descriptors, one for each image. Note that the expected input for computing the descriptor is a grayscale image and the descriptor is returned as a 2D array with 1 column, which means that each element in the HOG descriptor has its own row. However, you are using np.squeeze to remove the singleton column and get a 1D numpy array instead, so we're fine here. You would then use list_to_matrix to convert the list into a 2D numpy array with one descriptor per row. Once you do this, you can use svmClassify to finally train your classifier. This assumes that you already have your labels in a 1D numpy array. After you train your SVM, you would use the SVC.predict method: given input HOG features, it classifies whether the image belongs to a chair or not.
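
The shape handling described above can be checked without any images. This is a minimal sketch that uses random arrays as stand-ins for the output of hog.compute; the descriptor length of 6 and the variable names are assumptions made for illustration, and real HOG descriptors are far longer.

```python
import numpy as np

# Stand-ins for hog.compute() output: three descriptors, each a column of
# shape (N, 1). N = 6 is illustrative only.
n_features = 6
descriptors = [np.random.rand(n_features, 1).astype(np.float32) for _ in range(3)]

# np.squeeze drops the singleton column, leaving a 1D array per image
squeezed = [np.squeeze(d) for d in descriptors]

# np.stack puts one descriptor per row, which is the layout SVC.fit expects
data = np.stack(squeezed)

print(squeezed[0].shape)  # (6,)
print(data.shape)         # (3, 6)
```

Each row of the stacked matrix corresponds to one image, so the labels array needs exactly one entry per row, in the same order.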

Therefore, the steps you need to do are:

  1. Use hoggify to create your list of HOG descriptors, one per image. It looks like the input x is the name of the folder (the path prefix) that holds your chair images, while z denotes the total number of images you want to load in. Remember that range is exclusive of the ending value, so you may want to add + 1 after int(z) (i.e. int(z) + 1) to ensure that you include the end. I'm not sure if this is the case, but I wanted to throw it out there.

     x = '...' # Whatever prefix you called your chairs
     z = 100 # Load in 100 images for example
     lst = hoggify(x, z)
  2. Convert the list of HOG descriptors into an actual matrix:

     data = list_to_matrix(lst) 
  3. Train your SVM classifier. Assuming you already have your labels stored in labels as a 1D numpy array, where a value of 0 denotes not a chair and 1 denotes a chair:

     labels = ... # Define labels here as a numpy array
     clf = svmClassify(data, labels)
  4. Use your SVM classifier to perform predictions. Assuming you have test images you want to run through your classifier, you will need to apply the same processing steps as you did with your training data. I'm assuming that's what hoggify does, where you can specify a different x to denote a different set to use. Define a new variable xtest for this different directory or prefix, as well as the number of images you need, then use hoggify combined with list_to_matrix to get your features:

     xtest = '...' # Define new test prefix here
     ztest = 50 # 50 test images
     lst_test = hoggify(xtest, ztest)
     test_data = list_to_matrix(lst_test)
     pred = clf.predict(test_data)

    pred will contain an array of predicted labels, one for each test image that you have. If you want, you can also see how well your SVM does on the training data. Since you already have it at your disposal, just use data again from step #2:

     pred_training = clf.predict(data) 

    pred_training will contain an array of predicted labels, one for each training image.
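
To turn the predicted labels from the steps above into a single number, you can compare them with the true labels. This is a rough sketch, not part of the original code: make_labels and accuracy are hypothetical helper names, the predictions are made up for illustration, and it assumes the chair descriptors come first in data, followed by the non-chair descriptors.

```python
import numpy as np

def make_labels(n_pos, n_neg):
    """Return 1 for each chair image followed by 0 for each non-chair image."""
    return np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])

def accuracy(pred, labels):
    """Fraction of predictions that match the true labels."""
    pred = np.asarray(pred)
    labels = np.asarray(labels)
    return float(np.mean(pred == labels))

labels = make_labels(3, 2)                  # three chairs, two non-chairs
pred_training = np.array([1, 1, 0, 0, 0])   # made-up predictions for illustration
print(accuracy(pred_training, labels))      # 0.8
```

In practice pred_training would come from clf.predict(data). Accuracy on the training set tends to be optimistic, so the same comparison on the held-out test set is usually the more telling number.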


If you ultimately want to use this with a webcam, the process would be to use a VideoCapture object and specify the ID of the device that is connected to your computer. Usually there's only one webcam connected to your computer, so use the ID of 0. Once you do this, the process would be to use a loop, grab a frame, convert it to grayscale as HOG descriptors require a grayscale image, compute the descriptor, then classify the image.

Something like this would work, assuming that you've already trained your model and you've created a HOG descriptor object from before:

cap = cv2.VideoCapture(0)
dim = 128 # For HOG

while True:
    # Capture the frame
    ret, frame = cap.read()
    if not ret:
        break # Stop if the camera did not return a frame

    # Show the image on the screen
    cv2.imshow('Webcam', frame)

    # Convert the image to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Convert the image into a HOG descriptor
    gray = cv2.resize(gray, (dim, dim), interpolation = cv2.INTER_AREA)
    features = hog.compute(gray)
    features = features.reshape(1, -1) # Reshape so that the feature is in a single row

    # Predict the label
    pred = clf.predict(features)

    # Show the label on the screen
    print("The label of the image is: " + str(pred))

    # Pause for 25 ms and keep going until you push q on the keyboard
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

cap.release() # Release the camera resource
cv2.destroyAllWindows() # Close the image window

The above process reads in an image, displays it on the screen, converts the image into grayscale so we can compute its HOG descriptor, ensures that the data is in a single row compatible with the SVM you trained, and then predicts its label. We print this to the console, and we wait for 25 ms before we read in the next frame so we don't overload your CPU. Also, you can quit the program at any time by pushing the q key on your keyboard. Otherwise, this program will loop forever. Once we finish, we release the camera resource back to the computer so that it can be made available for other processes.
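
The single-row requirement can be verified without a camera. This is a small sketch, assuming the descriptor may come back either as a column of shape (N, 1) or as a flat array of shape (N,) depending on the OpenCV version; to_row is a hypothetical helper name, and the arrays are stand-ins for real HOG output.

```python
import numpy as np

def to_row(features):
    """Reshape a descriptor of shape (N, 1) or (N,) into a single row (1, N)."""
    return np.asarray(features).reshape(1, -1)

column = np.arange(5, dtype=np.float32).reshape(5, 1)  # column-shaped stand-in
flat = np.arange(5, dtype=np.float32)                  # flat stand-in

print(to_row(column).shape)  # (1, 5)
print(to_row(flat).shape)    # (1, 5)
```

Either way the result has one row and N columns, which is the shape clf.predict expects for a single sample.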
