
OpenCV: SIFT detection and matching methods

From the OpenCV documentation:

C++: void SIFT::operator()(InputArray img, InputArray mask, vector<KeyPoint>& keypoints, OutputArray descriptors, bool useProvidedKeypoints=false)

Parameters:

img – Input 8-bit grayscale image.
mask – Optional input mask that marks the regions where features should be detected.
keypoints – The input/output vector of keypoints.
descriptors – The output matrix of descriptors. Pass cv::noArray() if you do not need them.
useProvidedKeypoints – Boolean flag. If it is true, the keypoint detector is not run; instead, the provided vector of keypoints is used and the algorithm just computes their descriptors.

I have the following questions:

  1. What values does mask take? I mean, if I wanted to remove the keypoints near the border of the image, should I give a mask with zeros at the borders and ones in the center?

  2. On another webpage I found a different approach, which uses a detect method to find the keypoints and a compute method to compute the descriptors. What is the difference between using detect/compute versus operator()? With the first approach I can detect the keypoints without computing the descriptors... but if, instead, I use operator() with the useProvidedKeypoints flag, how am I supposed to compute the keypoints beforehand?

  3. Moreover, what is the difference between brute-force matching and FLANN matching in terms of the number of matched points? I need to reproduce the results I get with the VL_FEAT library in MATLAB... so I want to know which of the two methods is closer.

For example, the following MATLAB code gives me 2546 detected keypoints:

 [f1,d1] = vl_sift(frame1_gray);

Using OpenCV:

std::vector<KeyPoint> keypoints;
cv::SiftFeatureDetector detector;
detector.detect(gray1, keypoints);
cout << keypoints.size() << endl;

just 708!!!

Then, using SIFT::operator(), there is something wrong with the parameters I give as input:

std::vector<KeyPoint> keypoints;
Mat descriptors;
SIFT S;
S(gray1, Mat(), keypoints, descriptors); // operator() must be invoked on the instance, not as SIFT::operator(...)

Let's answer your questions one-by-one:

  1. The mask is an input image that lets you control where keypoint detection takes place. Sometimes you don't want to detect keypoints over the entire image, but only within some subsection of it, usually because a pre-processing step has already located the salient regions. For example, if you wanted to do face recognition, you would only want to detect keypoints on the face, not over the entire image: you first get a general idea of where the faces are, then restrict keypoint detection to those areas. As for the values: in OpenCV the mask is an 8-bit single-channel (CV_8UC1) image of the same size as the input, where non-zero pixels mark the regions in which detection is allowed. So yes, zeros at the borders and ones (or 255) in the center would remove the keypoints near the border; a minimal sketch follows after this list.

  2. Detecting and computing are two different things. Detecting determines which pixel locations in the image are valid keypoints; computing builds a descriptor for each of those locations. The success of interest point frameworks lies not only in detectors that are repeatable and robust, but equally in the method used to describe the keypoints; that is what makes them popular.

    This alludes to detectors and descriptors respectively. Some frameworks, such as SIFT and SURF, provide both a detector and a descriptor. SIFT concatenates histograms of gradient orientations into a 128-element vector, and its detector is based on the Difference of Gaussians (which SURF in turn approximates with box filters). If I can suggest a link, take a look at this one: Classification of detectors, extractors and matchers - it discusses the different detectors and descriptors, as well as methods for matching keypoints. The useProvidedKeypoints flag (in OpenCV: http://docs.opencv.org/2.4.1/modules/nonfree/doc/feature_detection.html#sift-operator ) means that you have already determined the pixel locations at which you want descriptors computed. SIFT then bypasses the detection stage of the algorithm and simply computes the descriptors for those locations; see the detect/compute sketch after this list.

  3. The difference between brute force and FLANN (Fast Library for Approximate Nearest Neighbours - http://www.cs.ubc.ca/research/flann/ ) lies in the mechanism for matching keypoints. For a given keypoint in one image, you want to find the keypoint in the other image whose descriptor is its nearest neighbour. Brute force compares the descriptor against every candidate, so the search is exhaustive and exact; FLANN builds an index (for example, randomized k-d trees) and performs approximate nearest-neighbour search in the high-dimensional descriptor space, which limits where it searches. FLANN is therefore much faster, but being approximate it can occasionally return a different neighbour than the exact search; which trade-off is right depends on your application. A matching sketch follows below.
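
To make the mask answer concrete, here is a minimal sketch against the OpenCV 2.4 nonfree API (the file name and the 20-pixel border are hypothetical, chosen just for illustration). The mask is a CV_8UC1 image, non-zero wherever detection is allowed:

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/nonfree/features2d.hpp>

int main()
{
    cv::Mat gray = cv::imread("frame1.png", CV_LOAD_IMAGE_GRAYSCALE);

    // 8-bit single-channel mask: non-zero pixels mark where to detect.
    // Zero out a 20-pixel border to suppress keypoints near the edges.
    cv::Mat mask = cv::Mat::zeros(gray.size(), CV_8UC1);
    mask(cv::Rect(20, 20, gray.cols - 40, gray.rows - 40)).setTo(255);

    cv::SIFT sift;
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    sift(gray, mask, keypoints, descriptors); // detect + describe in one call
    return 0;
}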
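Regarding detect/compute versus operator(), here is a sketch of both routes under the same OpenCV 2.4 API (two_ways is a hypothetical helper name). With useProvidedKeypoints=true, operator() skips detection and just describes the keypoints you pass in, so both routes should produce the same descriptors for the same keypoints:

#include <vector>
#include <opencv2/nonfree/features2d.hpp>

void two_ways(const cv::Mat& gray)
{
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;

    // Route 1: separate detection and description steps.
    cv::SiftFeatureDetector detector;
    detector.detect(gray, keypoints);                // detection only
    cv::SiftDescriptorExtractor extractor;
    extractor.compute(gray, keypoints, descriptors); // description only

    // Route 2: operator() with keypoints you already have.
    cv::SIFT sift;
    sift(gray, cv::noArray(), keypoints, descriptors, true); // useProvidedKeypoints=true
}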
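And for the matching question, a sketch contrasting the two matchers on SIFT descriptors (d1 and d2 stand for the descriptor matrices of two images; the function name is hypothetical). Brute force is exhaustive and exact, FLANN is approximate but faster:

#include <vector>
#include <opencv2/features2d/features2d.hpp>

void match_both_ways(const cv::Mat& d1, const cv::Mat& d2)
{
    std::vector<cv::DMatch> bfMatches, flannMatches;

    // Exhaustive: every descriptor in d1 compared against every one in d2.
    cv::BFMatcher bf(cv::NORM_L2);
    bf.match(d1, d2, bfMatches);

    // Approximate: a k-d tree index limits the search; faster, results may differ slightly.
    cv::FlannBasedMatcher flann;
    flann.match(d1, d2, flannMatches);
}

Note that match() returns exactly one best match per query descriptor for either matcher, so the raw number of matches is the same; the counts diverge once you filter them (e.g. with a ratio test on the two nearest neighbours via knnMatch), where FLANN's approximate neighbours may pass or fail differently from the exact ones.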
