
How to match SURF interest points to a database of images

I am using the SURF algorithm in C# (OpenSurf) to get a list of interest points from an image. Each of these interest points contains a vector of descriptors, an x coordinate (int), a y coordinate (int), the scale (float) and the orientation (float).
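
For reference, the interest point data described above could be modelled roughly like this (a sketch; the field names are illustrative, not necessarily OpenSurf's exact API):

```csharp
// Sketch of one SURF interest point as described above.
public class InterestPoint
{
    public float[] Descriptor;  // typically 64 (or 128) floats
    public int X;               // x coordinate in the image
    public int Y;               // y coordinate in the image
    public float Scale;         // scale at which the point was detected
    public float Orientation;   // dominant orientation
}
```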

Now, I want to compare the interest points from one image against a list of images in a database, each of which also has a list of interest points, in order to find the most similar image. That is: [Image(IP)] COMPARETO [List of Images(IP)] => best match. Comparing the images on an individual basis yields unsatisfactory results.

When searching Stack Overflow and other sites, the best solution I have found is to build a FLANN index while at the same time keeping track of which image each interest point comes from. But before implementing it, I have some questions that puzzle me:

1) When matching images based on their SURF interest points, an algorithm I have found does the matching by comparing the spatial distances (x1,y1 -> x2,y2) of the points with each other and finding the image with the lowest total distance. Are the descriptors or the orientation never used when comparing interest points?

2) If the descriptors are used, then how do I compare them? I can't figure out how to compare X vectors of 64 values (one image) with Y vectors of 64 values (several images) using an indexed tree.

I would really appreciate some help. All the places I have searched and all the APIs I have found only support matching one picture to another, but not matching one picture efficiently against a list of pictures.

There are multiple things here.

In order to know that two images are (almost) equal, you have to find the homographic projection of the two such that the projection results in a minimal error between the projected feature locations. Brute-forcing that is possible but not efficient, so a trick is to assume that similar images tend to have their feature locations in roughly the same spots as well (give or take a bit). For example, when stitching images, the images to stitch are usually taken from only a slightly different angle and/or location; even if not, the distances will likely grow ("proportionally") with the difference in orientation.
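
For illustration, once a candidate homography H and a set of matched point pairs are available, the projection error mentioned above could be accumulated like this (a sketch only; estimating H itself, e.g. with RANSAC, is not shown):

```csharp
using System;

static class HomographyError
{
    // Sum of reprojection errors for matched point pairs (src[i] <-> dst[i])
    // under a candidate 3x3 homography H (row-major).
    public static double ReprojectionError(double[,] H,
        (double x, double y)[] src, (double x, double y)[] dst)
    {
        double total = 0;
        for (int i = 0; i < src.Length; i++)
        {
            // Project (x, y, 1) with H and normalize by the homogeneous coordinate.
            double px = H[0, 0] * src[i].x + H[0, 1] * src[i].y + H[0, 2];
            double py = H[1, 0] * src[i].x + H[1, 1] * src[i].y + H[1, 2];
            double pw = H[2, 0] * src[i].x + H[2, 1] * src[i].y + H[2, 2];

            double dx = px / pw - dst[i].x;
            double dy = py / pw - dst[i].y;
            total += Math.Sqrt(dx * dx + dy * dy);
        }
        return total;
    }
}
```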

This means that you can - as a broad phase - select candidate images by finding the k pairs of points with minimum spatial distance (the k nearest neighbors) between all pairs of images and perform homography only on these points. Only then do you compare the projected point-pairwise spatial distances and sort the images by that distance; the lowest distance implies the best possible match (given the circumstances).
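
A rough sketch of that broad phase for a single image pair (reusing the hypothetical InterestPoint class from above; quadratic and unoptimized on purpose):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BroadPhase
{
    // For one pair of images, pick the k point pairs with the smallest
    // spatial (x, y) distance; only these would be fed into the homography step.
    public static List<(InterestPoint a, InterestPoint b, double dist)> ClosestPairs(
        IReadOnlyList<InterestPoint> imageA, IReadOnlyList<InterestPoint> imageB, int k)
    {
        var pairs = new List<(InterestPoint a, InterestPoint b, double dist)>();
        foreach (var p in imageA)
            foreach (var q in imageB)
            {
                double dx = p.X - q.X, dy = p.Y - q.Y;
                pairs.Add((p, q, Math.Sqrt(dx * dx + dy * dy)));
            }
        return pairs.OrderBy(t => t.dist).Take(k).ToList();
    }
}
```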

If I'm not mistaken, the descriptors are oriented by the strongest angle in the angle histogram. That means you may also decide to take the Euclidean (L2) distance of the 64- or 128-dimensional feature descriptors directly to obtain the actual feature-space similarity of two given features, and perform homography on the best k candidates. (You would not compare the scale at which the descriptors were found, though, because that would defeat the purpose of scale invariance.)
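
The feature-space similarity mentioned here boils down to a plain Euclidean distance over the descriptor values, for example:

```csharp
using System;

static class Descriptors
{
    // Euclidean (L2) distance between two SURF descriptors of equal length (64 or 128).
    public static double DescriptorDistance(float[] d1, float[] d2)
    {
        double sum = 0;
        for (int i = 0; i < d1.Length; i++)
        {
            double diff = d1[i] - d2[i];
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }
}
```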

Both options are time consuming and depend directly on the number of images and features; in other words: a stupid idea.

Approximate Nearest Neighbors

A neat trick is to not use actual distances at all, but approximate distances instead. In other words, you want an approximate nearest neighbor algorithm, and FLANN (although not for .NET) would be one of them.

One key point here is the projection search algorithm. It works like this: assume you want to compare the descriptors in 64-dimensional feature space. You generate a random 64-dimensional vector and normalize it, resulting in an arbitrary unit vector in feature space; let's call it A. Now (during indexing) you form the dot product of each descriptor with this vector. This projects each 64-d vector onto A, resulting in a single real number a_n. (This value a_n represents the distance of the descriptor along A in relation to A's origin.)

This image, which I borrowed from this answer on Cross Validated regarding PCA, demonstrates it visually; think of the rotation as the result of different random choices of A, where the red dots correspond to the projections (and thus the scalars a_n). The red lines show the error you make by using that approach; this is what makes the search approximate.

(Figure: projection of points onto a vector)

You will need A again for search, so you store it. You also keep track of each projected value a_n and the descriptor it came from; furthermore you keep each a_n (with a link to its descriptor) in a list sorted by a_n.
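
Put together, the indexing step could look roughly like this (a sketch; class and member names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ProjectionIndex
{
    public double[] Axis;                               // the random unit vector A
    public List<(double a, int descriptorId)> Sorted;   // projected values a_n, sorted, with a link back

    public static ProjectionIndex Build(IReadOnlyList<float[]> descriptors, int dim = 64, int seed = 42)
    {
        var rng = new Random(seed);

        // Generate a random 64-dimensional vector and normalize it -> unit vector A.
        var axis = new double[dim];
        for (int i = 0; i < dim; i++) axis[i] = rng.NextDouble() * 2 - 1;
        double norm = Math.Sqrt(axis.Sum(v => v * v));
        for (int i = 0; i < dim; i++) axis[i] /= norm;

        // Project every descriptor onto A (dot product) -> one scalar a_n each,
        // then keep the scalars sorted together with the descriptor they came from.
        var sorted = descriptors
            .Select((d, id) => (a: Dot(axis, d), descriptorId: id))
            .OrderBy(t => t.a)
            .ToList();

        return new ProjectionIndex { Axis = axis, Sorted = sorted };
    }

    public static double Dot(double[] axis, float[] d)
    {
        double s = 0;
        for (int i = 0; i < axis.Length; i++) s += axis[i] * d[i];
        return s;
    }
}
```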

To clarify using another image from here, we're interested in the location of the projected points along the axis A:

(Figure: projection search along the axis A)

The values a_0 .. a_3 of the 4 projected points in the image are approximately sqrt(0.5²+1.5²)=1.58, sqrt(0.4²+1.1²)=1.17, -0.84 and -0.95, corresponding to their distances to A's origin.

If you now want to find similar images, you do the same: project each query descriptor onto A, resulting in a scalar q (the query). Now you go to the position of q in the list and take the k surrounding entries. These are your approximate nearest neighbors. Now take the feature-space distance of these k values and sort by lowest distance - the top ones are your best candidates.
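
The query step could then be sketched like this (continuing the hypothetical ProjectionIndex above and reusing the DescriptorDistance helper from earlier):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ProjectionSearch
{
    // Project the query descriptor onto A, find its position in the sorted list,
    // take the k surrounding entries and re-rank them by exact L2 distance.
    public static List<(int descriptorId, double dist)> Query(
        ProjectionIndex index, float[] query, IReadOnlyList<float[]> descriptors, int k)
    {
        double q = ProjectionIndex.Dot(index.Axis, query);

        // Locate where q would fall in the sorted projections.
        int pos = index.Sorted.FindIndex(t => t.a >= q);
        if (pos < 0) pos = index.Sorted.Count;

        // Take k entries around that position (clamped to the list bounds).
        int start = Math.Max(0, pos - k / 2);
        int end = Math.Min(index.Sorted.Count, start + k);

        var candidates = new List<(int descriptorId, double dist)>();
        for (int i = start; i < end; i++)
        {
            int id = index.Sorted[i].descriptorId;
            candidates.Add((id, Descriptors.DescriptorDistance(query, descriptors[id])));
        }
        return candidates.OrderBy(c => c.dist).ToList();  // best approximate neighbors first
    }
}
```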

Coming back to the last picture, assume the topmost point is our query. Its projection is 1.58 and its approximate nearest neighbor (of the four projected points) is the one at 1.17. They're not really close in feature space, but given that we just compared two 64-dimensional vectors using only two values, it's not that bad either.

You can see the limits there: similar projections do not at all require the original values to be close, which will of course result in rather creative matches. To accommodate for this, you simply generate more base vectors B, C, etc. - say n of them - and keep track of a separate list for each. Take the k best matches from all of them, sort that list of k*n 64-dimensional vectors according to their Euclidean distance to the query vector, perform homography on the best ones and select the one with the lowest projection error.

The neat part about this is that if you have n (random, normalized) projection axes and want to search in 64-dimensional space, you simply multiply each descriptor with an n x 64 matrix, resulting in n scalars.
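
For example, projecting one descriptor onto all n axes at once is a single matrix-vector product:

```csharp
static class MultiAxisProjection
{
    // axes is an n x 64 matrix (one normalized random axis per row);
    // the result is the n projected scalars for this descriptor.
    public static double[] ProjectAll(double[,] axes, float[] descriptor)
    {
        int n = axes.GetLength(0);
        int dim = axes.GetLength(1);
        var result = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < dim; j++)
                result[i] += axes[i, j] * descriptor[j];
        return result;
    }
}
```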

I am pretty sure that the distance is calculated between the descriptors and not their coordinates (x,y). You can directly compare only one descriptor against another. I propose the following possible solution (surely not the optimal one):

You can find, for each descriptor in the query image, its top-k nearest neighbors in your dataset, then take all the top-k lists and find the most common image among them.
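
A sketch of that voting idea, assuming a hypothetical nearestImageIds callback that returns, for one query descriptor, the image ids of its top-k nearest dataset descriptors (e.g. backed by the projection index above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Voting
{
    // Each query descriptor votes for the images its top-k nearest neighbors came from;
    // the image with the most votes wins.
    public static int MostVotedImage(IEnumerable<float[]> queryDescriptors,
                                     Func<float[], int, IEnumerable<int>> nearestImageIds, int k)
    {
        var votes = new Dictionary<int, int>();
        foreach (var d in queryDescriptors)
            foreach (int imageId in nearestImageIds(d, k))
                votes[imageId] = votes.TryGetValue(imageId, out var c) ? c + 1 : 1;

        return votes.OrderByDescending(kv => kv.Value).First().Key;  // most common image id
    }
}
```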
