
Approximate nearest neighbor (A1NN) for high-dimensional spaces

Original post 2016-09-26 21:26:35 8 1 algorithm/ vector/ similarity/ locality-sensitive-hash

I read this question about finding the closest neighbor for 3-dimensional points. An octree is a solution for this case.

A kd-tree is a solution for low-dimensional spaces (generally fewer than 50 dimensions).

For higher dimensions (vectors of hundreds of dimensions and millions of points), LSH is a popular solution to the AKNN (Approximate K-NN) problem, as pointed out in this question.
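For readers unfamiliar with LSH, here is a minimal sketch of one common variant, random-hyperplane hashing for cosine similarity. It is only an illustration, not the method of any particular library: the class name `HyperplaneLSH` and the parameters `num_tables` and `num_bits` are hypothetical, and the default values are arbitrary assumptions. Note that the same lookup serves both K>>1 and K=1; only the final cut-off changes.

```python
import random


def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random Gaussian hyperplanes through the origin."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """Hash a vector to a bit tuple: which side of each hyperplane it lies on."""
    return tuple(
        int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes
    )


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


class HyperplaneLSH:
    """Hypothetical multi-table LSH index (illustrative defaults, not tuned)."""

    def __init__(self, dim, num_tables=8, num_bits=12, seed=0):
        rng = random.Random(seed)
        # Each table has its own hyperplanes and its own bucket dictionary.
        self.tables = [
            (make_hyperplanes(num_bits, dim, rng), {}) for _ in range(num_tables)
        ]
        self.points = []

    def add(self, vec):
        """Store a vector and register it in one bucket per table."""
        idx = len(self.points)
        self.points.append(vec)
        for planes, buckets in self.tables:
            buckets.setdefault(signature(vec, planes), []).append(idx)
        return idx

    def query(self, vec, k=1):
        """Return up to k indices, ranked exactly among colliding candidates."""
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vec, planes), []))
        ranked = sorted(
            candidates, key=lambda i: cosine(vec, self.points[i]), reverse=True
        )
        return ranked[:k]
```

More tables raise the chance that a true neighbor collides with the query in at least one table, while more bits per table shrink the buckets; both are tuning knobs, and the query can return fewer than k results (or none) if no bucket matches.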

However, LSH is popular for K-NN solutions where K>>1. For example, LSH has been successfully used for Content Based Image Retrieval (CBIR) applications, where each image is represented by a vector of hundreds of dimensions and the dataset contains millions (or billions) of images. In this case, K is the number of images most similar to the query image that should be returned.

But what if we are interested only in the single approximate nearest neighbor (i.e. A1-NN) in high-dimensional spaces? Is LSH still the winner, or have ad-hoc solutions been proposed?

1 answer

You might look at http://papers.nips.cc/paper/2666-an-investigation-of-practical-approximate-nearest-neighbor-algorithms.pdf and http://research.microsoft.com/en-us/um/people/jingdw/pubs%5CTPAMI-TPTree.pdf . Both have figures and graphs showing the performance of LSH versus the performance of tree-based methods that also produce only approximate answers, for different values of k including k=1. The Microsoft paper claims that "It has been shown in [34] that randomized KD trees can outperform the LSH algorithm by about an order of magnitude". Table 2 on page 7 of the other paper appears to show speedups over LSH that are reasonably consistent across different values of k.

Note that this is not LSH vs kd-trees. This is LSH vs various cleverly tuned approximate search tree structures. In these you typically search only the most promising parts of the tree, not every part that could possibly contain the closest point. To compensate, you search a number of different trees to get a decent probability of finding good points, and you tune various parameters to get the fastest possible performance.
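To make the idea concrete, here is a deliberately simplified sketch of a forest of randomized kd-trees. It is not the algorithm from either paper: each tree is searched by descending to a single leaf with no backtracking, whereas practical implementations usually add bounded backtracking with a priority queue. The function names and the parameters `num_trees`, `leaf_size` and `top_dims` are hypothetical and their defaults are arbitrary assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, ids, rng, leaf_size=8, top_dims=5):
    """Build one randomized kd-tree over the point indices in ids."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)

    def variance(d):
        vals = [points[i][d] for i in ids]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    # Randomization: split on a random one of the highest-variance dimensions.
    dims = sorted(range(len(points[0])), key=variance, reverse=True)[:top_dims]
    d = rng.choice(dims)
    ordered = sorted(ids, key=lambda i: points[i][d])
    mid = len(ordered) // 2
    threshold = points[ordered[mid]][d]
    left = build_tree(points, ordered[:mid], rng, leaf_size, top_dims)
    right = build_tree(points, ordered[mid:], rng, leaf_size, top_dims)
    return ("node", d, threshold, left, right)


def build_forest(points, num_trees=4, seed=0):
    """Build several independently randomized trees over the same points."""
    rng = random.Random(seed)
    ids = list(range(len(points)))
    return [build_tree(points, ids, rng) for _ in range(num_trees)]


def descend(tree, q):
    """Follow the query to one leaf only (the 'most promising' part)."""
    while tree[0] == "node":
        _, d, threshold, left, right = tree
        tree = left if q[d] < threshold else right
    return tree[1]


def approx_nearest(points, forest, q):
    """Approximate 1-NN: best candidate among one leaf per tree."""
    candidates = set()
    for tree in forest:
        candidates.update(descend(tree, q))
    return min(candidates, key=lambda i: sq_dist(points[i], q))
```

A single tree can miss the true nearest neighbor when it lies just across a split; because each tree splits on different dimensions, the union of leaves from several trees is more likely to contain it. The number of trees and the leaf size trade accuracy against query time.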