What are some fast approximations of Nearest Neighbor?

Say I have a huge list of n vectors (a few million). Given a new vector, I need to find one in the set that is pretty close to it, but it doesn't need to be the closest. (Exact nearest-neighbor search finds the closest one and runs in O(n) time.)

What algorithms can approximate nearest-neighbor search very quickly at the cost of some accuracy?

EDIT: Since it will probably help, I should mention that the data are pretty smooth most of the time, with a small chance of a spike in a random dimension.

If you are using high-dimensional vectors, such as SIFT, SURF, or any other descriptor used in the multimedia field, I suggest you consider LSH.

A PhD dissertation by Wei Dong (http://www.cs.princeton.edu/cass/papers/cikm08.pdf) might help you learn about more recent LSH-based algorithms for KNN search. Unlike more traditional LSH schemes such as E2LSH (http://www.mit.edu/~andoni/LSH/), published earlier by MIT researchers, his algorithm uses multi-probing to better balance the trade-off between recall and cost.
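To make the multi-probing idea concrete, here is a minimal pure-Python sketch. It is not Wei Dong's algorithm and not E2LSH: it swaps in random-hyperplane (sign) hashing for cosine similarity and uses the simplest possible probing sequence, namely every bucket whose key differs from the query's key in exactly one bit. The class and parameter names (MultiProbeHyperplaneLSH, n_bits, seed) are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class MultiProbeHyperplaneLSH:
    """Simplified multi-probe LSH using random hyperplanes (cosine similarity).

    Assumption: a single hash table keyed by n_bits sign bits. A query probes
    its own bucket plus every bucket whose key differs in exactly one bit.
    """

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane normal is a random Gaussian direction.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = {}  # hash key -> list of vector indices
        self.data = []

    def _key(self, v):
        # Bit i records which side of hyperplane i the vector falls on.
        key = 0
        for i, plane in enumerate(self.planes):
            if dot(plane, v) >= 0:
                key |= 1 << i
        return key

    def add(self, v):
        self.buckets.setdefault(self._key(v), []).append(len(self.data))
        self.data.append(v)

    def query(self, q):
        key = self._key(q)
        # Probe the query's own bucket first, then its one-bit neighbours.
        probes = [key] + [key ^ (1 << i) for i in range(len(self.planes))]
        candidates = [i for k in probes for i in self.buckets.get(k, [])]
        if not candidates:
            return None

        def cosine(i):
            v = self.data[i]
            norm = math.sqrt(dot(v, v) * dot(q, q))
            return dot(v, q) / norm if norm else 0.0

        # Re-rank only the small candidate set with the exact similarity.
        return max(candidates, key=cosine)
```

More bits make buckets smaller and queries cheaper, but a true neighbour is then more likely to land in a different bucket. Probing nearby buckets recovers some of that recall without building extra hash tables, which is the recall/cost trade-off multi-probing targets.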

There exist algorithms faster than O(n) for finding the closest element under common distance metrics such as Euclidean distance. See http://en.wikipedia.org/wiki/Kd-tree for details.
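The kd-tree idea can be sketched in pure Python as follows. This is a hypothetical, untuned illustration rather than any particular library's implementation: it builds a balanced tree by median splits, cycling through the axes, and answers exact Euclidean nearest-neighbour queries, skipping any subtree whose splitting plane is farther away than the best match found so far.

```python
class KDNode:
    """One kd-tree node: a stored point, its index, and the split axis."""

    def __init__(self, point, index, axis, left, right):
        self.point = point
        self.index = index
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points, indices=None, depth=0):
    """Build a balanced kd-tree by splitting on the median, cycling through axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(
        points[indices[mid]],
        indices[mid],
        axis,
        build_kdtree(points, indices[:mid], depth + 1),
        build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def nearest(node, q, best=None):
    """Exact nearest neighbour: returns (squared_distance, index), or None if empty."""
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node.point, q))
    if best is None or d2 < best[0]:
        best = (d2, node.index)
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, q, best)
    return best
```

The pruning test is what makes the search sublinear in low dimensions. In high dimensions it rarely succeeds, and the search degrades toward a linear scan.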

For approximate nearest-neighbour search, a very fast approach is locality-sensitive hashing (LSH). There are many LSH variants, and you should choose one that matches the distance metric of your data. LSH query time grows sublinearly with the dataset size (not counting the time to output results), so it is very fast in practice. This LSH library implements various LSH schemes for L2 (Euclidean) space.
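Below is a minimal sketch of one common LSH scheme for L2 space: p-stable (Gaussian projection) hashing, the family E2LSH is built on. Each hash is h(v) = floor((a·v + b) / w) with a drawn from a standard Gaussian and b uniform in [0, w). This is not the API of the library mentioned above; the class name L2LSH and the values of k, n_tables and w are illustrative assumptions that would need tuning on real data.

```python
import math
import random


class L2LSH:
    """Sketch of p-stable (Gaussian) LSH for Euclidean distance.

    Each table concatenates k hashes h(v) = floor((a.v + b) / w) into a key.
    A query collects the matching bucket from every table, then re-ranks
    those candidates by exact Euclidean distance.
    """

    def __init__(self, dim, k=4, n_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.tables = []  # list of (hash functions, {key: [indices]})
        for _ in range(n_tables):
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                     for _ in range(k)]
            self.tables.append((funcs, {}))
        self.data = []

    def _key(self, funcs, v):
        # Project onto each random direction, shift by b, and quantise by w.
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
                     for a, b in funcs)

    def add(self, v):
        idx = len(self.data)
        self.data.append(v)
        for funcs, buckets in self.tables:
            buckets.setdefault(self._key(funcs, v), []).append(idx)

    def query(self, q):
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.data[i], q))
```

Larger k makes each bucket more selective (fewer candidates, lower recall per table); more tables raise recall at the cost of memory and query time.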

On the other hand, if your data has fewer than about 10 dimensions and you want exact results, a kd-tree is preferable.

Searching the web for nearest-neighbor and LSH libraries, I found:
http://www.mit.edu/~andoni/LSH/
http://www.cs.umd.edu/~mount/ANN/
http://msl.cs.uiuc.edu/~yershova/MPNN/MPNN.htm
