Saving and incrementally updating nearest-neighbor model in R

There are several nearest-neighbor R packages (e.g., FNN, RANN, yaImpute), but none of them seems to allow saving the NN data structure (cover tree, KD-tree, etc.) so that the nearest neighbors of new queries can be computed without rebuilding the whole tree. Are there any such functions in R?

I am looking for a function that returns a data structure that I can update incrementally as new data arrives to perform approximate K nearest neighbor search.

There is a good reason why no NN package does that.

The reason is that the "NN data structure" necessarily includes all the input data points (in the form of a KD-tree), so it saves no space compared with the input data. It might seem that you would at least save time by not having to re-create the KD-tree for each new input, but this is not the case, alas.

That is because the time to build a KD-tree is, in general, worse than linearithmic. For large inputs it therefore makes sense to sort the data before building the KD-tree: the tree is produced faster and ends up better balanced, which also improves search (search time, too, is worse than logarithmic in general). This approach speeds up both model building and evaluation, but of course it discourages incremental updates.
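To make the point concrete, here is a minimal base-R sketch of a balanced KD-tree built by sorting and splitting at the median. It is not taken from any package: `build_kd` and `nearest` are hypothetical helper names, and it handles only single-nearest-neighbor queries. Note that the saved object has to carry the full point matrix, and that the balance comes from sorting all the points at build time.

```r
# Hypothetical helper: build a balanced KD-tree as nested lists by sorting the
# points along one coordinate per level and splitting at the median.
build_kd <- function(pts, idx = seq_len(nrow(pts)), depth = 0L) {
  if (length(idx) == 0L) return(NULL)
  axis <- depth %% ncol(pts) + 1L            # cycle through the columns
  ord  <- idx[order(pts[idx, axis])]         # sort this subset along the axis
  mid  <- (length(ord) + 1L) %/% 2L          # position of the median
  list(
    index = ord[mid],                        # row of pts stored at this node
    axis  = axis,
    left  = build_kd(pts, ord[seq_len(mid - 1L)], depth + 1L),
    right = build_kd(pts, ord[-seq_len(mid)], depth + 1L)
  )
}

# Hypothetical helper: exact nearest neighbor of q, with branch pruning.
nearest <- function(node, pts, q,
                    best = list(index = NA_integer_, dist = Inf)) {
  if (is.null(node)) return(best)
  d <- sqrt(sum((pts[node$index, ] - q)^2))
  if (d < best$dist) best <- list(index = node$index, dist = d)
  delta <- q[node$axis] - pts[node$index, node$axis]
  near  <- if (delta < 0) node$left else node$right
  far   <- if (delta < 0) node$right else node$left
  best  <- nearest(near, pts, q, best)
  # Visit the other side only if the splitting plane is within the best distance
  if (abs(delta) <= best$dist) best <- nearest(far, pts, q, best)
  best
}

set.seed(1)
pts  <- matrix(runif(200), ncol = 2)         # 100 random 2-D points
tree <- build_kd(pts)

# The tree is an ordinary R list, so it can be saved and reloaded, but the
# points must be saved with it because the nodes only hold row indices.
path <- tempfile(fileext = ".rds")
saveRDS(list(tree = tree, pts = pts), path)
model <- readRDS(path)

q     <- c(0.5, 0.5)
hit   <- nearest(model$tree, model$pts, q)
brute <- which.min(rowSums(sweep(pts, 2, q)^2))  # brute-force check
```

Appending a new point to such a tree without re-sorting would gradually unbalance it, which is the trade-off described above.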

Your best bet, I think, is to find a generic KD-tree package and use it instead.

The nabor package lets you build a tree and subsequently perform queries on it. But I don't think it lets you update the tree incrementally.
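A short sketch of that build-once, query-many pattern, assuming the nabor package is installed. The data here are random and purely illustrative. `WKNND` expects a matrix of doubles, and `eps = 0` requests an exact search (a positive `eps` allows an approximate one).

```r
library(nabor)

set.seed(42)
train <- matrix(runif(3000), ncol = 3)   # 1000 reference points in 3-D (doubles)
new_q <- matrix(runif(30), ncol = 3)     # 10 query points that arrive later

tree <- WKNND(train)                         # build the k-d tree once
res  <- tree$query(new_q, k = 5, eps = 0)    # reuse it for each new batch of queries
str(res)                                     # list with nn.idx and nn.dists

# The one-shot interface gives the same neighbors but rebuilds the tree per call
ref <- knn(train, new_q, k = 5)
```

Because the object wraps a C++ structure, it is safest to assume it cannot be restored with saveRDS()/readRDS() and must be rebuilt in each new R session.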
