Approximate nearest neighbor algorithm for octrees

Question

Does anyone know the origin (paper, book, etc.) of this approximate nearest neighbor technique for octrees?

http://www.cs.ucdavis.edu/~amenta/w11/nnLecture.pdf

I am having trouble implementing it from the provided pseudocode. I am also having trouble finding the original publication of the technique, which I am hoping has a little more detail.

Thanks for any help.

Answer 1

This is not the exact answer, but an approximate one (to use the terms of the subject :) ).

It was too big to fit in a comment, and I think it is good information for a start.

The paper mentions that Voronoi diagrams don't extend to more than three dimensions, and it implies that octrees do. That's wrong, in terms of terminology.

An octree is defined in R^3. Simply put, you see the same kind of data structure in 2D, where we have a quadtree. These kinds of trees have 2^D children per node, where D is the dimension. This means:

 1. 2D: 2^D children per node, i.e. 4 children per node.

 2. 3D: 2^D children per node, i.e. 8 children per node.

 3. and so on.
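To make the 2^D growth in the list above concrete, here is a minimal sketch. The function names `children_per_node` and `child_index` are mine, and the bit-per-dimension numbering of the children is just one possible convention, not something the lecture prescribes.

```python
def children_per_node(dim):
    """Number of children of a node in a 2^D tree (4 in 2D, 8 in 3D)."""
    return 2 ** dim


def child_index(point, center):
    """Return which of the 2**D children of a node contains `point`.

    Bit i of the result is set when the point lies on the upper side
    of the node's center along dimension i (assumed convention).
    """
    index = 0
    for i, (p, c) in enumerate(zip(point, center)):
        if p >= c:
            index |= 1 << i
    return index


for dim in (2, 3, 10):
    print(dim, children_per_node(dim))
```

At D = 10 every inner node already has 1024 potential children, which gives a feel for why this kind of tree stops being practical as the dimension grows.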

For example, octree comes from the Greek word okto, which means 8, and it implies that this tree has 8 children per node.

I implemented this kind of tree for NN (Nearest Neighbor) search and, even though I made the tree polymorphic so as not to waste any memory, it wouldn't scale above 10 dimensions.

Moreover, the paper mentions kd-trees. Notice that when the dimension gets high, the query time is no longer O(log n); it becomes only slightly better than the brute-force approach (i.e. checking all points). The higher the dimension, the worse kd-trees perform.

A kd-tree is actually a binary tree, embedded in geometry. By that I mean that every node has two children and, at every level, we halve the dataset (usually at the median of the coordinate with the greatest variance, so that we can exploit the structure of the dataset). This results in a balanced tree (a perfect one when the point count allows it).

(image: kd-tree diagram) Here you can see a kd-tree, made by a friend of mine, of 64 points in 8D. In this version, we store 4 points per leaf.

The numbers in the boxes refer to the point number (starting with 1, i.e. line numbers in the test.points file).

The notation "8 @ 0.532" refers to an inner node, where the data is split at 0.532 along the eighth dimension (again, dimensions start at 1, for easier human understanding).
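The construction described above (split at the median of the coordinate with the greatest variance, a few points per leaf) could be sketched roughly as follows. This is not the code that produced the picture; the nested-dict layout, the name `build_kdtree` and the choice of splitting halfway between the two middle points are my own assumptions.

```python
from statistics import pvariance


def build_kdtree(points, leaf_size=4):
    """Recursively build a kd-tree as nested dicts (hypothetical layout)."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(points) <= leaf_size:
        return {"leaf": True, "points": list(points)}
    dim = len(points[0])
    # Pick the coordinate with the greatest variance.
    split_dim = max(range(dim), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[split_dim])
    mid = len(ordered) // 2
    # Assumed split value: halfway between the two middle points.
    split_value = (ordered[mid - 1][split_dim] + ordered[mid][split_dim]) / 2
    return {
        "leaf": False,
        "dim": split_dim,      # 0-based here; the picture counts from 1
        "value": split_value,  # together these play the role of "8 @ 0.532"
        "left": build_kdtree(ordered[:mid], leaf_size),
        "right": build_kdtree(ordered[mid:], leaf_size),
    }
```

With 64 points and 4 points per leaf, halving at every level gives 16 leaves at depth 4, which matches the shape of the tree in the picture.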

That's why we turn our interest to approximate NN, which means that we give up some accuracy, but we obtain some speedup. (As you may know, everything is a trade-off.)

By Box, it probably means a minimum bounding box.

This is simple and here is an example:

Suppose you have, in 2D, this dataset:

-1 -2
 0  5
 8 -5

In order to construct the bounding box, we need to find the minimum and the maximum coordinate in every dimension. Note that, to store the bounding box, it is enough to store its min and max corners.

Here, we have min = (-1, -5) and max = (8, 5). The bounding box is then the rectangle whose corners, in clockwise order starting from the max corner, are:

( 8,  5)  // ( max.x, max.y)
( 8, -5)  // ( max.x, min.y)
(-1, -5)  // ( min.x, min.y)
(-1,  5)  // ( min.x, max.y)

Observe that all the points of the dataset lie inside this bounding box.
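The same computation as a small sketch, using the three points from the example. The helper name `bounding_box` is mine, and it works for any dimension, not only 2D.

```python
def bounding_box(points):
    """Return the (min corner, max corner) of an axis-aligned bounding box."""
    dims = range(len(points[0]))
    min_corner = tuple(min(p[d] for p in points) for d in dims)
    max_corner = tuple(max(p[d] for p in points) for d in dims)
    return min_corner, max_corner


points = [(-1, -2), (0, 5), (8, -5)]
lo, hi = bounding_box(points)
print(lo, hi)  # (-1, -5) (8, 5)
```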

As for the paper, it's actually a lecture, not a paper. It doesn't explain how one should write the algorithm. Moreover, it doesn't provide any unique information that would help in finding another document explaining, in more detail, the .pdf in your link.

[EDIT] for the OP's comment.

1) Q: dequeue box B, containing representative point p

I would say that dequeue means extracting the "first" element of the queue. Enqueue means pushing an element onto the back of the queue. The queue seems to hold bounding boxes as elements.

2) Q: r = d(q,B)

Maybe he means the distance from the representative point the box contains. Not clear.

You can compute the (Euclidean) distance from the query point to the closest corner of the box, or to the representative of the box.
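As a sketch of what d(q,B) could be: besides the two options above, a common way to measure the distance from a point to an axis-aligned box is to clamp the query point to the box and measure the distance to the clamped point. That gives the smallest distance to any point of the box, and 0 when q is inside it. Whether the lecture means this is my assumption, and the function names are hypothetical. `math.dist` needs Python 3.8 or newer.

```python
import math


def dist_to_box(q, min_corner, max_corner):
    """Smallest Euclidean distance from q to the box (0 if q is inside)."""
    clamped = [min(max(x, lo), hi) for x, lo, hi in zip(q, min_corner, max_corner)]
    return math.dist(q, clamped)


def dist_to_representative(q, representative):
    """Euclidean distance from q to the box's representative point."""
    return math.dist(q, representative)


# Box from the bounding-box example: min = (-1, -5), max = (8, 5).
print(dist_to_box((0, 0), (-1, -5), (8, 5)))   # 0.0, the query is inside
print(dist_to_box((11, 9), (-1, -5), (8, 5)))  # 5.0, nearest box point is (8, 5)
```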

3) for all children B' of B containing points in P

P is the dataset. Every box is partitioned into 8 sub-boxes at every level (in the case of an octree).

4) Q: while dN >= (1+e)r do

The approximation error e is actually what we call epsilon. It is usually a parameter, and it means that when you check:

while delta >= r do

you are less strict and you do

while delta >= (1 + e)*r do

which means that you go into the loop fewer times than with the exact condition above. So, I think it says to insert every sub-box of box B into the queue. This is not so clever, IMHO.
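A tiny numeric illustration of the two conditions, with made-up values: the best distance found so far is 10.05 and the distance to the next box is 10.0. The name `keep_searching` is hypothetical.

```python
def keep_searching(d_best, r, eps=0.0):
    """Loop condition from the pseudocode: continue while d_best >= (1 + eps) * r."""
    return d_best >= (1 + eps) * r


print(keep_searching(10.05, 10.0))        # True: the exact search keeps going
print(keep_searching(10.05, 10.0, 0.01))  # False: with e = 0.01 we stop here
```

With e = 0.01 the search stops as soon as the remaining boxes cannot improve the current answer by more than about 1%.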

About the last comment with e = 0.01, just do the math in the condition above. You will see that the answer is no, since, as the link you posted states, e is a multiplicative factor.
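Putting the four steps together, here is how I would read the pseudocode, as a sketch only. I am assuming that the queue is a priority queue ordered by the distance from q to each box, that d(q,B) is the clamped point-to-box distance, and that the representative of a box is simply any point it contains. The names `Box`, `box_distance` and `approx_nn` are mine, not the lecture's.

```python
import heapq
import math
from itertools import count


class Box:
    """Node of a 2^D tree; only non-empty children are created."""

    def __init__(self, points, lo, hi):
        self.lo, self.hi = lo, hi
        self.rep = points[0]  # assumed: any contained point is the representative
        self.children = []
        if any(p != points[0] for p in points):  # subdivide until one distinct point
            center = tuple((a + b) / 2 for a, b in zip(lo, hi))
            groups = {}
            for p in points:
                key = tuple(x >= c for x, c in zip(p, center))
                groups.setdefault(key, []).append(p)
            for key, group in groups.items():
                child_lo = tuple(c if up else a for up, a, c in zip(key, lo, center))
                child_hi = tuple(b if up else c for up, b, c in zip(key, hi, center))
                self.children.append(Box(group, child_lo, child_hi))


def box_distance(q, box):
    """Smallest distance from q to the box (0 if q is inside)."""
    clamped = [min(max(x, a), b) for x, a, b in zip(q, box.lo, box.hi)]
    return math.dist(q, clamped)


def approx_nn(root, q, eps):
    """Return (point, distance) with distance <= (1 + eps) * true NN distance."""
    best, d_best = root.rep, math.dist(q, root.rep)
    tie = count()  # tie-breaker so the heap never compares Box objects
    heap = [(box_distance(q, root), next(tie), root)]
    while heap:
        r, _, box = heapq.heappop(heap)   # 1) dequeue the closest box, 2) r = d(q,B)
        if d_best < (1 + eps) * r:        # 4) negation of: while dN >= (1+e)r
            break
        for child in box.children:        # 3) children that contain points of P
            d = math.dist(q, child.rep)
            if d < d_best:
                best, d_best = child.rep, d
            heapq.heappush(heap, (box_distance(q, child), next(tie), child))
    return best, d_best


points = [(-1, -2), (0, 5), (8, -5)]
root = Box(points, (-1, -5), (8, 5))
print(approx_nn(root, (7, -4), eps=0.0)[0])  # (8, -5)
```

With eps = 0 this behaves like an exact best-first search. A larger eps lets the loop stop earlier, at the price of possibly returning a point up to (1 + eps) times farther away than the true nearest neighbor.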

Answer 1: score 3, accepted, 2014-05-07 13:02:21
