
2D KD Tree and Nearest Neighbour Search

I'm currently implementing a KD Tree and nearest neighbour search, following the algorithm described here: http://ldots.org/kdtree/

I have come across a couple of different ways to implement a KD Tree: one in which points are stored in internal nodes, and one in which they are stored only in leaf nodes. As I have a very simple use case (I only need to construct the tree once; it never needs to be modified), I went for the leaf-only approach, as it seemed simpler to implement. I have implemented everything: the tree is always constructed correctly, and in most cases the nearest neighbour search returns the correct value. However, with some data sets and search points, the algorithm returns an incorrect value. Consider the points:

[[6, 1], [5, 5], [9, 6], [3, 81], [4, 9], [4, 0], [7, 9], [2, 9], [6, 74]]

These construct a tree that looks something like this (excuse my bad diagramming):

[image: a KD tree]

The square leaf nodes contain the points, and the circular nodes contain the median value used to split the list at that depth. When I call my nearest neighbour search on this data set and look for the nearest neighbour to [6, 74], the algorithm returns [7, 9]. Although this follows the algorithm correctly, it is not in fact the closest point to [6, 74]. The closest point is actually [3, 81], at a distance of about 7.6, whereas [7, 9] is at a distance of about 65.
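
The two distances quoted above can be checked with a brute-force scan, which is also a handy baseline for testing any KD tree search. The following is only a minimal sketch and is not part of the original code: the class name `BruteForceCheck` and its methods are made up for illustration, points are plain `int[]` pairs rather than the `Point` class used in the search method, and the scan is assumed to skip any candidate equal to the query, since [6, 74] itself appears in the data set.

```java
// Brute-force nearest-neighbour baseline (illustrative sketch, hypothetical names).
class BruteForceCheck {
    // Euclidean distance between two 2D points stored as {x, y}.
    static double distance(int[] a, int[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Linear scan over all points. Assumption: a candidate equal to the
    // query is skipped, because the query [6, 74] is itself in the data set.
    static int[] nearest(int[][] points, int[] query) {
        int[] best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int[] p : points) {
            if (p[0] == query[0] && p[1] == query[1]) {
                continue;
            }
            double d = distance(p, query);
            if (d < bestDistance) {
                bestDistance = d;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] points = {{6, 1}, {5, 5}, {9, 6}, {3, 81}, {4, 9}, {4, 0}, {7, 9}, {2, 9}, {6, 74}};
        int[] query = {6, 74};
        int[] best = nearest(points, query);
        // The nearest point is [3, 81], at sqrt(58), roughly 7.6.
        System.out.println("nearest = [" + best[0] + ", " + best[1] + "], distance = " + distance(best, query));
        // [7, 9] is at sqrt(4226), roughly 65.
        System.out.println("distance to [7, 9] = " + distance(new int[]{7, 9}, query));
    }
}
```

A linear scan costs O(n) per query, so it does not replace the tree, but any disagreement between it and the tree search points to a bug in the search.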

Here are the points plotted for visualization, with the red point being the one I am trying to find the nearest neighbour for:

[image: plot of the points]

If it helps, my search method is as follows:

    private LeafNode search(int depth, Point point, KDNode node) {
        if(node instanceof LeafNode)
            return (LeafNode)node;
        else {
            MedianNode medianNode = (MedianNode) node;

            double meanValue = medianNode.getValue();
            double comparisonValue = 0;
            if(valueEven(depth)) {
                comparisonValue = point.getX();
            }
            else {
                comparisonValue = point.getY();
            }

            KDNode nextNode;
            if(comparisonValue < meanValue) {
                if (node.getLeft() != null)
                    nextNode = node.getLeft();
                else
                    nextNode = node.getRight();
            }
            else {
                if (node.getRight() != null)
                    nextNode = node.getRight();
                else
                    nextNode = node.getLeft();
            }

            return search(depth + 1, point, nextNode);
        }
    }

So my questions are:

  1. Is this what to expect from nearest neighbour search in a KD Tree, or should I be getting the closest point to the point I am searching for (as this is my only reason for using the tree)?

  2. Is this an issue only with this form of KD Tree? Should I change it to store points in inner nodes to solve this?

A correct implementation of a KD-tree always finds the closest point (it doesn't matter whether points are stored only in leaves or not). Your search method is not correct, though. Here is what it should look like:

bestDistance = INF

def getClosest(node, point)
    if node is null
        return
    // I will assume that this node splits points 
    // by their x coordinate for the sake of brevity.
    if node is a leaf
        // updateAnswer updates bestDistance value
        // and keeps track of the closest point to the given one.
        updateAnswer(node.point, point)
    else
        middleX = node.median
        if point.x < middleX
            getClosest(node.left, point)
            if node.right.minX - point.x < bestDistance
                getClosest(node.right, point)
        else
            getClosest(node.right, point)
            if point.x - node.left.maxX < bestDistance
                getClosest(node.left, point)
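
For a concrete version of the pseudocode above, here is a minimal Java sketch of a leaf-only 2D tree with a backtracking search. It is an illustration under assumptions, not the asker's classes: `KdTreeSketch`, `Node`, `Best`, `build`, `search` and `nearest` are made-up names, and points are plain `double[]` pairs. It also prunes with the distance to the splitting line (`|query - split|`) instead of stored `minX`/`maxX` values, which is a looser but still valid bound.

```java
import java.util.Arrays;
import java.util.Comparator;

// Leaf-only 2D KD tree with backtracking search (illustrative sketch, hypothetical names).
class KdTreeSketch {
    static final class Node {
        final double[] point;   // non-null only for leaves
        final double split;     // splitting value, used by internal nodes
        final Node left, right;

        Node(double[] point) {
            this.point = point; this.split = 0; this.left = null; this.right = null;
        }

        Node(double split, Node left, Node right) {
            this.point = null; this.split = split; this.left = left; this.right = right;
        }

        boolean isLeaf() { return point != null; }
    }

    // Mutable holder for the best candidate found so far.
    static final class Best {
        double[] point;
        double distance = Double.POSITIVE_INFINITY;
    }

    // Splits on x at even depths and on y at odd depths, at the median element.
    static Node build(double[][] points, int depth) {
        if (points.length == 0) throw new IllegalArgumentException("no points");
        if (points.length == 1) return new Node(points[0]);
        int axis = depth % 2;
        double[][] sorted = points.clone();
        Arrays.sort(sorted, Comparator.comparingDouble(p -> p[axis]));
        int mid = sorted.length / 2;
        Node left = build(Arrays.copyOfRange(sorted, 0, mid), depth + 1);
        Node right = build(Arrays.copyOfRange(sorted, mid, sorted.length), depth + 1);
        return new Node(sorted[mid][axis], left, right);
    }

    static void search(Node node, double[] query, int depth, Best best) {
        if (node.isLeaf()) {
            double d = Math.hypot(node.point[0] - query[0], node.point[1] - query[1]);
            if (d < best.distance) {
                best.distance = d;
                best.point = node.point;
            }
            return;
        }
        double diff = query[depth % 2] - node.split;
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, query, depth + 1, best);
        // Backtrack: the far side can only hold a closer point if the
        // splitting line is nearer than the best distance found so far.
        if (Math.abs(diff) < best.distance) {
            search(far, query, depth + 1, best);
        }
    }

    static double[] nearest(Node root, double[] query) {
        Best best = new Best();
        search(root, query, 0, best);
        return best.point;
    }

    public static void main(String[] args) {
        // Assumption: the query [6, 74] is left out of the stored points,
        // otherwise it would be its own nearest neighbour at distance 0.
        double[][] points = {{6, 1}, {5, 5}, {9, 6}, {3, 81}, {4, 9}, {4, 0}, {7, 9}, {2, 9}};
        double[] best = nearest(build(points, 0), new double[]{6, 74});
        System.out.println(Arrays.toString(best)); // prints [3.0, 81.0]
    }
}
```

In this run the first descent reaches the leaf [7, 9], exactly as in the question. The check at the root then finds the splitting line only 1 unit away, far less than the current best distance of about 65, so the other subtree is visited and [3, 81] is found.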

The explanation given on ldots.org is just plain wrong (along with many other top Google results on searching KD Trees).

See https://stackoverflow.com/a/37107030/591720 for a correct implementation.

I'm not sure whether this answer is still relevant, but I would suggest the following kd-tree implementation: https://github.com/stanislav-antonov/kdtree

The implementation is simple enough and could be useful for anyone who wants to work out how things work in practice.

The tree is built iteratively, so its size is limited by available memory rather than by the stack size.
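
To illustrate what building a tree without recursion can look like, here is a minimal Java sketch that uses an explicit work stack. It is not taken from the linked repository, whose actual approach may differ: `IterativeKdBuild`, `Node`, `Task`, `build` and `countLeaves` are made-up names, and the layout (points in leaves only, median splits alternating between x and y) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;

// Iterative construction of a leaf-only 2D KD tree (illustrative sketch, hypothetical names).
class IterativeKdBuild {
    static final class Node {
        double[] point;   // set for leaves only
        double split;     // set for internal nodes only
        Node left, right;
    }

    // One pending unit of work: fill `node` from points[lo, hi) at `depth`.
    static final class Task {
        final Node node;
        final int lo, hi, depth;

        Task(Node node, int lo, int hi, int depth) {
            this.node = node; this.lo = lo; this.hi = hi; this.depth = depth;
        }
    }

    static Node build(double[][] input) {
        if (input.length == 0) throw new IllegalArgumentException("no points");
        double[][] points = input.clone();   // keep the caller's array order intact
        Node root = new Node();
        Deque<Task> stack = new ArrayDeque<>();
        stack.push(new Task(root, 0, points.length, 0));
        while (!stack.isEmpty()) {
            Task t = stack.pop();
            if (t.hi - t.lo == 1) {          // a single point becomes a leaf
                t.node.point = points[t.lo];
                continue;
            }
            int axis = t.depth % 2;          // x at even depths, y at odd depths
            Arrays.sort(points, t.lo, t.hi, Comparator.comparingDouble(p -> p[axis]));
            int mid = t.lo + (t.hi - t.lo) / 2;
            t.node.split = points[mid][axis];
            t.node.left = new Node();
            t.node.right = new Node();
            // The two halves are disjoint ranges, so they can be processed in any order.
            stack.push(new Task(t.node.left, t.lo, mid, t.depth + 1));
            stack.push(new Task(t.node.right, mid, t.hi, t.depth + 1));
        }
        return root;
    }

    // Counts leaves, also without recursion.
    static int countLeaves(Node root) {
        int leaves = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.point != null) {
                leaves++;
                continue;
            }
            stack.push(n.left);
            stack.push(n.right);
        }
        return leaves;
    }

    public static void main(String[] args) {
        double[][] points = {{6, 1}, {5, 5}, {9, 6}, {3, 81}, {4, 9}, {4, 0}, {7, 9}, {2, 9}, {6, 74}};
        Node root = build(points);
        System.out.println("leaves = " + countLeaves(root) + ", root split = " + root.split);
    }
}
```

The pending work lives in a heap-allocated `ArrayDeque` instead of call frames, so the depth of the tree is bounded by available memory rather than by the thread's stack size.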
