R* Tree Overlap Calculation

Question

I was reading through this implementation of the R* Tree, and I noticed that it calculates overlap differently from how the paper defines it.

In the paper, overlap is defined as follows:

For a given node/rectangle k, compute the sum of the areas of the intersections between k and each sibling of k (not including k itself).

Overlap enlargement is then the difference between this value and the overlap that node k would have if an item r were added to k.
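Written as formulas, the two definitions read as follows. The notation is my own: E_1 to E_p are the entries of the node being searched, r is the new item, and mbr(E_k, r) is the minimum bounding rectangle of E_k and r (what `child.union(r)` computes in the pseudocode below).

```latex
\mathrm{overlap}(E_k) = \sum_{\substack{i=1 \\ i \neq k}}^{p} \mathrm{area}(E_k \cap E_i)

\Delta\mathrm{overlap}(E_k, r) = \sum_{\substack{i=1 \\ i \neq k}}^{p}
  \Big[ \mathrm{area}\big(\mathrm{mbr}(E_k, r) \cap E_i\big) - \mathrm{area}(E_k \cap E_i) \Big]
```

Because mbr(E_k, r) contains E_k, every bracketed term is non-negative, so the overlap enlargement can never be negative.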

Something like this:

childOverlapEnlargement(Node child, item r)
{
    childEnlarged = child.union(r);
    sum = 0;
    for(each sibling s of child, excluding child itself)
    {
        sum += area(childEnlarged.intersect(s)) - area(child.intersect(s));
    }
    return sum;
}
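For reference, here is a minimal runnable Python sketch of the paper's criterion as the question describes it. It assumes axis-aligned boxes stored as min/max corner tuples. `Rect`, `intersection_area`, `overlap_enlargement` and `choose_by_overlap_enlargement` are hypothetical names, not taken from the implementation being discussed. The sibling loop compares indices rather than rectangles, so a sibling whose bounds happen to equal the child's is still counted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in any number of dimensions (hypothetical type)."""
    lo: tuple  # minimum corner, e.g. (min_x, min_y)
    hi: tuple  # maximum corner, e.g. (max_x, max_y)

    def union(self, other: "Rect") -> "Rect":
        """Minimum bounding rectangle enclosing self and other."""
        return Rect(tuple(map(min, self.lo, other.lo)),
                    tuple(map(max, self.hi, other.hi)))


def intersection_area(a: Rect, b: Rect) -> float:
    """Area (volume) of a ∩ b; 0.0 if the boxes are disjoint or only touch."""
    area = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(a.lo, a.hi, b.lo, b.hi):
        side = min(a_hi, b_hi) - max(a_lo, b_lo)
        if side <= 0:
            return 0.0
        area *= side
    return area


def overlap_enlargement(entries: Sequence[Rect], k: int, r: Rect) -> float:
    """How much the overlap of entries[k] with its siblings grows if r is added."""
    child = entries[k]
    enlarged = child.union(r)
    total = 0.0
    for i, sibling in enumerate(entries):
        if i == k:  # skip the child itself; only siblings count
            continue
        total += intersection_area(enlarged, sibling) - intersection_area(child, sibling)
    return total


def choose_by_overlap_enlargement(entries: Sequence[Rect], r: Rect) -> int:
    """Index of the entry needing the least overlap enlargement (ties: first index)."""
    return min(range(len(entries)), key=lambda k: overlap_enlargement(entries, k, r))


entries = [Rect((0, 0), (2, 2)), Rect((1, 0), (3, 2))]  # two overlapping siblings
r = Rect((3, 0), (4, 1))                                 # item to the right of both
print(overlap_enlargement(entries, 0, r))         # 2.0: growing the first child doubles its overlap
print(overlap_enlargement(entries, 1, r))         # 0.0
print(choose_by_overlap_enlargement(entries, r))  # 1
```

The R* paper also breaks ties, first by least area enlargement and then by smallest area. That tie-breaking is left out of this sketch, which simply returns the first minimal index.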

In the implementation, by contrast, they sort by the intersection area of a given node with the item being inserted. Something like this:

childOverlapEnlargement(Node node, item r)
{
    return area(node.intersect(r));
}

Their implementation is obviously less computationally expensive than the paper's definition. However, I can't see any reason why the two computations should be equivalent.

So my questions are:

Do the two computations always end up picking the same subtrees? Why?
If they do result in different subtrees being picked, are the results better than, or nearly as good as, those from the paper's definition? Or was the choice made in error?

Edit: I re-read their implementation and realized they weren't comparing the intersection of two siblings, but the intersection of each potential leaf with the item being inserted. Strangely enough, they pick the sibling that overlaps the least with the item being inserted. Wouldn't you want to insert into the node that overlaps the most with the item being inserted?
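To make the first question concrete, here is a small hand-made 2-D case in which the paper's criterion and the "least intersection with the new item" criterion from the edit pick different children. It is a sketch only: rectangles are assumed to be `(min_x, min_y, max_x, max_y)` tuples, and all helper names are hypothetical.

```python
# Rectangles are 2-D tuples (min_x, min_y, max_x, max_y); an assumption for brevity.

def mbr(a, b):
    """Minimum bounding rectangle of a and b (the pseudocode's union)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def inter_area(a, b):
    """Area of a ∩ b, or 0.0 if they are disjoint or only touch."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def overlap_enlargement(entries, k, r):
    """Paper's criterion: growth of entries[k]'s overlap with its siblings."""
    grown = mbr(entries[k], r)
    return sum(inter_area(grown, s) - inter_area(entries[k], s)
               for i, s in enumerate(entries) if i != k)


def pick_paper(entries, r):
    return min(range(len(entries)), key=lambda k: overlap_enlargement(entries, k, r))


def pick_least_intersection(entries, r):
    """Criterion from the edit: least area(entry ∩ r)."""
    return min(range(len(entries)), key=lambda k: inter_area(entries[k], r))


entries = [(0, 0, 2, 2), (3, 0, 5, 2)]  # A and B: disjoint siblings
r = (2.5, 0, 3.5, 1)                     # pokes into B, misses A

print(pick_paper(entries, r))               # 1 (B)
print(pick_least_intersection(entries, r))  # 0 (A)
```

Growing A to cover r would make it overlap B by 1.0 square units, while growing B creates no new overlap, so the paper's rule picks B. In this case the "most overlap with the item" intuition from the edit also picks B, but a single example does not show that it always agrees with the paper.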

Answer 1

Maybe the implementation you are looking at has bugs or is simply incorrect. Nobody is perfect.

Note that the R*-tree tries to minimize overlap enlargement, not overlap itself.
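Here is a small sketch of why this distinction matters. It uses the same assumptions: 2-D `(min_x, min_y, max_x, max_y)` tuples and hypothetical helper names. A and B already overlap, C overlaps nothing, and r lies inside A ∩ B. Minimizing overlap itself would favor C. Minimizing overlap enlargement favors A or B, because putting r there creates no new overlap.

```python
# 2-D rectangles as (min_x, min_y, max_x, max_y); helper names are hypothetical.

def mbr(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def inter_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def overlap(entries, k, rect=None):
    """Total area shared between rect (default: entries[k]) and k's siblings."""
    rect = entries[k] if rect is None else rect
    return sum(inter_area(rect, s) for i, s in enumerate(entries) if i != k)


entries = [(0, 0, 4, 4), (2, 0, 6, 4), (10, 0, 12, 4)]  # A and B already overlap
r = (2.5, 1, 3, 2)                                      # lies inside A ∩ B

for k in range(len(entries)):
    before = overlap(entries, k)
    after = overlap(entries, k, mbr(entries[k], r))
    print(k, before, after - before)  # index, current overlap, overlap enlargement
# prints:
# 0 8.0 0.0
# 1 8.0 0.0
# 2 0.0 20.0
```

C has the least overlap, but adding r to it would create 20.0 units of new overlap. The existing overlap between A and B is already there, and inserting r into either of them does not make it worse.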

Some overlap will likely be unavoidable. If there already is overlap, you cannot expect it to decrease when inserting additional rectangles. But you can at least try not to increase the amount of overlap.

As for performance, check whether you actually need to compute the intersection rectangles. Instead of computing area(intersection()), try writing a function intersectionSize(). This does make a difference: for example, if A.maxX = 1 and B.minX = 2, I can immediately return an intersection size of 0 without looking at any of the other dimensions.
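Here is a minimal Python sketch of such an intersectionSize(). It is named `intersection_size` here, and passing each box as separate per-dimension min and max sequences is an assumption. It multiplies the side lengths directly instead of building the intersection rectangle, and it returns 0 at the first dimension where the boxes are disjoint.

```python
def intersection_size(a_min, a_max, b_min, b_max):
    """Volume of the intersection of boxes a and b, given per-dimension
    min/max sequences, without building the intersection rectangle."""
    size = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        if side <= 0:
            return 0.0  # disjoint in this dimension: skip the remaining ones
        size *= side
    return size


# A.maxX = 1 and B.minX = 2: the x dimension alone settles it.
print(intersection_size((0, 0), (1, 5), (2, 0), (3, 5)))  # 0.0
print(intersection_size((0, 0), (2, 2), (1, 1), (3, 3)))  # 1.0
```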

Avoid eagerly precomputing all the intersections etc. that you might need; compute only those you actually need. Profile your code and see whether you can optimize the critical code paths. There is usually some low-hanging fruit there.
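One way to apply "compute only what you need" to the overlap-enlargement search is sketched below. This is my own illustration, not something the answer spells out. Rectangles are assumed to be 2-D `(min_x, min_y, max_x, max_y)` tuples, and the names are hypothetical. Every term of the sum is non-negative, because the grown rectangle contains the original. A candidate can therefore be abandoned as soon as its partial sum reaches the best total found so far, and the result is still the same as a full search that keeps the first minimum.

```python
# 2-D rectangles as (min_x, min_y, max_x, max_y); names are hypothetical.

def inter_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    if w <= 0:
        return 0.0  # disjoint in x: no need to look at y
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if h > 0 else 0.0


def mbr(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree_lazy(entries, r):
    """Least overlap enlargement, computing sibling intersections only until
    a candidate can no longer beat the best one found so far."""
    best_k, best_cost = None, float("inf")
    for k, e in enumerate(entries):
        grown = mbr(e, r)
        cost = 0.0
        for i, s in enumerate(entries):
            if i == k:
                continue
            cost += inter_area(grown, s) - inter_area(e, s)  # each term >= 0
            if cost >= best_cost:
                break  # partial sum already too large; skip remaining siblings
        if cost < best_cost:  # strict: ties keep the earlier index
            best_k, best_cost = k, cost
    return best_k


print(choose_subtree_lazy([(0, 0, 2, 2), (3, 0, 5, 2)], (2.5, 0, 3.5, 1)))  # 1
```

Whether this pays off depends on the node size and the data, so it is worth profiling before and after, as suggested above.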

Accepted answer, 2013-01-16 07:35:29
