
Is there any way to make a set of floats with tolerance and still O(1) lookup?

I want to make a set of floating-point numbers, but with a twist:

When testing whether some float x is a member of the set s, I want the test to return true if s contains some float f such that

abs(x - f) < tol

In other words, if the set contains a number that is close to x, return true. Otherwise return false.

One way I thought of doing this is to store numbers in a heap rather than a hash set, and use an approximate equality rule to decide whether the heap contains a close number.

However, that would take O(log N) time, which is not bad, but it would be nice to get O(1) if such an algorithm exists.
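For comparison, the O(log N) idea can be sketched in Python. One assumption to flag: a binary heap only gives fast access to its minimum, so on its own it cannot find the stored value nearest to x; this sketch swaps in a sorted list searched with the standard `bisect` module. The class name `SortedToleranceSet` is hypothetical.

```python
import bisect


class SortedToleranceSet:
    """Hypothetical O(log N)-lookup set: a sorted list searched with bisect."""

    def __init__(self, tol: float) -> None:
        self.tol = tol
        self.values = []  # kept sorted at all times

    def add(self, x: float) -> None:
        # Binary search finds the slot in O(log N); the list insert itself is O(N).
        bisect.insort(self.values, x)

    def __contains__(self, x: float) -> bool:
        # The stored values nearest to x sit just left and right of its insertion point.
        i = bisect.bisect_left(self.values, x)
        for j in (i - 1, i):
            if 0 <= j < len(self.values) and abs(self.values[j] - x) < self.tol:
                return True
        return False


s = SortedToleranceSet(tol=0.1)
s.add(1.0)
s.add(2.0)
print(1.05 in s, 1.5 in s)  # True False
```

A balanced search tree would make insertion O(log N) as well; either way, each lookup stays logarithmic, which is the cost the question hopes to avoid.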

Does anyone have any ideas on how this might be possible?

If you're not too fussy about the tolerance, then you can round each number down to a multiple of tol/4, i.e., put it in a bucket of width tol/4 with index floor(4x/tol).

You can then use a hash set keyed by bucket index: when you add a number x, add floor(4x/tol) - 1, floor(4x/tol) and floor(4x/tol) + 1.

When you look up a number x, look up the same three indices: floor(4x/tol) - 1, floor(4x/tol) and floor(4x/tol) + 1.

You will certainly find a match if some stored number is within tol/2 of x. A reported match is always less than 3tol/4 away, so every match is within tol; stored numbers between tol/2 and 3tol/4 away may or may not be found.
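A minimal Python sketch of this bucketing scheme, assuming a fixed absolute tolerance `tol`. The class name `BucketToleranceSet` is hypothetical, and because it stores only bucket indices it answers membership but cannot return the matching value. Since floor(4x/tol + 1) equals floor(4x/tol) + 1, the three keys are simply a bucket and its two neighbours.

```python
import math


class BucketToleranceSet:
    """Hypothetical approximate-membership set with buckets of width tol/4."""

    def __init__(self, tol: float) -> None:
        self.tol = tol
        self.buckets = set()  # hash set of integer bucket indices

    def _bucket(self, x: float) -> int:
        return math.floor(4 * x / self.tol)

    def add(self, x: float) -> None:
        # Register x in its own bucket and both neighbours: O(1) expected.
        k = self._bucket(x)
        self.buckets.update((k - 1, k, k + 1))

    def __contains__(self, x: float) -> bool:
        # Probe the same three buckets around x: O(1) expected.
        k = self._bucket(x)
        return any(j in self.buckets for j in (k - 1, k, k + 1))


s = BucketToleranceSet(tol=1.0)
s.add(10.0)
print(10.4 in s, 10.8 in s)  # True False
```

Here 10.4 (0.4 away, under tol/2) is guaranteed to be found, while 10.8 (0.8 away) cannot be, because stored index and query index must differ by at most 2 buckets.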

Two ideas I had (by no means necessarily the best):

1.) Mask the lower bits of the mantissa to all 0s. For instance, if you want a relative tolerance of approximately 1E-3 (about 2**-10), keep only the upper 10 bits of the mantissa and force the rest to 0 when adding; for a 32-bit float with its 23-bit mantissa, that means zeroing the lower 13 bits. Do the same when checking.

One caveat of this approach is that the least significant bits (LSBs) of a mantissa are fragile: values that should be identical often differ in their last bit or two because of rounding in the computations that produced them. You expect x = 0b00111111100000000000000000000000 (1.0f), and you get 0b00111111100000000000000000000001, 0b00111111011111111111111111111111, etc. The reasons for this are many, but the bottom line is that it's still brittle. Anything that relies on float equality is brittle.
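Idea 1 can be sketched in Python with the standard `struct` module. Assumptions: values are rounded to a 32-bit float (the format of the bit patterns above, with a 23-bit mantissa), and dropping 13 low bits keeps 10 mantissa bits, a relative granularity of about 2**-10 ≈ 1E-3. `float32_bits` and `masked_key` are hypothetical helper names. The example also shows why the approach is brittle: masking truncates, so 1.0 and the float just below it (0x3F7FFFFF) get different keys even though they are one ulp apart.

```python
import struct


def float32_bits(x: float) -> int:
    """Round x to a 32-bit float and return its raw bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def masked_key(x: float, drop_bits: int = 13) -> int:
    """Zero the lowest drop_bits bits of the float32 mantissa (0 < drop_bits <= 23)."""
    mask = 0xFFFFFFFF & ~((1 << drop_bits) - 1)
    return float32_bits(x) & mask


stored = {masked_key(v) for v in (1.0, 2.5)}  # hash set of masked bit patterns
print(masked_key(1.0001) in stored)  # True: same upper 10 mantissa bits as 1.0

below_one = struct.unpack(">f", bytes.fromhex("3f7fffff"))[0]  # largest float32 < 1.0
print(masked_key(below_one) in stored)  # False: one ulp from 1.0, but a different key
```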

2.) Create a hashable data structure that holds the fields of an IEEE-754 float separately. Why do that, you ask? The reason is twofold:

A.) By using an object that isn't a float, you prevent the runtime from treating this number as a float. I know it shouldn't matter, but I've seen LSBs mangled so many times that it obviously matters, even if it's hard to say why. This overcomes one caveat of storing a multiple of tol/4 as a float. I like the idea of rounding a number to a multiple of tol; the only downside is that if you store the result as a float, the runtime can still mangle its LSBs, which is what motivated the question in the first place.

B.) You can store just the N most significant bits of the mantissa; in other words, choose N such that 2**-N represents a relative tolerance you like.

This captures the idea of rounding to a multiple of tol, but also doesn't tell the runtime and/or CPU that this is a float, thereby preventing LSB mangling.
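One way to read idea 2 in Python, assuming 64-bit doubles (Python's `float`: 1 sign bit, 11 exponent bits, 52 mantissa bits): split the bit pattern into its fields and keep only the top N mantissa bits in a hashable `NamedTuple`. `FloatKey` and `float_key` are hypothetical names. This is the same quantization as masking in a different representation, so values straddling a bucket boundary (such as 1.0 and the double just below it) still get different keys.

```python
import struct
from typing import NamedTuple


class FloatKey(NamedTuple):
    """Hashable stand-in for a double: sign, exponent, top mantissa bits."""
    sign: int
    exponent: int
    mantissa_msbs: int


def float_key(x: float, n_bits: int = 10) -> FloatKey:
    """Split x's IEEE-754 fields, keeping the n_bits (1..52) top mantissa bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF  # biased exponent
    mantissa = bits & ((1 << 52) - 1)
    return FloatKey(sign, exponent, mantissa >> (52 - n_bits))


keys = {float_key(v) for v in (1.0, 3.0)}
print(float_key(1.0005) in keys)  # True: within the 2**-10 relative granularity
print(float_key(1.002) in keys)  # False
```

The tuple hashes like any other tuple of ints, so membership in a Python `set` stays O(1) on average.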

Interested to hear other ideas, critiques, etc.

Rather than building a different kind of set, adjust the meaning of "close".

Create a function that maps each finite float to an integer.

Mentally place every non-negative float in a list, sorted by value. 0.0 is at index 0 and MAX_FLOAT is at index N. (Likewise for negatives: -MAX_FLOAT through -0.0 map to -N through 0.) This gives a total ordering of the finite floats.

To find whether two float values are "close", subtract their indexes and compare the difference to a tolerance.

This keeps the "floating" in floating-point numbers: the tolerance is a fixed integer in the index domain, yet it scales with magnitude in the float domain.
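The index mapping can be sketched in Python by reinterpreting a double's bits. It relies on the IEEE-754 property that, for non-negative values, the bit patterns read as unsigned integers increase with the value. `float_index` and `is_close` are hypothetical names, and `max_steps` is the integer tolerance (a distance measured in representable doubles).

```python
import math
import struct


def float_index(x: float) -> int:
    """Map a non-NaN double to its signed position in the ordered list of doubles."""
    if math.isnan(x):
        raise ValueError("NaN has no position in the ordering")
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    magnitude = bits & 0x7FFF_FFFF_FFFF_FFFF  # drop the sign bit
    # Negatives mirror positives; +0.0 and -0.0 both map to 0.
    return -magnitude if bits >> 63 else magnitude


def is_close(a: float, b: float, max_steps: int) -> bool:
    """True if a and b are at most max_steps positions apart in the ordering."""
    return abs(float_index(a) - float_index(b)) <= max_steps


print(is_close(1.0, 1.0 + 2**-52, 1))  # True: adjacent doubles
print(is_close(2.0**100, 2.0**100 + 2.0**48, 1))  # True: also adjacent, gap ~2.8e14
print(is_close(1.0, 1.0 + 2**-50, 1))  # False: 4 positions apart
```

The same `max_steps` accepts a tiny absolute gap near 1.0 and a huge one near 2**100, which is the scaling behaviour described above.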

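Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.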

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM