
Compact format for floating-point numbers

There are special formats (base-128) designed for transmitting integers, used in protobufs and elsewhere. They're advantageous when most of the integers are small (they need a single byte for the smallest numbers and may waste one byte for others).
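For reference, here is a minimal sketch of the base-128 (varint) idea: each byte carries 7 bits of the value, least-significant group first, and its high bit says whether more bytes follow. This is an unsigned 32-bit variant written for illustration, not the protobuf library's own code; the class name `Varint` is made up.

```java
import java.io.ByteArrayOutputStream;

class Varint {
    // Writes v as an unsigned base-128 varint: 7 payload bits per byte,
    // high bit set on every byte except the last.
    static byte[] encode(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7; // unsigned shift, so negative ints terminate after 5 bytes
        }
        out.write(v);
        return out.toByteArray();
    }

    // Reads back a single varint produced by encode().
    static int decode(byte[] buf) {
        int result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 1, 127, 128, 300, 16383, 16384, -1}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

Values below 128 take one byte, 128 to 16383 take two, and a negative int such as -1 takes five, since its upper bits are all set.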

I wonder if there's something similar for floating-point numbers, under the assumption that most of them are actually small integers?


To address the answer by Alice: I was thinking about something like

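void putCompressedDouble(double x) {
    int n = (int) x;            // truncates; NaN becomes 0, out-of-range values saturate
    boolean fits = (n == x);    // true only for int-valued doubles (and also for -0.0)
    putBoolean(fits);           // flag telling the decoder which branch follows
    if (fits) {
        putCompressedInt(n);    // small ints take few bytes
    } else {
        putUncompressedLong(Double.doubleToLongBits(x)); // raw 8-byte bit pattern
    }
}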

This works (except for negative zero, which I really don't care about), but it's wasteful when fits == true.
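To see where the waste comes from, here is a self-contained sketch of the question's method. The three helpers are not defined in the question, so their behavior here is assumed: `putBoolean` spends a full byte, `putCompressedInt` is a zigzag base-128 varint, and `putUncompressedLong` writes 8 big-endian bytes. With these choices an integer-valued double costs one flag byte plus the varint, and anything else costs 9 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper implementations; only putCompressedDouble mirrors the question.
class CompressedDouble {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void putBoolean(boolean b) { out.write(b ? 1 : 0); } // assumed: one whole byte per flag

    void putCompressedInt(int n) {
        int v = (n << 1) ^ (n >> 31); // zigzag: small negatives become small unsigned values
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    void putUncompressedLong(long bits) {
        out.write(ByteBuffer.allocate(8).putLong(bits).array(), 0, 8);
    }

    void putCompressedDouble(double x) {
        int n = (int) x;
        boolean fits = (n == x); // also true for -0.0, whose sign is lost
        putBoolean(fits);
        if (fits) {
            putCompressedInt(n);
        } else {
            putUncompressedLong(Double.doubleToLongBits(x));
        }
    }

    int size() { return out.size(); }

    public static void main(String[] args) {
        for (double x : new double[] {0.0, 5.0, -3.0, 1000.0, 0.5}) {
            CompressedDouble c = new CompressedDouble();
            c.putCompressedDouble(x);
            System.out.println(x + " -> " + c.size() + " byte(s)");
        }
    }
}
```

Under these assumptions 5.0 takes 2 bytes, 1000.0 takes 3, and 0.5 takes 9; the flag byte is pure overhead in the integer case.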

It depends on the distribution of your numbers. Magnitude doesn't really matter that much, since it's expressed through the exponent field of a float. It's usually the mantissa that contributes the most "weight" in terms of storage.
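One way to see this for integer-valued floats is to count how many low mantissa bits are left unused. This sketch prints each float's bit pattern (via `Float.floatToIntBits`) and its number of trailing zero bits; the class name and sample values are arbitrary.

```java
class FloatBits {
    // Number of trailing zero bits in the IEEE 754 representation of f.
    static int trailingZeros(float f) {
        return Integer.numberOfTrailingZeros(Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        for (float f : new float[] {1f, 3f, 100f, 0.1f}) {
            System.out.printf("%-6s %08x  trailing zeros: %d%n",
                    f, Float.floatToIntBits(f), trailingZeros(f));
        }
    }
}
```

1.0f (0x3f800000) has 23 trailing zeros, 3.0f has 22, 100.0f has 19, and 0.1f (0x3dcccccd) has none.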

If your floats are mainly integers, you may gain something by converting to int (via Float.floatToIntBits()) and checking how many trailing zeros there are (for small int values there should be up to 23 trailing zeros). When using a simple scheme to encode small ints, you may implement encoding floats simply as:

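int raw = Float.floatToIntBits(f);   // IEEE 754 bit pattern of f
raw = Integer.reverse(raw);          // mirror the bits: trailing mantissa zeros become leading zeros
encodeAsInt(raw);                    // any small-int encoder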

(Decoding is simply the reverse process.) What this does is move the trailing zeros in the mantissa to the most significant bits of the int representation, which is friendly to encoding schemes devised for small integers.
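A minimal round-trip sketch of the reversal trick, assuming an unsigned base-128 varint plays the role of `encodeAsInt` (the answer leaves the encoder open). The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

class ReversedFloatCodec {
    // Reverse the float's bits, then write them as an unsigned varint (assumed encoder).
    static byte[] encode(float f) {
        int v = Integer.reverse(Float.floatToIntBits(f));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    // Assumes buf holds exactly one varint written by encode().
    static float decode(byte[] buf) {
        int v = 0;
        for (int i = 0, shift = 0; i < buf.length; i++, shift += 7) {
            v |= (buf[i] & 0x7F) << shift;
        }
        // Undo the reversal, then reinterpret the bits as a float.
        return Float.intBitsToFloat(Integer.reverse(v));
    }

    public static void main(String[] args) {
        for (float f : new float[] {0f, 1f, -7f, 100f, 0.1f}) {
            byte[] enc = encode(f);
            System.out.println(f + " -> " + enc.length + " byte(s), round trip " + decode(enc));
        }
    }
}
```

With this encoder 0.0f takes 1 byte, 1.0f and 100.0f take 2, and 0.1f takes 5, because its lowest mantissa bit is set and becomes the top bit after reversal.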

The same can be applied to double <-> long.
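For doubles the pattern is the same with `Double.doubleToLongBits` and `Long.reverse`; again, the unsigned varint is an assumed choice of encoder and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

class ReversedDoubleCodec {
    // Reverse the 64 bits, then write them as an unsigned varint (up to 10 bytes).
    static byte[] encode(double d) {
        long v = Long.reverse(Double.doubleToLongBits(d));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) (v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // Assumes buf holds exactly one varint written by encode().
    static double decode(byte[] buf) {
        long v = 0;
        for (int i = 0, shift = 0; i < buf.length; i++, shift += 7) {
            v |= (long) (buf[i] & 0x7F) << shift;
        }
        return Double.longBitsToDouble(Long.reverse(v));
    }

    public static void main(String[] args) {
        for (double d : new double[] {0.0, 1.0, 100.0, 0.1}) {
            System.out.println(d + " -> " + encode(d).length + " byte(s)");
        }
    }
}
```

Because the 11-bit exponent now sits in front of the mantissa, 1.0 needs 2 bytes and 100.0 already needs 3.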

Probably not, and this is almost certainly not something you want.

As noted in this Stack Overflow post, floating-point numbers get no special compact encoding in protocol buffers: they are essentially bit-for-bit copies of the IEEE 754 representation, converted using a union. This means a float always takes 4 bytes and a double 8 bytes. This is almost certainly what you want.

Why? Floating-point numbers are not integers. The integers are a well-formed group: every bit pattern represents a valid number, and it represents that integer exactly. Floating-point numbers cannot represent many important numbers exactly; for example, no binary float can represent 0.1 exactly. Infinities, NaNs, and similar special values all make a compressed format a non-trivial task.
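A small illustration of two of these quirks (class name arbitrary): `new BigDecimal(double)` reveals the exact value stored for 0.1f, and NaN is not a single bit pattern, so an encoder has to decide whether NaN payloads matter.

```java
import java.math.BigDecimal;

class FloatQuirks {
    public static void main(String[] args) {
        // The float widens exactly to double, so this prints the exact stored value.
        System.out.println(new BigDecimal(0.1f));

        // A quiet NaN with a non-default payload is still NaN...
        float otherNaN = Float.intBitsToFloat(0x7fc00001);
        System.out.println(Float.isNaN(otherNaN));
        // ...but floatToIntBits collapses every NaN to 0x7fc00000,
        // while floatToRawIntBits keeps the payload.
        System.out.printf("%08x %08x%n",
                Float.floatToIntBits(otherNaN), Float.floatToRawIntBits(otherNaN));
    }
}
```

The first line prints 0.100000001490116119384765625.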

If you have small integers in a float, then convert them to small integers or some fixed-point precision format. For example, if you know you only have 4 significant figures, you can convert from floating point to a fixed-point short, saving 2 bytes. Just make sure each end knows how to deal with this type, and you'll be golden.
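A sketch of the fixed-point idea, assuming a scale factor of 100 (two decimal places) chosen purely for illustration. That scale limits the range to -327.68 .. 327.67, so both ends must agree on it and the encoder should reject values that don't fit.

```java
class FixedPointShort {
    // Hypothetical scale: the value is stored as round(value * SCALE) in 16 bits.
    static final int SCALE = 100;

    static short toFixed(float value) {
        long scaled = Math.round(value * (double) SCALE);
        if (scaled < Short.MIN_VALUE || scaled > Short.MAX_VALUE) {
            throw new IllegalArgumentException("out of range for fixed-point short: " + value);
        }
        return (short) scaled;
    }

    static float fromFixed(short fixed) {
        return fixed / (float) SCALE;
    }

    public static void main(String[] args) {
        for (float f : new float[] {12.34f, -3.5f, 327.67f}) {
            short s = toFixed(f);
            System.out.println(f + " -> " + s + " -> " + fromFixed(s));
        }
    }
}
```

The 2-byte short replaces the 4-byte float; the trade-off is a fixed precision and range that both sides must hard-code.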

But anything Google could do to try to save space here would be both reinventing the wheel and potentially dangerous, which is probably why they try not to mess with floats at all.

I really like Durandal's solution. Despite its simplicity, it performs pretty well, at least for floats. For doubles, whose exponent is longer than one byte, some additional bit rearrangement might help. The following table gives the encoding length for numbers with up to D digits; negative numbers are also considered. In each column, the first number is the maximum number of bytes needed and the parenthesized number is the average.

D   AS_INT    REV_FLOAT REV_DOUBLE BEST
1:  1 (1.0)   2 (1.8)   3 (2.2)    1 (1.0)
2:  2 (1.4)   3 (2.4)   3 (2.8)    2 (1.7)
3:  2 (1.9)   3 (2.9)   4 (3.2)    2 (2.0)
4:  3 (2.2)   4 (3.3)   4 (3.8)    3 (2.6)
5:  3 (2.9)   4 (3.9)   5 (4.1)    3 (3.0)
6:  3 (3.0)   5 (4.2)   5 (4.8)    4 (3.5)
7:  4 (3.9)   5 (4.8)   6 (5.1)    4 (3.9)
8:  4 (4.0)   5 (4.9)   6 (5.8)    5 (4.3)
9:  5 (4.9)   5 (4.9)   6 (6.0)    5 (4.9)

Four different methods were tested:

  • AS_INT: Simply convert the number to int. This is unusable but gives us a lower bound.
  • REV_FLOAT: Durandal's method applied to floats.
  • REV_DOUBLE: Durandal's method applied to doubles.
  • BEST: An improvement of my own method as described in the question. Rather complicated.
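A rough harness for this kind of measurement, limited to the REV_FLOAT and REV_DOUBLE columns. The answer doesn't say exactly which numbers were tested or which varint was used, so this assumes the integers in [-(10^D - 1), 10^D - 1] and an unsigned base-128 varint; its output may not match the table.

```java
class EncodingLengths {
    // Length of v as an unsigned base-128 varint.
    static int varintLength(long v) {
        int n = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    static int revFloatLength(float f) {
        // Mask to treat the reversed 32-bit pattern as unsigned.
        return varintLength(Integer.reverse(Float.floatToIntBits(f)) & 0xFFFFFFFFL);
    }

    static int revDoubleLength(double d) {
        return varintLength(Long.reverse(Double.doubleToLongBits(d)));
    }

    public static void main(String[] args) {
        for (int digits = 1; digits <= 6; digits++) {
            int limit = (int) Math.pow(10, digits) - 1;
            int maxF = 0, maxD = 0;
            long sumF = 0, sumD = 0, count = 0;
            for (int i = -limit; i <= limit; i++) {
                int lf = revFloatLength(i);
                int ld = revDoubleLength(i);
                maxF = Math.max(maxF, lf);
                maxD = Math.max(maxD, ld);
                sumF += lf;
                sumD += ld;
                count++;
            }
            System.out.printf("%d: REV_FLOAT %d (%.1f)  REV_DOUBLE %d (%.1f)%n",
                    digits, maxF, (double) sumF / count, maxD, (double) sumD / count);
        }
    }
}
```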

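Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.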

 
© 2020-2024 STACKOOM.COM