Define a custom float8 in python-numpy and convert from/to float16?

I am trying to define a custom 8 bit floating point format as follows:

  • 1 sign bit
  • 2 bits for mantissa
  • 5 bits for exponent
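
As a rough illustration of the layout above, the sketch below splits one byte into its three fields and computes the value it would represent. It assumes the bit order sign, exponent, mantissa (most significant bit first), an exponent bias of 15, and float16-style subnormals, infinities and NaNs. None of that is stated in the question, and `decode_float8` is a hypothetical helper name.

```python
def decode_float8(byte):
    """Return the Python float for one 8-bit pattern (assumed layout: S EEEEE MM)."""
    sign = -1.0 if (byte >> 7) & 0x1 else 1.0
    exponent = (byte >> 2) & 0x1F   # 5 exponent bits
    mantissa = byte & 0x3           # 2 mantissa bits
    if exponent == 0x1F:            # all ones: infinity or NaN (assumed, as in float16)
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:               # subnormal: no implicit leading 1 (assumed)
        return sign * (mantissa / 4.0) * 2.0 ** -14
    return sign * (1.0 + mantissa / 4.0) * 2.0 ** (exponent - 15)  # assumed bias of 15


print(decode_float8(0x46))  # 6.0
print(decode_float8(0x41))  # 2.5
```

With only 2 mantissa bits, each power-of-two interval holds just four representable values, for example 4, 5, 6 and 7 between 4 and 8.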

Is it possible to define this as a numpy datatype? If not, what is the easiest way to convert a numpy array of dtype float16 to such a format (for storage) and convert it back (for calculations in float16), maybe using the bit operations of numpy?

Why:

I am trying to optimize a neural network on custom hardware (FPGA). For this, I am playing around with various float representations. I have already built a forward pass framework for my neural network with numpy, so something like the above would let me check the reduction in accuracy caused by storing the values in my custom datatype.

I'm by no means an expert in numpy, but I like to think about FP representation problems. The size of your array is not huge, so any reasonably efficient method should be fine. It doesn't look like numpy has an 8 bit FP dtype, I guess because the precision isn't so good.

To convert to an array of bytes, each containing a single 8 bit FP value, for a one-dimensional array, all you need is:

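import numpy as np

# Here's some data in a float16 array (four values, matching the output shown)
float16 = np.array([6.3, 2.557, 2.557, 6.3], dtype=np.float16)
float8s = float16.tobytes()[1::2]   # keep the second (high-order) byte of each pair
print(float8s)
# b'FAAF'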

This just takes the high-order byte of each 16 bit float by lopping off the low-order byte, giving a 1 bit sign, 5 bit exponent and 2 bit significand. The high-order byte is always the second byte of each pair on a little-endian machine. I've tried it on a 2D array and it works the same. Note that this truncates rather than rounds. Rounding in decimal would be a whole other can of worms.
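
The same truncation can be written with numpy's bit operations, which avoids relying on byte order and keeps the array shape. This is a minimal sketch rather than a definitive implementation: `float16_to_float8` and its `round_nearest` flag are hypothetical names, and the optional round-to-nearest-even step (done in binary, not decimal) is an addition that the answer itself does not cover.

```python
import numpy as np


def float16_to_float8(values, round_nearest=False):
    """Pack float16 values into uint8 codes: sign, 5 exponent bits, 2 mantissa bits."""
    bits = np.asarray(values, dtype=np.float16).view(np.uint16)
    if not round_nearest:
        # Truncation: drop the 8 low-order mantissa bits.
        return (bits >> 8).astype(np.uint8)
    wide = bits.astype(np.uint32)            # widen so the addition cannot wrap
    lsb = (wide >> 8) & 1                    # lowest bit that survives, for ties-to-even
    rounded = (wide + 0x7F + lsb) >> 8
    special = (wide & 0x7C00) == 0x7C00      # inf/NaN inputs are truncated, not rounded
    return np.where(special, wide >> 8, rounded).astype(np.uint8)


data = np.array([6.3, 2.557, 6.75], dtype=np.float16)
print(float16_to_float8(data).tobytes())                      # b'FAF'
print(float16_to_float8(data, round_nearest=True).tobytes())  # b'FAG'
```

In this sketch 6.75 truncates to the code for 6.0 but rounds to the code for 7.0. Rounding can also carry the largest finite values over to infinity, as it would in IEEE arithmetic.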

Getting back to 16 bits is just a matter of inserting zeros. I found this method by experiment and there is undoubtedly a better way, but this reads the byte array as 8 bit integers, writes a new one as 16 bit integers, and then reinterprets that as an array of floats. Note the big-endian dtype used when converting back to bytes: it puts each 8 bit value second in its byte pair, which is the high-order byte when the pair is read back as a little-endian float16.

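codes = np.frombuffer(float8s, dtype='u1')        # the stored bytes as 8 bit integers
padded = np.array(codes, dtype='>u2').tobytes()   # big-endian 16 bit: zero byte first, stored byte second
float16 = np.frombuffer(padded, dtype='f2')       # reinterpret each pair as a float16
print(repr(float16))
# array([6. , 2.5, 2.5, 6. ], dtype=float16)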
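
As a possible alternative, the zero insertion can be done with a left shift, which does not depend on the machine's byte order. This is only a sketch under that framing; `float8_to_float16` is a hypothetical helper name, and the input is assumed to be the stored bytes read as a `uint8` array.

```python
import numpy as np


def float8_to_float16(codes):
    """Expand uint8 codes to float16 by making each code the high-order byte."""
    codes = np.asarray(codes, dtype=np.uint8)
    # Widen to 16 bits, shift the code into the top byte, then reinterpret the bits.
    return (codes.astype(np.uint16) << 8).view(np.float16)


stored = np.frombuffer(b"FAAF", dtype=np.uint8)
print(repr(float8_to_float16(stored)))
# array([6. , 2.5, 2.5, 6. ], dtype=float16)
```

Because the shift works element by element, an N-dimensional array of codes comes back with the same shape.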

You can definitely see the loss of precision! I hope this helps. If this is sufficient, let me know. If not, I'd be up for looking deeper into it.
