Convert FP32 to Bfloat16 in C++
How do I convert a float (1 sign bit, 8 exponent bits, 23 mantissa bits) to bfloat16 (1 sign bit, 8 exponent bits, 7 mantissa bits) in C++?
memcpy wouldn't compile for me in the little-endian case for some reason, so this is my solution. I have it as a struct here so that I can easily access the data and run through different ranges of values to confirm that it works properly.
#include <cstdint>
#include <iostream>

struct bfloat16 {
    std::uint16_t data;  // exactly 16 bits: sign + 8-bit exponent + 7-bit mantissa

    bfloat16() : data(0) {}

    // cast to float: the stored bits are the upper half of a float's encoding
    operator float() {
        std::uint32_t proc = static_cast<std::uint32_t>(data) << 16;
        return *reinterpret_cast<float*>(&proc);
    }

    // assign from float: keep only the upper 16 bits (truncating conversion)
    bfloat16& operator=(float float_val) {
        data = *reinterpret_cast<std::uint32_t*>(&float_val) >> 16;
        return *this;
    }
};
// an example that enumerates all the possible values between 1.0f and 300.0f
using namespace std;

int main() {
    bfloat16 x;
    for (x = 1.0f; x < 300.0f; x.data++) {
        cout << x.data << " " << x << endl;
    }
    return 0;
}
As demonstrated in the answer by Botje, it is sufficient to copy the upper half of the float value, since the bit patterns are the same. However, the way it is done in that answer violates the rules about strict aliasing in C++. The way around that is to use memcpy to copy the bits.
static inline tensorflow::bfloat16 FloatToBFloat16(float float_val)
{
    tensorflow::bfloat16 retval;
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    // big-endian: the high 16 bits come first in memory
    memcpy(&retval, &float_val, sizeof retval);
#else
    // little-endian: the high 16 bits are the last two bytes
    memcpy(&retval,
           reinterpret_cast<char *>(&float_val) + sizeof float_val - sizeof retval,
           sizeof retval);
#endif
    return retval;
}
If it's necessary to round the result rather than truncate it, you can first multiply by a magic value to push some of the lower bits into the upper bits.
float_val *= 1.001957f;
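An alternative to the magic-constant multiply is to do the rounding in integer arithmetic on the bit pattern, which gives exact round-to-nearest-even. This is a sketch of that standard trick (NaN handling omitted for brevity; the function name is illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Round-to-nearest-even: add 0x7FFF plus the lowest kept bit before shifting,
// so values exactly halfway between two bfloat16 values round to the one with
// an even mantissa, and the carry propagates correctly into the exponent.
static std::uint16_t float_to_bf16_rne(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFF + ((bits >> 16) & 1);  // rounding bias + tie-break bit
    return static_cast<std::uint16_t>(bits >> 16);
}
```

For example, 1.00390625f (exactly halfway between the bfloat16 values 1.0 and 1.0078125) rounds down to the even mantissa, while anything above the halfway point rounds up.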
From the Tensorflow implementation:
static inline tensorflow::bfloat16 FloatToBFloat16(float float_val) {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return *reinterpret_cast<tensorflow::bfloat16*>(
        reinterpret_cast<uint16_t*>(&float_val));
#else
    return *reinterpret_cast<tensorflow::bfloat16*>(
        &(reinterpret_cast<uint16_t*>(&float_val)[1]));
#endif
}