
Convert FP32 to Bfloat16 in C++

How do I convert a float (1 sign bit, 8 exponent bits, 23 mantissa bits) to Bfloat16 (1 sign bit, 8 exponent bits, 7 mantissa bits) in C++?

memcpy wouldn't compile for me in the little-endian case for some reason. This is my solution. I have it as a struct here so that I can easily access the data and run through different ranges of values to confirm that it works properly.

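#include <iostream>

struct bfloat16{
   unsigned short int data;
   public:
   bfloat16(){
      data = 0;
   }
   //cast to float: place the 16 stored bits in the upper half of a 32-bit word
   operator float(){
      //widen before shifting so the shift cannot overflow a signed int
      unsigned int proc = static_cast<unsigned int>(data)<<16;
      //note: this pointer cast is type punning and technically breaks strict aliasing
      return *reinterpret_cast<float*>(&proc);
   }
   //cast to bfloat16: keep the upper 16 bits (truncation, no rounding)
   bfloat16& operator =(float float_val){
      data = (*reinterpret_cast<unsigned int *>(&float_val))>>16;
      return *this;
   }
};

//an example that enumerates all the possible values between 1.0f and 300.0f
using namespace std;

int main(){
   bfloat16 x;
   for(x = 1.0f; x < 300.0f; x.data++){
      cout<<x.data<<" "<<x<<endl;
   }

   return 0;
}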

As demonstrated in the answer by Botje, it is sufficient to copy the upper half of the float value since the bit patterns are the same. The way it is done in that answer violates the rules about strict aliasing in C++. The way around that is to use memcpy to copy the bits.

static inline tensorflow::bfloat16 FloatToBFloat16(float float_val)
{
    tensorflow::bfloat16 retval;
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    // big endian: the upper half of the float comes first in memory
    memcpy(&retval, &float_val, sizeof retval);
#else
    // little endian: the upper half is the last sizeof retval bytes
    memcpy(&retval, reinterpret_cast<char *>(&float_val) + sizeof float_val - sizeof retval, sizeof retval);
#endif
    return retval;
}
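
For readers without the TensorFlow headers, the same memcpy idea can be sketched with a plain 16-bit integer standing in for the bfloat16 storage. The names float_to_bf16_truncate and bf16_to_float are hypothetical, and the sketch assumes that float is a 32-bit IEEE-754 value stored with the same byte order as uint32_t. Copying into a uint32_t first and then shifting sidesteps the byte-order #if.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE-754 binary32 value.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: keep the upper 16 bits of the float (truncation).
inline std::uint16_t float_to_bf16_truncate(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);        // well-defined way to read the bits
    return static_cast<std::uint16_t>(bits >> 16);  // sign, exponent, top 7 mantissa bits
}

// Hypothetical helper: widen a bfloat16 pattern back to float.
inline float bf16_to_float(std::uint16_t bf16) {
    std::uint32_t bits = static_cast<std::uint32_t>(bf16) << 16;  // low mantissa bits are zero
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

A round trip such as bf16_to_float(float_to_bf16_truncate(x)) returns x with the lower 16 mantissa bits cleared.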

If it's necessary to round the result rather than truncating it, you can multiply by a magic value to push some of those lower bits into the upper bits.

float_val *= 1.001957f;
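
Multiplying by a constant only approximates rounding, because the amount added depends on where the value sits within its power-of-two range. A different, integer-based technique (not the one proposed above) adds a bias to the 32-bit pattern before shifting, which gives round-to-nearest-even. The following is a minimal sketch of that alternative, assuming 32-bit IEEE-754 floats. The name float_to_bf16_round is hypothetical, and the NaN branch is one possible choice for keeping NaN inputs as NaN.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: float -> bfloat16 bit pattern with round-to-nearest-even.
// Assumption: float is a 32-bit IEEE-754 binary32 value.
inline std::uint16_t float_to_bf16_round(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    // NaN: exponent all ones and a nonzero mantissa. Truncate and set a
    // mantissa bit so the result cannot collapse into an infinity.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }

    // Add 0x7FFF plus the lowest kept bit: exact ties round toward the
    // value whose lowest kept bit is zero (even), everything else to nearest.
    std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}
```

Values that round past the largest finite bfloat16 carry into the exponent and become infinity, which is the usual IEEE behaviour for overflow on rounding.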

From the Tensorflow implementation:

static inline tensorflow::bfloat16 FloatToBFloat16(float float_val) {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return *reinterpret_cast<tensorflow::bfloat16*>(
        reinterpret_cast<uint16_t*>(&float_val));
#else
    return *reinterpret_cast<tensorflow::bfloat16*>(
        &(reinterpret_cast<uint16_t*>(&float_val)[1]));
#endif
}

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address.

 