Convert FP32 to Bfloat16 in C++
How do I convert a float (1 sign bit, 8 exponent bits, 23 mantissa bits) to bfloat16 (1 sign bit, 8 exponent bits, 7 mantissa bits) in C++?
memcpy wouldn't compile for me in the little-endian case for some reason, so this is my solution. I have it as a struct here so that I can easily access the data and run through different ranges of values to confirm that it works properly.
#include <cstdint>
#include <iostream>

struct bfloat16 {
    std::uint16_t data;  // exactly 16 bits: sign + 8-bit exponent + 7-bit mantissa

    bfloat16() : data(0) {}

    // cast to float: the stored bits are the upper half of a float's encoding
    operator float() {
        std::uint32_t proc = static_cast<std::uint32_t>(data) << 16;
        return *reinterpret_cast<float*>(&proc);
    }

    // assign from float: keep only the upper 16 bits (truncating conversion)
    bfloat16& operator=(float float_val) {
        data = *reinterpret_cast<std::uint32_t*>(&float_val) >> 16;
        return *this;
    }
};
// an example that enumerates all the possible values between 1.0f and 300.0f
using namespace std;

int main() {
    bfloat16 x;
    for (x = 1.0f; x < 300.0f; x.data++) {
        cout << x.data << " " << x << endl;
    }
    return 0;
}
As demonstrated in the answer by Botje, it is sufficient to copy the upper half of the float value, since the bit patterns are the same. However, the way it is done in that answer violates the rules about strict aliasing in C++. The way around that is to use memcpy to copy the bits.
static inline tensorflow::bfloat16 FloatToBFloat16(float float_val)
{
    tensorflow::bfloat16 retval;
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    // big-endian: the high 16 bits come first in memory
    memcpy(&retval, &float_val, sizeof retval);
#else
    // little-endian: the high 16 bits are the last two bytes
    memcpy(&retval,
           reinterpret_cast<char *>(&float_val) + sizeof float_val - sizeof retval,
           sizeof retval);
#endif
    return retval;
}
If it's necessary to round the result rather than truncate it, you can first multiply by a magic value to push some of the lower bits into the upper bits.
float_val *= 1.001957f;
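An alternative to the magic-constant multiply is to do the rounding in integer arithmetic on the bit pattern, which gives exact round-to-nearest-even. This is a sketch of that standard trick (NaN handling omitted for brevity; the function name is illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Round-to-nearest-even: add 0x7FFF plus the lowest kept bit before shifting,
// so values exactly halfway between two bfloat16 values round to the one with
// an even mantissa, and the carry propagates correctly into the exponent.
static std::uint16_t float_to_bf16_rne(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFF + ((bits >> 16) & 1);  // rounding bias + tie-break bit
    return static_cast<std::uint16_t>(bits >> 16);
}
```

For example, 1.00390625f (exactly halfway between the bfloat16 values 1.0 and 1.0078125) rounds down to the even mantissa, while anything above the halfway point rounds up.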
From the Tensorflow implementation:
static inline tensorflow::bfloat16 FloatToBFloat16(float float_val) {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return *reinterpret_cast<tensorflow::bfloat16*>(
        reinterpret_cast<uint16_t*>(&float_val));
#else
    return *reinterpret_cast<tensorflow::bfloat16*>(
        &(reinterpret_cast<uint16_t*>(&float_val)[1]));
#endif
}