Bit shifting for fixed point arithmetic on float numbers in C

I wrote the following test code to check fixed point arithmetic and bit shifting.

#include <stdio.h>

int main(void){
    float x = 2;
    float y = 3;
    float z = 1;
    unsigned int * px = (unsigned int *) (& x);
    unsigned int * py = (unsigned int *) (& y);
    unsigned int * pz = (unsigned int *) (& z);
    *px <<= 1;
    *py <<= 1;
    *pz <<= 1;
    *pz =*px + *py;
    *px >>= 1;
    *py >>= 1;
    *pz >>= 1;
    printf("%f %f %f\n",x,y,z);
    return 0;
}

The result is 2.000000 3.000000 0.000000

Why is the last number 0? I was expecting to see 5.000000. I want to use some kind of fixed point arithmetic to avoid floating point numbers in an image processing application. Which is the best/easiest/most efficient way to turn my floating point arrays into integers? Is the above "tricking the compiler" a robust workaround? Any suggestions?

If you want to use fixed point, don't use type 'float' or 'double', because they have internal structure. Floats and doubles have a specific bit for the sign, some bits for the exponent and some for the mantissa, so they are inherently floating point.

You should either program fixed point by hand, storing the data in an integer type, or use a fixed-point library (or language extension).

There is a description of the fixed-point extensions implemented in GCC: http://gcc.gnu.org/onlinedocs/gcc/Fixed_002dPoint.html

There is also a macro-based manual implementation of fixed point for C: http://www.eetimes.com/discussion/other/4024639/Fixed-point-math-in-C

What you are doing is cruel to the numbers.

First, you assign values to float variables. How they are stored is system dependent, but normally the IEEE 754 format is used. So your variables internally look like

x = 2.0 = 1 * 2^1   : sign = 0, mantissa = 1,   exponent = 1 -> 0 10000000 00000000000000000000000 = 0x40000000
y = 3.0 = 1.5 * 2^1 : sign = 0, mantissa = 1.5, exponent = 1 -> 0 10000000 10000000000000000000000 = 0x40400000
z = 1.0 = 1 * 2^0   : sign = 0, mantissa = 1,   exponent = 0 -> 0 01111111 00000000000000000000000 = 0x3F800000

If you do bit shifting operations on these numbers, you mix up the borders between sign, exponent and mantissa, so anything can, may and will happen.

In your case:

  • your 2.0 becomes 0x80000000, resulting in -0.0,
  • your 3.0 becomes 0x80800000, resulting in -1.1754943508222875e-38,
  • your 1.0 becomes 0x7F000000, resulting in 1.7014118346046923e+38.

The last value is lost, because *pz is overwritten with the sum of the other two. That sum is an integer addition of the raw bit patterns, not a floating-point addition of -0.0 and -1.1754943508222875e-38: 0x80000000 + 0x80800000 wraps around in a 32-bit unsigned int to 0x00800000. Shifting that right by 1 gives 0x00400000, a subnormal number of about 5.9e-39, which prints as 0.000000. x and y shift back to 0x40000000 and 0x40400000, so they print as 2.0 and 3.0 again.

What remains is that you cannot do bit shifting on floats and expect a reliable result.

I would consider converting them to integer or another fixed-point format on the ARM and sending them over the line as they are.

It's probable that your compiler uses the IEEE 754 format for floats, which, in bit terms, looks like this:

SEEEEEEEEFFFFFFFFFFFFFFFFFFFFFFF
^ bit 31                       ^ bit 0

S is the sign bit: S = 1 implies the number is negative.

E bits are the exponent. There are 8 exponent bits, giving a range of 0 - 255, but the exponent is biased: you need to subtract 127 to get the true exponent.

F bits are the fraction part. However, you need to imagine an invisible 1 on the front, so the fraction is always 1.something and all you see are the binary fraction digits.

The number 2 is 1 x 2^1 = 1 x 2^(128 - 127), so it is encoded as

01000000000000000000000000000000

So if you use a bit shift to shift it left, you get

10000000000000000000000000000000

which by convention is -0 in IEEE 754, so rather than multiplying your number by 2, your shift has made it zero.

The number 3 is [1 + 0.5] x 2^(128 - 127)

which is represented as

01000000010000000000000000000000

Shifting that left gives you

10000000100000000000000000000000

which is -1 x 2^-126, a very small number.

You can do the same for z, but you probably get the idea that shifting just screws up floating point numbers.

Fixed point doesn't work that way. What you want to do is something like this:

#include <stdio.h>

int main(void){
    // initialize fixed point numbers with 8 fractional bits
    unsigned int x = 2 << 8;
    unsigned int y = 3 << 8;
    unsigned int z = 1 << 8;

    // adding two numbers
    unsigned int a = x + y;

    // multiplying two numbers with fixed point adjustment
    unsigned int b = (x * y) >> 8;

    // use numbers: shift right by 8 to get the integer part
    printf("%u %u\n", a >> 8, b >> 8);
    (void)z; // z is initialized for illustration only
    return 0;
}
