C/C++: How to convert a 32-bit floating-point value to a 24-bit normalized fixed-point value?

Question

Please let me know how to convert a 32-bit float to a 24-bit normalized value. What I tried is (units * (1 << 24)), but it doesn't seem to work. Please help me with this. Thanks.

Answer 1

Of course it doesn't work: (1 << 24) is too large, by exactly 1, to store in a 24-bit number that must also be able to represent 0. To put it another way, 1 << 24 is actually a 25-bit number.

Consider (units * ((1 << 24) - 1)) instead.

(1 << 24) - 1 is the largest value that an unsigned 24-bit integer starting at 0 can represent.
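To make the off-by-one concrete, here is a small sketch that counts how many bits a value needs. The helper name `bit_width_u32` is hypothetical, not part of either answer.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent v (0 yields 0). */
unsigned bit_width_u32(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {   /* shift out one bit per iteration */
        v >>= 1;
        n++;
    }
    return n;
}
```

With this helper, bit_width_u32(1u << 24) is 25, while bit_width_u32((1u << 24) - 1) is 24, which is exactly why the scale factor has to be reduced by one.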

Now a floating-point number in the range [0.0, 1.0] will fit into an unsigned 24-bit fixed-point integer without overflow.
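A minimal sketch of this approach, assuming the input is meant to lie in [0.0, 1.0]. The names `float_to_unorm24` and `unorm24_to_float` are hypothetical. Clamping out-of-range inputs and rounding to nearest (instead of truncating) are design choices added here, not part of the answer.

```c
#include <stdint.h>

/* Largest unsigned 24-bit value: (1 << 24) - 1 = 16777215. */
#define UNORM24_MAX ((UINT32_C(1) << 24) - 1u)

/* Hypothetical helper: map x in [0.0, 1.0] onto [0, UNORM24_MAX].
 * Out-of-range inputs and NaN are clamped; the result is rounded to nearest. */
uint32_t float_to_unorm24(float x)
{
    if (!(x > 0.0f)) return 0;            /* x <= 0 or NaN */
    if (x >= 1.0f) return UNORM24_MAX;    /* saturate at 1.0 */
    /* A 24-bit float mantissa times a 24-bit constant fits exactly in a
     * double's 53-bit mantissa; adding 0.5 before the cast rounds. */
    return (uint32_t)((double)x * UNORM24_MAX + 0.5);
}

/* Hypothetical inverse: UNORM24_MAX maps back to exactly 1.0f. */
float unorm24_to_float(uint32_t v)
{
    return (float)((double)(v & UNORM24_MAX) / UNORM24_MAX);
}
```

For example, float_to_unorm24(0.5f) gives 8388608 and float_to_unorm24(1.0f) gives 16777215, so 1.0 is reachable in this convention.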

Answer 2

A normalized fixed-point representation means that the maximum representable value, which is never strictly reached, is 1. So 1 is represented by 1 << 24. See also Q formats.
For example, Q24 means 24 fractional bits, 0 integer bits and no sign. If you use a 32-bit unsigned integer to hold a Q24 value, the remaining 8 bits can be used to ease calculations.
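One way to read "the remaining 8 bits can be used to ease calculations" is as headroom: the low 24 bits hold the Q24 fraction, and the upper 8 bits absorb intermediate results of 1.0 or more instead of wrapping. This is a sketch under that interpretation; all names are hypothetical.

```c
#include <stdint.h>

#define Q24_ONE (UINT32_C(1) << 24)      /* 1.0 in Q24; needs the 25th bit */
#define Q24_FRAC_MASK (Q24_ONE - 1u)     /* low 24 bits: the fractional part */

/* Hypothetical helper: whole-number part, held in the 8 spare upper bits. */
uint32_t q24_int_part(uint32_t q) { return q >> 24; }

/* Hypothetical helper: the 24 fractional bits proper. */
uint32_t q24_frac_part(uint32_t q) { return q & Q24_FRAC_MASK; }

/* Hypothetical helper: adding two Q24 values. The sum may reach or exceed
 * 1.0; the upper 8 bits hold the carry as long as the true sum is < 256.0. */
uint32_t q24_add(uint32_t a, uint32_t b) { return a + b; }
```

For instance, adding 0.75 (12582912) and 0.5 (8388608) gives 20971520, whose integer part is 1 and whose fractional part is 4194304 (0.25).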
Before converting from floating-point to fixed-point representation, you always have to define the range of your original value. Example: the floating-point value is a physical quantity in the range [0, 5), so 0 is included in the range and 5 is not, and your fixed-point value is normalized to 5.
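The range step can be wrapped in a pair of helpers: divide by the range maximum, then scale by 2^24. This is a minimal sketch; `to_q24_norm` and `from_q24_norm` are hypothetical names, and clamping values outside [0, range_max) to the largest code below range_max is an added choice. Truncation matches the plain cast used in the answer's example.

```c
#include <stdint.h>

#define Q24_SCALE 16777216.0   /* 2^24: the normalized maximum, never reached */

/* Hypothetical helper: encode value in [0, range_max) as Q24 normalized to range_max. */
uint32_t to_q24_norm(float value, float range_max)
{
    double n = (double)value / range_max;   /* normalized to [0, 1) */
    if (!(n > 0.0)) return 0;               /* negative, zero or NaN */
    if (n >= 1.0) return (uint32_t)Q24_SCALE - 1u;   /* largest code below range_max */
    return (uint32_t)(n * Q24_SCALE);       /* truncates toward zero */
}

/* Hypothetical helper: decode a Q24 value back to physical units. */
double from_q24_norm(uint32_t q, float range_max)
{
    return (double)q / Q24_SCALE * range_max;
}
```

With the answer's figures, to_q24_norm(4.5f, 5.0f) gives 15099494 and to_q24_norm(1.2f, 2.0f) gives 10066330, the same values its program prints.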

#include <inttypes.h>   /* uint32_t and the PRIu32 printf macro */
#include <stdio.h>

float length_flp = 4.5;     // Units: meters. Range: [0,5)
float time_flp = 1.2;       // Seconds. Range: [0,2)
float speed_flp = 1.2;      // m/sec. Range: [0,2.5)
uint32_t length_fixp;       // Meters. Representation: Q24 = 24 bit normalized to MAX_LENGTH=5
uint32_t time_fixp;         // Seconds. Representation: Q24 = 24 bit normalized to MAX_TIME=2
uint32_t speed_fixp;        // m/sec. Repr: Q24 = 24 bit normalized to MAX_SPEED=(MAX_LENGTH/MAX_TIME)=2.5

int main(void)
{
    printf("length_flp=%f m\n", length_flp);
    printf("time_flp=%f sec\n", time_flp);
    printf("speed_flp=%f m/sec\n\n", length_flp / time_flp);

    // Normalize to the range maximum, then scale by 2^24 (truncating cast).
    length_fixp = (uint32_t)((length_flp / 5) * (1 << 24));
    time_fixp = (uint32_t)((time_flp / 2) * (1 << 24));

    // Trick: length_fixp / (time_fixp >> 12) is the ratio scaled by 2^12;
    // shifting left by 12 restores the 2^24 scale without a 64-bit intermediate,
    // at the cost of the low 12 bits. 4.5 m / 1.2 s = 3.75 m/sec exceeds
    // MAX_SPEED, so speed_fixp ends up above 1 << 24; it still fits thanks
    // to the 8 spare bits of the uint32_t.
    speed_fixp = (length_fixp / (time_fixp >> 12)) << 12;

    printf("length_fixp = %" PRIu32 " m [fixed-point]\n", length_fixp);
    printf("time_fixp = %" PRIu32 " sec [fixed-point]\n", time_fixp);
    printf("speed_fixp = %" PRIu32 " m/sec [fixed-point] = %f m/sec\n",
           speed_fixp, (float)speed_fixp / (1 << 24) * 2.5);
    return 0;
}

The advantage of a normalized representation is that operations between normalized values return a normalized value. Note, however, that you have to define a generic function for each operation (division, multiplication, etc.) to prevent overflow and preserve precision. As you can see, I've used a small trick to calculate speed_fixp. The output is:

length_flp=4.500000 m
time_flp=1.200000 sec
speed_flp=3.750000 m/sec

length_fixp = 15099494 m [fixed-point]
time_fixp = 10066330 sec [fixed-point]
speed_fixp = 25169920 m/sec [fixed-point] = 3.750610 m/sec
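The answer mentions defining a generic function per operation to prevent overflow and preserve precision. Below is a minimal sketch of such functions for Q24 values stored in uint32_t, assuming a 64-bit intermediate is acceptable on the target; `q24_mul` and `q24_div` are hypothetical names, and saturating instead of wrapping is an added design choice. Unlike the shift trick, the division keeps all fractional bits.

```c
#include <stdint.h>

#define Q24_SHIFT 24

/* Hypothetical helper: multiply two Q24 values. The 64-bit product has
 * 48 fractional bits; adding half an LSB and shifting right by 24 rounds
 * it back to Q24. Results that do not fit in 32 bits saturate. */
uint32_t q24_mul(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    p = (p + (UINT64_C(1) << (Q24_SHIFT - 1))) >> Q24_SHIFT;
    return p > UINT32_MAX ? UINT32_MAX : (uint32_t)p;
}

/* Hypothetical helper: divide two Q24 values (b must be nonzero).
 * Shifting the dividend left by 24 in 64 bits keeps every fractional bit.
 * Truncates toward zero; saturates on overflow. */
uint32_t q24_div(uint32_t a, uint32_t b)
{
    uint64_t q = ((uint64_t)a << Q24_SHIFT) / b;
    return q > UINT32_MAX ? UINT32_MAX : (uint32_t)q;
}
```

For the example values, q24_div(15099494, 10066330) returns 25165822, which decodes to about 3.7499997 m/sec, compared with 25169920 (3.750610 m/sec) from the shift trick.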

2 answers

Answer 1: score 0, posted 2013-12-23 20:40:00

Answer 2: score 0, posted 2013-12-31 09:29:43