
How to generate an IEEE 754 single-precision float using only integer arithmetic?

Assuming a low-end microprocessor with no floating-point arithmetic, I need to generate a number in IEEE 754 single-precision floating-point format to write out to a file.

I need to write a function that takes three integers (the sign, the whole part and the fractional part) and returns a 4-byte array holding the IEEE 754 single-precision representation.

Something like:

// Convert 75.65 to its 4-byte IEEE 754 single-precision representation
// (the result cannot be named `float`, which is a C keyword)
char *bytes = convert(0, 75, 65);

Does anybody have any pointers or example C code, please? I'm particularly struggling to understand how to convert the mantissa.

You will need to generate the sign (1 bit), the exponent (8 bits, a biased power of 2), and the fraction/mantissa (23 bits).

Bear in mind that the fraction has an implicit leading '1' bit, which means that the most significant '1' bit (2^23 of the 24-bit significand) is not stored in the IEEE format. For example, given a significand of 0xF55555 (24 bits), the actual bits stored would be 0x755555 (23 bits).

Also bear in mind that the fraction is shifted so that the binary point is immediately to the right of the implicit leading '1' bit. So an IEEE 23-bit fraction of 11 0101 0101... represents the 24-bit binary value 1.11 0101 0101... This means that the exponent has to be adjusted accordingly.
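
As a rough illustration of the layout described above, here is a minimal sketch that packs the three fields into one 32-bit pattern. The name pack_binary32 and its parameters are hypothetical (they are not from the question), and the sketch assumes a normal number whose 24-bit significand already has its leading 1 at bit 23.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack sign, biased exponent and a normalized
   24-bit significand (leading 1 at bit 23) into a binary32 pattern. */
uint32_t pack_binary32(unsigned sign, unsigned biased_exp, uint32_t significand24)
{
    assert(sign <= 1u);
    assert(biased_exp >= 1u && biased_exp <= 254u);  /* normal numbers only */
    assert((significand24 >> 23) == 1u);             /* leading 1 must sit at bit 23 */

    /* The leading 1 is implicit, so only the low 23 bits are stored. */
    uint32_t fraction = significand24 & 0x007FFFFFu;

    return ((uint32_t)sign << 31) | ((uint32_t)biased_exp << 23) | fraction;
}
```

For example, 75.65 is roughly 1.18203 * 2^6, so the biased exponent is 6 + 127 = 133 and the rounded 24-bit significand is 0x974CCD; packing those with a sign of 0 gives 0x42974CCD.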

Does the value have to be written big-endian or little-endian? Reversed bit ordering?

If you are free to choose the output format, you should think about writing the value as a text literal instead. That way you can easily convert the integer: just write the int part and write "e0" as the exponent (or omit the exponent and write ".0").
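
If a text representation is acceptable, a sketch along these lines might be enough. It assumes the fractional input is a count of hundredths, as in the question's 75.65 example, and that snprintf is available on the target; format_decimal is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write "[-]whole.hh" into buf using integer formatting only.
   Returns the number of characters written (excluding the terminating NUL),
   or -1 if the buffer is too small. */
int format_decimal(char *buf, size_t size, int negative,
                   unsigned long whole, unsigned hundredths)
{
    assert(hundredths < 100u);

    /* %02u keeps the leading zero, so 5 hundredths prints as ".05", not ".5". */
    int n = snprintf(buf, size, "%s%lu.%02u",
                     negative ? "-" : "", whole, hundredths);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The zero padding matters: without it, 0 and 5 hundredths would come out as 0.5 rather than 0.05.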

For the binary representation, you should have a look at Wikipedia. It is best to first assemble the bit fields into a uint32_t; the structure is given in the linked article. Note that you might have to round if the integer has more significant bits than the 24-bit significand (23 stored bits plus the implicit one) can hold. Remember to normalize the generated value.

The second step is to serialize the uint32_t to a uint8_t array. Mind the endianness of the result!

Also note that you should use uint8_t for the result if you really want 8-bit values; in any case, use an unsigned type. For the intermediate representation, uint32_t is recommended, as that guarantees you operate on 32-bit values.
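
The serialization step could look something like the sketch below. The function names store_u32_be and store_u32_le are hypothetical. Because the bytes are extracted with shifts, the result does not depend on the byte order of the processor running the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: most significant byte first (big-endian). */
void store_u32_be(uint32_t bits, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(bits >> 24);
    out[1] = (uint8_t)(bits >> 16);
    out[2] = (uint8_t)(bits >> 8);
    out[3] = (uint8_t)bits;
}

/* Hypothetical helper: least significant byte first (little-endian). */
void store_u32_le(uint32_t bits, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)bits;
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)(bits >> 16);
    out[3] = (uint8_t)(bits >> 24);
}
```

Which of the two to use depends on what the reader of the file expects.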

You haven't had a go yet, so no giveaways.

Remember that two 32-bit integers a and b, read together as the fixed-point number a.b, can be regarded as a single 64-bit integer with an exponent of 2^-32 (where ^ denotes exponentiation), provided b is expressed as a binary fraction in units of 2^-32.

So without doing anything you've got it in the form:

s * m * 2^e

The only problem is that your mantissa is too long and your number isn't normalized.

A bit of shifting and adding/subtracting, with a possible rounding step, and you're done.
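
One way the shifting and rounding might look is sketched below. It makes several assumptions: b is a binary fraction in units of 2^-32 (not a decimal count such as hundredths, which would first have to be rescaled), the value is non-negative, and ties are rounded to even. The name fixed32_32_to_binary32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert the 32.32 fixed-point value a + b * 2^-32
   to a binary32 bit pattern (sign bit left at 0). */
uint32_t fixed32_32_to_binary32(uint32_t a, uint32_t b)
{
    uint64_t m = ((uint64_t)a << 32) | b;      /* value = m * 2^-32 */
    if (m == 0) return 0;                      /* +0.0 */

    int msb = 63;
    while (((m >> msb) & 1u) == 0) msb--;      /* position of the leading 1 */

    int exponent = msb - 32;                   /* value = 1.xxx * 2^exponent */
    uint32_t sig;                              /* 24-bit significand, leading 1 at bit 23 */

    if (msb > 23) {
        int shift = msb - 23;
        uint64_t dropped = m & (((uint64_t)1 << shift) - 1);
        uint64_t half = (uint64_t)1 << (shift - 1);
        sig = (uint32_t)(m >> shift);
        /* Round to nearest, ties to even. */
        if (dropped > half || (dropped == half && (sig & 1u))) sig++;
        /* Rounding may carry out into bit 24; renormalize if so. */
        if (sig == 0x01000000u) { sig >>= 1; exponent++; }
    } else {
        sig = (uint32_t)(m << (23 - msb));     /* exact, nothing is lost */
    }

    assert((sig >> 23) == 1u);
    /* exponent is in -32..32 here, so the biased value is always a normal exponent. */
    return ((uint32_t)(exponent + 127) << 23) | (sig & 0x007FFFFFu);
}
```

With a = 3 and b = 0x40000000 (that is, 3.25) this yields 0x40500000. Converting a decimal fraction such as 65 hundredths into b is inexact in binary, so the bits lost in that rescaling would also need to feed into the rounding decision.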

The basic premise is to:

  1. Take the target format as given: binary32 float.
  2. Form a binary fixed-point representation of the combined whole and fractional (hundredths) parts. This code uses a structure that encodes the whole and hundredths fields separately. It is important that the whole field is at least 32 bits wide.
  3. Shift left/right (*2 and /2) until the most significant bit is in the implied-bit position, counting the shifts along the way. A robust solution would also note any non-zero bits shifted out.
  4. Form a biased exponent.
  5. Round the mantissa and drop the implied bit.
  6. Form the sign (not done here).
  7. Combine the results of the above 3 steps to form the answer.
  8. Sub-normals, infinities and Not-a-Number cannot result from whole, hundredths input, so generating those float special cases is not addressed here.


#include <assert.h>
#include <stdint.h>
#include <stdio.h>  // printf() in test_covert()
#define IMPLIED_BIT 0x00800000L

typedef struct {
  int_least32_t whole;
  int hundreth;
} x_xx;

int_least32_t covert(int whole, int hundreth) {
  assert(whole >= 0 && hundreth >= 0 && hundreth < 100);
  if (whole == 0 && hundreth == 0) return 0;
  x_xx x = { whole, hundreth };
  int_least32_t expo = 0;
  int sticky_bit = 0; // Note any 1 bits shifted out
  while (x.whole >= IMPLIED_BIT * 2) {
    expo++;
    sticky_bit |= x.hundreth % 2;
    x.hundreth /= 2;
    x.hundreth += (x.whole % 2)*(100/2);
    x.whole /= 2;
  }
  while (x.whole < IMPLIED_BIT) {
    expo--;
    x.hundreth *= 2;
    x.whole *= 2;
    x.whole += x.hundreth / 100;
    x.hundreth %= 100;
  }
  int32_t mantissa = x.whole;
  // Round to nearest - ties to even
  if (x.hundreth >= 100/2 && (x.hundreth > 100/2 || x.whole%2 || sticky_bit)) {
    mantissa++;
  }
  if (mantissa >= (IMPLIED_BIT * 2)) {
    mantissa /= 2;
    expo++;
  }
  mantissa &= ~IMPLIED_BIT;  // Toss MSbit as it is implied in final
  expo += 24 + 126; // Bias: 24 bits + binary32 bias
  expo <<= 23; // Offset
  return expo | mantissa;
}

void test_covert(int whole, int hundreths) {
  union {
    uint32_t u32;
    float f;
  } u;
  u.u32 = covert(whole, hundreths);
  volatile float best = whole + hundreths / 100.0;
  printf("%10d.%02d --> %15.6e %15.6e Same:%d\n", whole, hundreths, u.f, best,
      best == u.f);
}

#include <limits.h>
int main(void) {
  test_covert(75, 65);
  test_covert(0, 1);
  test_covert(INT_MAX, 99);
  return 0;

}

Output

        75.65 -->    7.565000e+01    7.565000e+01 Same:1
         0.01 -->    1.000000e-02    1.000000e-02 Same:1
2147483647.99 -->    2.147484e+09    2.147484e+09 Same:1

Known issues: sign not applied.
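
The missing sign step could be handled separately, for instance with a small helper like the one sketched here. The name apply_sign is hypothetical; it assumes the magnitude pattern was produced with the sign bit clear, as the code above does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set bit 31 of a binary32 pattern for negative values. */
uint32_t apply_sign(uint32_t magnitude_bits, int negative)
{
    assert((magnitude_bits & 0x80000000u) == 0u);  /* magnitude must be unsigned */
    return negative ? (magnitude_bits | 0x80000000u) : magnitude_bits;
}
```

Something like apply_sign((uint32_t)covert(75, 65), 1) would then give the pattern for -75.65. Note that a negative sign combined with a zero magnitude produces negative zero (0x80000000), which may or may not be what the file format expects.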

You can use a software floating point compiler/library.
See https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
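
With a soft-float toolchain, ordinary float code can be written and the compiler replaces each operation with a library call (routines such as __divsf3 in libgcc). The sketch below assumes that float is binary32, that the division is correctly rounded, and that the fraction is given in hundredths; binary32_bits_via_float is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: let the (software) floating-point support do the
   conversion, then copy out the raw bits. */
uint32_t binary32_bits_via_float(int negative, uint32_t whole, uint32_t hundredths)
{
    assert(sizeof(float) == sizeof(uint32_t));
    assert(hundredths < 100u);
    /* Keep whole*100 + hundredths below 2^24 so it converts to float exactly;
       a single correctly rounded division then gives the nearest float. */
    assert(whole <= 167771u);

    float f = (float)(whole * 100u + hundredths) / 100.0f;
    if (negative) f = -f;

    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* well-defined way to read the representation */
    return bits;
}
```

The range limit is a simplification for this sketch; larger whole parts would need a different strategy to avoid rounding twice. The returned word uses the host's float layout, so the byte order still has to be chosen when writing it to the file.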
