Adding positive and negative numbers in IEEE-754 format

My problem seems to be pretty simple: I wrote a program that manually adds floating point numbers together. This program has certain restrictions (such as no iostream or use of any unary operators), which is the reason for the lack of those things. As for the problem, the program seems to function correctly when adding two positive floats (1.5 + 1.5 = 3.0, for example), but when adding a positive and a negative number (10.0 + -5.0) I get very wacky numbers. Here is the code:

#include <cstdio>
#define BIAS32 127

struct Real
{
    //sign bit
    int sign;
    //UNBIASED exponent
    long exponent;
    //Fraction including implied 1. at bit index 23
    unsigned long fraction;
};

Real Decode(int float_value);
int Encode(Real real_value);
Real Normalize(Real value);
Real Add(Real left, Real right);
unsigned long Add(unsigned long leftop, unsigned long rightop);
unsigned long Multiply(unsigned long leftop, unsigned long rightop);
void alignExponents(Real* left, Real* right);
bool is_neg(Real real);
int Twos(int op);

int main(int argc, char* argv[])
{
    int left, right;
    char op;
    int value;
    Real rLeft, rRight, result;
    if (argc < 4) {
        printf("Usage: %s <left> <op> <right>\n", argv[0]);
        return -1;
    }
    sscanf(argv[1], "%f", (float*)&left);
    sscanf(argv[2], "%c", &op);
    sscanf(argv[3], "%f", (float*)&right);
    rLeft = Decode(left);
    rRight = Decode(right);

    if (op == '+') {
        result = Add(rLeft, rRight);
    }
    else {
        printf("Unknown operator '%c'\n", op);
        return -2;
    }
    value = Encode(result);
    printf("%.3f %c %.3f = %.3f (0x%08x)\n",
        *((float*)&left),
        op,
        *((float*)&right),
        *((float*)&value),
        value
    );
    return 0;
}

Real Decode(int float_value)
{             // Test sign bit of float_value - Test exponent bits of float_value & apply bias - Test mantissa bits of float_value
    Real result{ float_value >> 31 & 1 ? 1 : 0, ((long)Add(float_value >> 23 & 0xFF, -BIAS32)), (unsigned long)float_value & 0x7FFFFF };
    return result;
};
    
int Encode(Real real_value)
{
    int x = 0;
    x |= real_value.fraction; // Set the fraction bits of x 
    x |= real_value.sign << 31; // Set the sign bits of x
    x |= Add(real_value.exponent, BIAS32) << 23; // Set the exponent bits of x
    return x;
}

Real Normalize(Real value)
{
    if (is_neg(value))
    {
        value.fraction = Twos(value.fraction);
    }
    unsigned int i = 0;
    while (i < 9)
    {
        if ((value.fraction >> Add(23, i)) & 1) // If there are set bits past the mantissa section
        {
            value.fraction >>= 1; // shift mantissa right by 1
            value.exponent = Add(value.exponent, 1); // increment exponent to accommodate the shift
        }
        i = Add(i, 1);
    }
    return value;
}

Real Add(Real left, Real right)
{
    Real a = left, b = right;
    alignExponents(&a, &b); // Aligns exponents of both operands
    unsigned long sum = Add(a.fraction, b.fraction);
    Real result = Normalize({ a.sign, a.exponent, sum }); // Normalize result if need be
    return result;
}

unsigned long Add(unsigned long leftop, unsigned long rightop)
{
    unsigned long sum = 0, test = 1; // sum initialized to 0, test created to compare bits
    while (test) // while test is not 0
    {
        if (leftop & test) // if the digit being tested is 1
        {
            if (sum & test) sum ^= test << 1; // if the sum tests to 1, carry a bit over
            sum ^= test;
        }
        if (rightop & test)
        {
            if (sum & test) sum ^= test << 1;
            sum ^= test;
        }
        test <<= 1;
    }
    return sum;
}

void alignExponents(Real* a, Real* b)
{
    if (a->exponent != b->exponent) // If the exponents are not equal
    {
        if (a->exponent > b->exponent)
        {
            int disp = a->exponent - b->exponent; // number of shifts needed based on difference between two exponents
            b->fraction |= 1 << 23; // sets the implicit bit for shifting
            b->exponent = a->exponent; // sets exponents equal to each other
            b->fraction >>= disp; // mantissa is shifted over to accommodate the increase in power
            return;
        }
        int disp = b->exponent - a->exponent;
        a->fraction |= 1 << 23;
        a->exponent = b->exponent;
        a->fraction >>= disp;
        return;
    }
    return;
}

bool is_neg(Real real)
{
    if (real.sign) return true;
    return false;
}

int Twos(int op)
{
    return Add(~op, -1); // NOT the operand and add 1 to it
}

On top of that, I just tested the values 10.5 + 5.5 and got 24.0, so there appears to be even more wrong with this than I initially thought. I've been working on this for days and would love some help/advice.

Here is some help/advice. Now that you have worked on some of the code, I suggest going back and reworking your data structure. The declaration of such a crucial data structure would benefit from a lot more comments, making sure you know exactly what each field means.

For example, the implicit bit is not always 1. It is zero if the biased exponent field is zero (that is, for zero and subnormal numbers). That should be dealt with in your Encode and Decode functions. For the rest of your code, it is just a significand bit and should not have any special handling.
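A minimal sketch of that idea, assuming a hypothetical `Decoded` struct and helper names (`BitsOf`, `DecodeBits`, `EncodeBits`) that are not in the question's code. It ignores the assignment's operator restrictions and does not handle infinities or NaNs; the point is only that the exponent field alone decides whether bit 23 gets set:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical decoded form: value = (-1)^sign * significand * 2^(exponent - 23).
struct Decoded
{
    int sign;                   // 0 or 1
    int exponent;               // unbiased exponent
    std::uint32_t significand;  // bit 23 is the (now explicit) integer bit
};

// Reinterprets a float as its 32-bit pattern without pointer casts.
std::uint32_t BitsOf(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Decodes a finite binary32 bit pattern (assumption: not infinity or NaN).
Decoded DecodeBits(std::uint32_t bits)
{
    Decoded d;
    d.sign = static_cast<int>((bits >> 31) & 1u);
    std::uint32_t field = (bits >> 23) & 0xFFu;
    std::uint32_t frac = bits & 0x7FFFFFu;
    if (field == 0) {
        // Zero or subnormal: the integer bit is 0 and the exponent is -126.
        d.exponent = -126;
        d.significand = frac;
    } else {
        // Normal number: make the implied 1 explicit.
        d.exponent = static_cast<int>(field) - 127;
        d.significand = frac | (1u << 23);
    }
    return d;
}

// Encodes a Decoded value whose significand fits in 24 bits.
std::uint32_t EncodeBits(const Decoded& d)
{
    assert((d.significand >> 24) == 0);
    std::uint32_t field = 0;  // stays 0 for zero and subnormals
    if ((d.significand >> 23) & 1u) {
        field = static_cast<std::uint32_t>(d.exponent + 127);
    }
    return (static_cast<std::uint32_t>(d.sign) << 31)
         | (field << 23)
         | (d.significand & 0x7FFFFFu);  // drop the integer bit again
}
```

With this convention 10.5 decodes to exponent 3 and significand 0xA80000, so the addition and alignment code never has to OR in bit 23 itself.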

When you start thinking about rounding, you will find you often need more than 23 bits in an intermediate result.
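To make the need for extra bits concrete, here is a hedged sketch of one common approach: keep a few extra low-order bits while aligning and adding, fold every bit that is shifted out into a "sticky" bit, and round once at the end. The function names and the choice of three extra bits in the example are assumptions for illustration, not part of the question's code:

```cpp
#include <cassert>
#include <cstdint>

// Shifts value right by `shift` bits. If any 1 bits are shifted out,
// bit 0 of the result is set (the "sticky" bit), so the information
// "something nonzero was lost" survives the shift.
std::uint64_t ShiftRightSticky(std::uint64_t value, unsigned shift)
{
    if (shift == 0) return value;
    if (shift >= 64) return value != 0 ? 1u : 0u;
    std::uint64_t lost = value & ((std::uint64_t(1) << shift) - 1);
    return (value >> shift) | (lost != 0 ? 1u : 0u);
}

// Drops the low `extra` bits of value, rounding to nearest with ties
// going to the even result (the IEEE-754 default rounding mode).
std::uint64_t RoundNearestEven(std::uint64_t value, unsigned extra)
{
    assert(extra > 0 && extra < 64);
    std::uint64_t half = std::uint64_t(1) << (extra - 1);
    std::uint64_t remainder = value & ((std::uint64_t(1) << extra) - 1);
    std::uint64_t quotient = value >> extra;
    if (remainder > half || (remainder == half && (quotient & 1u))) {
        quotient += 1;  // may carry into bit 24; the caller must renormalize
    }
    return quotient;
}
```

For example, with three extra bits, 1.0 becomes 0x800000 << 3 = 0x4000000, and 2^-24 aligned to the same exponent becomes 4, which is exactly half a unit in the last place. Their sum 0x4000004 is a tie and rounds to the even significand 0x800000, so 1.0 + 2^-24 gives 1.0. Without the extra bits the aligned operand would simply become 0, and a tie could not be told apart from a value slightly above or below it.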

Storing the significand of negative numbers in 2's complement will create the problem of having the same information stored two ways. You will have both a sign bit, as though doing sign-and-magnitude, and the sign encoded in the signed integer significand. Keeping them consistent will be a mess. Whatever you decide about how Real will store negative numbers, document it and keep it consistent throughout.
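One way to keep a single representation of the sign is to leave the significand as an unsigned magnitude and let the addition routine decide between adding and subtracting. The sketch below assumes both operands have already been aligned to the same exponent and have their integer bit set explicitly; `SignedMagnitude` and `AddAligned` are hypothetical names, and the code does not follow the assignment's operator restrictions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair: the sign lives only in `sign`, never in `magnitude`.
struct SignedMagnitude
{
    int sign;                 // 0 = positive, 1 = negative
    std::uint32_t magnitude;  // unsigned significand, already aligned
};

// Adds two aligned sign-and-magnitude significands.
// The result may need renormalizing afterwards (shift left or right
// and adjust the exponent); that step is deliberately left out here.
SignedMagnitude AddAligned(SignedMagnitude a, SignedMagnitude b)
{
    assert((a.sign == 0 || a.sign == 1) && (b.sign == 0 || b.sign == 1));
    if (a.sign == b.sign) {
        // Same sign: add magnitudes, keep the common sign.
        return { a.sign, a.magnitude + b.magnitude };
    }
    if (a.magnitude == b.magnitude) {
        // x + (-x) is +0 in the default rounding mode.
        return { 0, 0 };
    }
    if (a.magnitude > b.magnitude) {
        // Different signs: subtract the smaller magnitude from the larger
        // and take the sign of the larger operand.
        return { a.sign, a.magnitude - b.magnitude };
    }
    return { b.sign, b.magnitude - a.magnitude };
}
```

For 10.0 + -5.0, the operands aligned at exponent 3 are 0xA00000 and 0x500000, so the result has sign 0 and magnitude 0x500000. That still needs a one-bit left shift (and an exponent decrement) to normalize it to 5.0.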

If I were implementing this, I would start by defining Real very, very carefully. I would then decide what operations I wanted to be able to do on Real, and write functions to do them. If you get those right, each function will be relatively simple.
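As an illustration of what a carefully defined Real could look like, here is one possible set of documented invariants plus a function that checks them. The field meanings chosen here (explicit integer bit, unsigned magnitude, exponent range) are assumptions made for this sketch, and `IsCanonical` and `Checked` are hypothetical helpers rather than anything from the question:

```cpp
#include <cassert>
#include <cstdint>

// Represents the finite value (-1)^sign * significand * 2^(exponent - 23).
struct Real
{
    // 0 for positive, 1 for negative. The ONLY place the sign is stored.
    int sign;
    // Unbiased exponent belonging to bit 23 of the significand,
    // always in the range [-126, 127].
    long exponent;
    // Unsigned magnitude. Bit 23 is the integer bit and is stored
    // explicitly. Normal numbers have bit 23 set; zero and subnormals
    // have it clear and always use exponent == -126.
    std::uint32_t significand;
};

// Returns true when r satisfies every invariant documented above.
bool IsCanonical(const Real& r)
{
    if (r.sign != 0 && r.sign != 1) return false;
    if (r.exponent < -126 || r.exponent > 127) return false;
    if ((r.significand >> 24) != 0) return false;  // nothing above bit 23
    bool integerBit = ((r.significand >> 23) & 1u) != 0;
    if (!integerBit && r.exponent != -126) return false;  // unnormalized
    return true;
}

// While debugging, every function that produces a Real can return
// through this helper so a broken invariant is caught where it happens.
Real Checked(const Real& r)
{
    assert(IsCanonical(r));
    return r;
}
```

With the invariants written down, each operation has a clear contract: Decode produces a canonical Real, Add and Normalize take and return canonical values, and Encode can assume one.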
