Error when multiplying numbers with bitwise operators

Question

I am trying to multiply two floating-point numbers in IEEE-754 format using bitwise operators. The 32-bit number is laid out as sign, exponent, mantissa. After multiplying, the result is correct some of the time, but not always.

I think it has something to do with the result not being in normalized form (e.g. 1.1010101 × 2⁵), but I don't know how to fix it.
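
For reference, here is a minimal sketch (not part of the original question) of how the three fields of a 32-bit IEEE-754 float can be extracted and repacked with shifts and masks. `Fields`, `decode` and `encode` are hypothetical names chosen for illustration, and the sketch works on native `float` bit patterns rather than the `Real` struct below.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three raw bit fields of an IEEE-754
// single-precision (binary32) number, laid out as sign | exponent | fraction.
struct Fields {
    std::uint32_t sign;      // 1 bit: 0 = positive, 1 = negative
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 bits; the leading 1 of a normal number is implicit
};

// Split a float into its fields using shifts and masks.
Fields decode(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bit pattern (avoids aliasing UB)
    return Fields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Inverse of decode: pack the three fields back into a float.
float encode(Fields x) {
    std::uint32_t bits = ((x.sign & 1u) << 31)
                       | ((x.exponent & 0xFFu) << 23)
                       | (x.fraction & 0x7FFFFFu);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, -6.5 is -1.101₂ × 2², so `decode(-6.5f)` gives sign 1, stored exponent 129 (2 plus the bias of 127) and fraction 0x500000, the bits after the implicit leading 1.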

#include <cstdio>

struct Real
{
   int sign;                // 0 = positive, 1 = negative
   long exponent;
   unsigned long fraction;  // mantissa bits
};

Real Multiply(Real Val1, Real Val2){
   Real answer;
   answer.fraction = Val1.fraction + Val2.fraction;
   answer.exponent = Val1.exponent + Val2.exponent;
   answer.sign = Val1.sign ^ Val2.sign;
   return answer;
}

Answer 1

When multiplying, the mantissas must be multiplied together, not added:

$$
\begin{aligned}
&(-1)^{\mathit{sign}_1} \times 2^{\mathit{exp}_1} \times \mathit{mantissa}_1 \times (-1)^{\mathit{sign}_2} \times 2^{\mathit{exp}_2} \times \mathit{mantissa}_2 \\
&= (-1)^{\mathit{sign}_1 + \mathit{sign}_2} \times 2^{\mathit{exp}_1 + \mathit{exp}_2} \times \mathit{mantissa}_1 \times \mathit{mantissa}_2
\end{aligned}
$$
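
A worked example of the formula (my own illustration, not from the answer), writing each mantissa as a binary significand 1.f. The second case shows the normalization problem from the question:

$$
\begin{aligned}
3.0 \times 5.0 &= \left(2^{1} \times 1.1_2\right) \times \left(2^{2} \times 1.01_2\right) = 2^{1+2} \times (1.1_2 \times 1.01_2) = 2^{3} \times 1.111_2 = 15 \\
3.0 \times 3.0 &= \left(2^{1} \times 1.1_2\right) \times \left(2^{1} \times 1.1_2\right) = 2^{2} \times 10.01_2 = 2^{3} \times 1.001_2 = 9
\end{aligned}
$$

Because each significand lies in [1, 2), their product lies in [1, 4), so at most one right shift of the significand (with the exponent incremented by 1) restores normalized form. Also, $(-1)^{\mathit{sign}_1 + \mathit{sign}_2}$ depends only on the parity of the sum, which is why XOR-ing the sign bits is correct.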

You also don't need a separate variable for the return value:

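Real Multiply(Real Val1, Real Val2){
   Val1.fraction *= Val2.fraction;  // mantissas multiply (only the low bits are kept; see below)
   Val1.exponent += Val2.exponent;  // exponents add
   Val1.sign ^= Val2.sign;          // (-1)^(s1 + s2) depends only on parity, i.e. s1 XOR s2
   return Val1;                     // Val1 is a copy, so it can hold the result
}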

After those basic steps you'll still have to normalize the result, and for that you need the full product, not just the low bits that an ordinary non-widening multiplication keeps. Therefore you must cast your "fraction" (if you're using IEEE-754, the correct term for it is significand) to a wider type before multiplying. Depending on your platform, you may or may not have a type twice as wide as unsigned long, so it's better to use fixed-width types like int32_t and uint64_t here. That's all the hints you need to do this.
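
Putting the hints together, here is a minimal sketch of a widening multiply with normalization. It is an illustration under stated assumptions, not the answerer's code: `soft_multiply` is a hypothetical name, it operates on native `float` bit patterns instead of the question's `Real` struct, it assumes both inputs are normal, nonzero and finite with no overflow or underflow, and it truncates instead of rounding. It also handles two details the hints don't spell out: the implicit leading 1 must be restored before multiplying, and since stored exponents are biased, one bias of 127 must be subtracted after adding them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sketch: multiply two IEEE-754 binary32 values using integer and bitwise
// operations. Assumes normal, nonzero, finite inputs and a result that
// neither overflows nor underflows. The product is truncated (rounded
// toward zero) instead of IEEE round-to-nearest-even.
float soft_multiply(float a, float b) {
    std::uint32_t x, y;
    std::memcpy(&x, &a, sizeof x);
    std::memcpy(&y, &b, sizeof y);

    std::uint32_t ex = (x >> 23) & 0xFFu;
    std::uint32_t ey = (y >> 23) & 0xFFu;
    assert(ex != 0 && ex != 0xFFu && ey != 0 && ey != 0xFFu);  // normal inputs only

    // Sign: only the parity of s1 + s2 matters, so XOR the sign bits.
    std::uint32_t sign = (x ^ y) & 0x80000000u;

    // Significands with the implicit leading 1 restored: 24 bits each,
    // each holding 1.f * 2^23.
    std::uint64_t mx = (x & 0x7FFFFFu) | 0x800000u;
    std::uint64_t my = (y & 0x7FFFFFu) | 0x800000u;

    // Widening multiply: the full product needs up to 48 bits and
    // represents (mx * my) / 2^46, a value in [1, 4).
    std::uint64_t product = mx * my;

    // Add the unbiased exponents, i.e. subtract one bias of 127.
    std::int32_t exponent = static_cast<std::int32_t>(ex)
                          + static_cast<std::int32_t>(ey) - 127;

    // Normalize: bit 47 set means the product is in [2, 4), so shift one
    // extra place and bump the exponent to bring it back into [1, 2).
    if (product & (std::uint64_t{1} << 47)) {
        product >>= 24;
        exponent += 1;
    } else {
        product >>= 23;
    }
    assert(exponent > 0 && exponent < 0xFF);  // no overflow/underflow handling

    // Drop the implicit leading 1 and reassemble sign | exponent | fraction.
    std::uint32_t bits = sign
                       | (static_cast<std::uint32_t>(exponent) << 23)
                       | (static_cast<std::uint32_t>(product) & 0x7FFFFFu);
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, `soft_multiply(3.0f, 3.0f)` takes the normalization branch (1.1₂ × 1.1₂ = 10.01₂ sets bit 47) and returns 9.0f, while `soft_multiply(3.0f, 5.0f)` needs no extra shift and returns 15.0f. Real IEEE-754 hardware also rounds to nearest-even using the discarded low bits and handles zeros, subnormals, infinities and NaN. Because this sketch only truncates, an inexact product can come out one ulp smaller in magnitude than the hardware result.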

Answer 1: score 4, posted 2019-02-08 05:11:24