Multiplication of fixed point numbers

Question

I have a very basic question. In my program, I am multiplying two fixed-point numbers, as shown below. My inputs are in Q1.31 format and the output should be in the same format. To do this, I store the result of the multiplication in a temporary 64-bit variable and then shift it to get the result in the required format.

#include <stdio.h>

int conversion1(float input, int Q_FORMAT)
{
return ((int)(input * ((1 << Q_FORMAT)-1)));
}

int mul(int input1, int input2, int format)
{
    __int64 result;
    result = (__int64)input1 * (__int64)input2;//Q2.62 format
    result = result << 1;//Q1.63 format
    result = result >> (format + 1);//33.31 format
    return (int)result;//Q1.31 format
}

int main()
{
    int Q_FORMAT = 31;
    float input1 = 0.5, input2 = 0.5;
    int q_input1, q_input2;
    int temp_mul;
    float q_muls;

    q_input1 = conversion1(input1, Q_FORMAT);
    q_input2 = conversion1(input2, Q_FORMAT);
    temp_mul = mul(q_input1, q_input2, Q_FORMAT);
    q_muls = ((float)temp_mul / ((1 << (Q_FORMAT)) - 1));
    printf("result of multiplication using q format = %f\n", q_muls);
    return 0;
}

My question: when converting the float input to an integer (and when converting the int output back to float), I am using (1<<Q_FORMAT)-1 as the scale factor. But I have seen people use (1<<Q_FORMAT) directly in their code. The problem I face when using (1<<Q_FORMAT) is that I get the negative of the desired result.

For example, in my program,

If I use (1<<Q_FORMAT), I get -0.25 as the result.
But if I use (1<<Q_FORMAT)-1, I get 0.25 as the result, which is correct.

Where am I going wrong? Are there other concepts I need to understand?

Answer 1

On common platforms, int is a two's complement 32-bit integer providing 31 digits (plus a 'sign' bit). That is one bit too narrow to represent a Q1.31 number, which requires 32 digits (plus a 'sign' bit).

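In your example, this manifests as arithmetic overflow in the expression 1 << Q_FORMAT. With Q_FORMAT equal to 31, the shifted 1 lands in the 'sign' bit, so the scale factor typically comes out negative rather than as 2^31.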

To avoid this, you need to either use a type providing more digits (e.g. long long) or a fixed-point format requiring fewer digits (e.g. Q1.30). You can use unsigned to fix your example, but the result will be a 'sign' bit short of Q2.30.

solution1
1 ACCEPTED 2020-03-14 10:51:52
