浮点范围缩小

Question

I'm implementing some 32-bit float trigonometry in C# using Mono, hopefully utilizing Mono.Simd. 我正在使用Mono在C＃中实现一些32位浮点三角函数，希望使用Mono.Simd。 I'm only missing solid range reduction currently. 我目前只缺少固体范围减少。 I'm rather stuck now, because apparently Mono's SIMD extensions does not include conversions between floats and integers, meaning I have no access to rounding/truncation which would be the usual method. 我现在很困惑，因为Mono的SIMD扩展显然不包括浮点数和整数之间的转换，这意味着我无法进行舍入/截断，这是常用的方法。 I can however convert bitwise between ints and floats. 然而，我可以在int和float之间按位转换。

Can something like this be done? 可以这样做吗？ I can scale the domain up and down if needed, but ideally the range reduction should result in a domain of [0, 2 pi] or [-pi, pi]. 如果需要，我可以上下调整域，但理想情况下，范围减少应该导致域[0,2 pi]或[-pi，pi]。 I have a hunch that it would be possible to do some IEEE magic with the exponent, if the domain is a power of 2, but I'm really not sure how to. 我有一种预感，如果域名是2的幂，就可以用指数做一些IEEE魔法，但我真的不知道该怎么做。

Edit: Okay, I've tried messing around with this C code and it feels like I'm on the verge of something (it doesn't work but the fractional part is always correct, in decimal / base10 at least...). 编辑：好的，我已经尝试搞乱这个C代码，感觉就像我正处于某个边缘（它不起作用，但小数部分始终是正确的，至少在十进制/ base10中......）。 The core principle seems to be getting the exponent difference between your domain and the input exponent, and composing a new float with a shifted mantissa and an adjusted exponent.. But it won't work for negatives, and I have no idea how to handle non-powers of 2 (or anything fractional - in fact, anything else than 2 doesn't work!). 核心原则似乎是获得你的域和输入指数之间的指数差异，并组成一个带有移位尾数和调整指数的新浮点数。但它不适用于负数，我不知道如何处理非幂2（或任何分数 - 事实上，除2以外的任何东西都不起作用！）。

// here's another more correct attempt:
float fmodulus(float val, int domain)
{
    const int mantissaMask = 0x7FFFFF;
    const int exponentMask = 0x7F800000;

    int ival = *(int*)&val;

    int mantissa = ival & mantissaMask;
    int rawExponent = ival & exponentMask;
    int exponent = (rawExponent >> 23) - (129 - domain);
    // powers over one:
    int p = exponent;

    mantissa <<= p;
    rawExponent = exponent >> p;
    rawExponent += 127;
    rawExponent <<= 23;

    int newVal = rawExponent & exponentMask;
    newVal |= mantissa & mantissaMask;

    float ret = *(float*)&newVal;

    return ret;
}

float range_reduce(float value, int range )
{
    const int mantissaMask = 0x7FFFFF;
    const int exponentMask = 0x7F800000;

    int ival = *(int*)&value;
    // grab exponent:
    unsigned exponent = (ival & exponentMask) >> 23;
    // grab mantissa:
    unsigned mantissa = ival & mantissaMask;

    // remove bias, and see how much the exponent is over range/domain
    unsigned char erange = (unsigned char)(exponent - (125 + range));
    // check if sign bit is set - that is, the exponent is under our range
    if (erange & 0x80)
    {
        // don't do anything then.
        erange = 0;
    }

    // shift mantissa (and chop off bits) by the reduced amount
    int inewVal = (mantissa << (erange)) & mantissaMask;
    // add exponent, and subtract the amount we reduced the argument with
    inewVal |= ((exponent - erange) << 23) & exponentMask;

    // reinterpret
    float newValue = *(float*)&inewVal;
    return newValue;
    //return newValue - ((erange) & 0x1 ? 1.0f : 0.0f);
}

int main()
{
    float val = 2.687f;
    int ival = *(int*)&val;
    float correct = fmod(val, 2);
    float own = range_reduce(val, 2);

    getc(stdin);
}

Edit 2: 编辑2：

Okay, I'm really trying to understand this in terms of the ieee binary system. 好吧，我真的试图用ieee二元系统来理解这一点。 If we write the modulus operation like this: 如果我们写这样的模数运算：

output = input % 2

[exponent] + [mantissa_bit_n_times_exponent]

3.5     = [2] + [1 + 0.5]                   ->[1] + [0.5]       = 1.5
4.5     = [4] + [0 + 0 + 0.5]               ->[0.5] + [0]       = 0.5
5.5     = [4] + [0 + 1 + 0.5]               ->[1] + [0.5]       = 1.5
2.5     = [2] + [0 + 0.5]                   ->[0.5] + [0]       = 0.5
2.25    = [2] + [0 + 0 + 0.25]              ->[0.25]            = 0.25
2.375   = [2] + [0 + 0 + 0.25 + 0.125]      ->[0.25] + [0.125]  = 0.375
13.5    = [8] + [4 + 0 + 1 + 0.5]           ->[1] + [0.5]       = 1.5
56.5    = [32] + [16 + 8 + 0 + 0 + 0 + 0.5] ->[0.5]             = 0.5

We can see the output in all cases is a new number, with no original exponent and the mantissa shifted an amount ( that is based on the exponent and the first non-zero bits of the mantissa after the first exponent-bits of the mantissa is ignored ) into the exponent. 我们可以看到在所有情况下的输出都是一个新的数字，没有原始指数，并且尾数移动了一个数量（ 这是基于指数和尾数的第一个指数位之后的尾数的第一个非零位）忽略 ）进入指数。 But I'm not really sure if this is the correct approach, it just works out nicely on paper. 但我不确定这是否是正确的方法，它只是在纸上很好地解决了。

Edit3: I'm stuck on Mono version 2.0.50727.1433 编辑3：我坚持使用Mono版本2.0.50727.1433

Answer 1

检查您的单声道版本，因为ConvertToInt和ConvertToIntTruncated 已于4年前添加，并且自2.10版本开始应该存在。

Answer 2

You can reduce the problem to taking a float mod 1. To simplify that, you can compute the floor of the float using bit operations, then use a floating point subtraction. 您可以减少浮点模式1的问题。为了简化这一点，您可以使用位操作计算浮点的底限，然后使用浮点减法。 The following is (unsafe) C# code for these operations: 以下是这些操作的（不安全）C＃代码：

// domain is assumed to be positive
// returns value in [0,domain)
public float fmodulus(float val, float domain)
{
    if (val < 0)
    {
        float negative = fmodulus(-val, domain);
        if (domain - negative == domain)
            return 0;
        else
            return domain-negative;
    }

    if (val < domain)
        return val; // this avoids losing accuracy

    return fmodOne(val / domain) * domain;
}

// assumes val >= 1, so val is positive and the exponent is at least 0 
unsafe public float fmodOne(float val)
{
    int iVal = *(int*)&val;
    int uncenteredExponent = iVal >> 23;
    int exponent = uncenteredExponent - 127; // 127 corresponds to 2^0 times the mantissa
    if (exponent >= 23) 
        return 0; // not enough precision to distinguish val from an integer

    int unneededBits = 23 - exponent; // between 0 and 23
    int iFloorVal = (iVal >> unneededBits) << unneededBits; // equivalent to using a mask to zero the bottom bits of the mantissa
    float floorVal = *(float*)&iFloorVal; // convert the bit pattern back to a float

    return val-floorVal;
}

For example, fmodulus(100.1f, 1) is 0.09999847. 例如，fmodulus（100.1f，1）是0.09999847。 The bit pattern of 100.1f is 100.1f的位模式是

0 10000101 10010000011001100110011 0 10000101 10010000011001100110011

The bit pattern of floorVal (100f) is floorVal（100f）的位模式是

0 10000101 10010000000000000000000 0 10000101 10010000000000000000000

A floating point subtraction gives something close to 0.1f: 浮点减法给出了接近0.1f的值：

0 01111011 10011001100110000000000 0 01111011 10011001100110000000000

Actually, I was surprised that the last 8 bits were zeroed out. 实际上，我很惊讶最后8位被清零了。 I thought only the last 6 bits of 0.1f were supposed to be replaced with 0. Perhaps one can do better than relying on the floating point subtraction. 我认为只有0.1f的最后6位应该被替换为0.也许一个人可以比依靠浮点减法更好。

浮点范围缩小

问题描述

2 个解决方案

解决方案1
1 2015-04-06 16:06:00

解决方案2
1 已采纳 2015-04-12 05:31:05

浮点范围缩小

问题描述

2 个解决方案

解决方案1 1 2015-04-06 16:06:00

解决方案2 1 已采纳 2015-04-12 05:31:05

解决方案1
1 2015-04-06 16:06:00

解决方案2
1 已采纳 2015-04-12 05:31:05