

Floating point range reduction

I'm implementing some 32-bit float trigonometry in C# using Mono, hopefully utilizing Mono.Simd. The only piece I'm currently missing is solid range reduction. I'm rather stuck, because Mono's SIMD extensions apparently do not include conversions between floats and integers, so I have no access to rounding or truncation, which would be the usual method. I can, however, convert bitwise between ints and floats.

Can something like this be done? I can scale the domain up and down if needed, but ideally the range reduction should result in a domain of [0, 2 pi] or [-pi, pi]. I have a hunch that it would be possible to do some IEEE magic with the exponent if the domain is a power of 2, but I'm really not sure how.
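To make the "bitwise conversion" idea concrete, here is a minimal C sketch of splitting a 32-bit float into its IEEE 754 fields and putting it back together. The names float_fields, float_to_bits, bits_to_float and decompose are hypothetical helpers invented for this illustration, and the sketch assumes float is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE 754 binary32 value. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 for positive, 1 for negative */
    int32_t  exponent;  /* unbiased exponent: stored value minus 127 */
    uint32_t mantissa;  /* 23 stored bits; the implicit leading 1 is not included */
} float_fields;

/* Reinterpret the bits of a float as an unsigned integer (no numeric conversion). */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret an unsigned integer bit pattern as a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Split a float into sign, unbiased exponent and stored mantissa bits. */
float_fields decompose(float f)
{
    uint32_t u = float_to_bits(f);
    float_fields r;
    r.sign     = u >> 31;
    r.exponent = (int32_t)((u >> 23) & 0xFF) - 127;
    r.mantissa = u & 0x7FFFFF;
    return r;
}
```

For instance, 5.5 is 1.011 (binary) times 2^2, so decompose(5.5f) yields sign 0, exponent 2 and mantissa 0x300000.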

Edit: Okay, I've tried messing around with this C code and it feels like I'm on the verge of something (it doesn't work, but the fractional part is always correct, in decimal / base 10 at least...). The core principle seems to be getting the exponent difference between the domain and the input exponent, and composing a new float with a shifted mantissa and an adjusted exponent. But it won't work for negatives, and I have no idea how to handle non-powers of 2 (or anything fractional; in fact, anything other than 2 doesn't work!).

#include <math.h>
#include <stdio.h>

// here's another more correct attempt:
float fmodulus(float val, int domain)
{
    const int mantissaMask = 0x7FFFFF;
    const int exponentMask = 0x7F800000;

    int ival = *(int*)&val;

    int mantissa = ival & mantissaMask;
    int rawExponent = ival & exponentMask;
    int exponent = (rawExponent >> 23) - (129 - domain);
    // powers over one:
    int p = exponent;

    mantissa <<= p;
    rawExponent = exponent >> p;
    rawExponent += 127;
    rawExponent <<= 23;

    int newVal = rawExponent & exponentMask;
    newVal |= mantissa & mantissaMask;

    float ret = *(float*)&newVal;

    return ret;
}

float range_reduce(float value, int range)
{
    const int mantissaMask = 0x7FFFFF;
    const int exponentMask = 0x7F800000;

    int ival = *(int*)&value;
    // grab exponent:
    unsigned exponent = (ival & exponentMask) >> 23;
    // grab mantissa:
    unsigned mantissa = ival & mantissaMask;

    // remove bias, and see how much the exponent is over range/domain
    unsigned char erange = (unsigned char)(exponent - (125 + range));
    // check if sign bit is set - that is, the exponent is under our range
    if (erange & 0x80)
    {
        // don't do anything then.
        erange = 0;
    }

    // shift mantissa (and chop off bits) by the reduced amount
    int inewVal = (mantissa << (erange)) & mantissaMask;
    // add exponent, and subtract the amount we reduced the argument with
    inewVal |= ((exponent - erange) << 23) & exponentMask;

    // reinterpret
    float newValue = *(float*)&inewVal;
    return newValue;
    //return newValue - ((erange) & 0x1 ? 1.0f : 0.0f);
}

int main()
{
    float val = 2.687f;
    int ival = *(int*)&val;
    float correct = fmod(val, 2);
    float own = range_reduce(val, 2);
    return 0;
}


Edit 2:

Okay, I'm really trying to understand this in terms of the IEEE binary representation. If we write the modulus operation like this:

output = input % 2

[exponent] + [mantissa_bit_n_times_exponent]

3.5     = [2] + [1 + 0.5]                   ->[1] + [0.5]       = 1.5
4.5     = [4] + [0 + 0 + 0.5]               ->[0.5] + [0]       = 0.5
5.5     = [4] + [0 + 1 + 0.5]               ->[1] + [0.5]       = 1.5
2.5     = [2] + [0 + 0.5]                   ->[0.5] + [0]       = 0.5
2.25    = [2] + [0 + 0 + 0.25]              ->[0.25]            = 0.25
2.375   = [2] + [0 + 0 + 0.25 + 0.125]      ->[0.25] + [0.125]  = 0.375
13.5    = [8] + [4 + 0 + 1 + 0.5]           ->[1] + [0.5]       = 1.5
56.5    = [32] + [16 + 8 + 0 + 0 + 0 + 0.5] ->[0.5]             = 0.5

We can see that the output in all cases is a new number without the original exponent, with the mantissa shifted into the exponent by some amount. That amount depends on the exponent and on the first non-zero mantissa bit that remains once the leading mantissa bits covered by the exponent are ignored. But I'm not really sure this is the correct approach; it just works out nicely on paper.
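The table can be reproduced without any float-to-int conversion by clearing the mantissa bits whose weight is below the modulus and subtracting. The following C sketch is one way to do it, under the assumptions that the input is finite and non-negative, that the modulus is a power of two 2^k, and that float is IEEE 754 binary32. The name mod_pow2 is hypothetical; a domain such as 2 pi would still need to be scaled to a power of two first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: x mod 2^k for finite x >= 0, using only integer bit operations
   on the float's bit pattern plus a single float subtraction. */
float mod_pow2(float x, int k)
{
    uint32_t bits;
    assert(x >= 0.0f);                          /* negatives are not handled here */
    memcpy(&bits, &x, sizeof bits);

    int e = (int)((bits >> 23) & 0xFF) - 127;   /* unbiased exponent of x */
    int kept = e - k;       /* number of mantissa bits with weight >= 2^k */

    if (kept < 0)
        return x;           /* x < 2^k: already reduced */
    if (kept >= 23)
        return 0.0f;        /* every stored bit weighs >= 2^k: x is a multiple of 2^k */

    /* Clear the low (23 - kept) mantissa bits: this is the largest
       multiple of 2^k that does not exceed x. */
    uint32_t mask = ~((1u << (23 - kept)) - 1u);
    uint32_t multipleBits = bits & mask;

    float multiple;
    memcpy(&multiple, &multipleBits, sizeof multiple);
    return x - multiple;    /* the remainder, in [0, 2^k) */
}
```

With k = 1 this matches the rows above, for example mod_pow2(13.5f, 1) gives 1.5 and mod_pow2(56.5f, 1) gives 0.5.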

Edit 3: I'm stuck on Mono version 2.0.50727.1433.

Check your Mono version, because ConvertToInt and ConvertToIntTruncated were added 4 years ago and should be present as of the 2.10 release.

You can reduce the problem to taking a float mod 1. To simplify that, you can compute the floor of the float using bit operations, then use a floating point subtraction. The following is (unsafe) C# code for these operations:

// domain is assumed to be positive
// returns value in [0,domain)
public float fmodulus(float val, float domain)
{
    if (val < 0)
    {
        float negative = fmodulus(-val, domain);
        if (domain - negative == domain)
            return 0;
        else
            return domain - negative;
    }

    if (val < domain)
        return val; // this avoids losing accuracy

    return fmodOne(val / domain) * domain;
}

// assumes val >= 1, so val is positive and the exponent is at least 0
unsafe public float fmodOne(float val)
{
    int iVal = *(int*)&val;
    int uncenteredExponent = iVal >> 23; // sign bit is 0, so this is the biased exponent
    int exponent = uncenteredExponent - 127; // 127 corresponds to 2^0 times the mantissa
    if (exponent >= 23)
        return 0; // not enough precision to distinguish val from an integer

    int unneededBits = 23 - exponent; // between 1 and 23
    int iFloorVal = (iVal >> unneededBits) << unneededBits; // equivalent to using a mask to zero the bottom bits of the mantissa
    float floorVal = *(float*)&iFloorVal; // convert the bit pattern back to a float

    return val - floorVal;
}

For example, fmodulus(100.1f, 1) is 0.09999847. The bit pattern of 100.1f is

0 10000101 10010000011001100110011

The bit pattern of floorVal (100f) is

0 10000101 10010000000000000000000

A floating point subtraction gives something close to 0.1f:

0 01111011 10011001100110000000000

Actually, I was surprised that the last 8 bits were zeroed out. I thought only the last 6 bits of 0.1f were supposed to be replaced with 0. Perhaps one can do better than relying on the floating point subtraction.
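One possible explanation for the zeroed bits, offered as a sketch rather than a definitive account: the subtraction itself appears to be exact here, and the missing bits were already lost when 100.1 was stored as a float. With an exponent of 6, adjacent floats near 100 are 2^-17 apart, so the fractional part of 100.1f has only 14 significant bits. After renormalizing to an exponent of -4, the low 10 mantissa bits of the result are zero, and two of those bits are zero in 0.1f as well, which leaves 8 visibly changed. The C code below checks this under the assumption that float is IEEE 754 binary32; bits_of and floor_by_mask are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as an unsigned integer. */
uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Floor of a positive float whose unbiased exponent is `exponent`,
   computed by clearing the fractional mantissa bits. */
float floor_by_mask(float val, int exponent)
{
    assert(exponent >= 0 && exponent < 23);
    uint32_t b = bits_of(val);
    int unneededBits = 23 - exponent;           /* bits below the units place */
    b = (b >> unneededBits) << unneededBits;    /* zero those bits */
    float out;
    memcpy(&out, &b, sizeof out);
    return out;
}

/* Fractional part of a positive float with the given unbiased exponent. */
float frac_by_mask(float val, int exponent)
{
    return val - floor_by_mask(val, exponent);
}
```

Calling frac_by_mask(100.1f, 6) returns the float with bit pattern 0x3DCCCC00, which equals the difference computed in double precision, whereas 0.1f itself is 0x3DCCCCCD. Getting closer to 0.1 would therefore require more precision in the input, not a different subtraction.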
