Split range into uniform intervals

I want to split a range with double bounds into N >= 2 equal or nearly equal intervals.

I found a suitable function in the GNU Scientific Library:

/* range must have room for n + 1 values (the interval edges). */
void make_uniform (double range[], size_t n, double xmin, double xmax)
{
  size_t i;

  for (i = 0; i <= n; i++)
    {
      double f1 = ((double) (n-i) / (double) n);
      double f2 = ((double) i / (double) n);
      range[i] = f1 * xmin +  f2 * xmax;
    }
}

However, when
xmin = 241141 (bit pattern 0x410D6FA800000000)
xmax = 241141.0000000001 (bit pattern 0x410D6FA800000003)
N = 3
the function produces

[0x410D6FA800000000,
 0x410D6FA800000000,
 0x410D6FA800000002,
 0x410D6FA800000003]

instead of the desired

[0x410D6FA800000000,
 0x410D6FA800000001,
 0x410D6FA800000002,
 0x410D6FA800000003]
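
For anyone who wants to reproduce these bit patterns, here is a minimal C++ sketch. The names bits_of and uniform_edges are hypothetical helpers, not GSL functions; uniform_edges simply repeats the arithmetic of make_uniform. The two middle values can depend on compiler settings (for example, whether the multiply and add are fused), so the exact result on a given machine may differ from the list reported above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: raw IEEE-754 bit pattern of a double.
// memcpy is used for the type pun to avoid strict-aliasing problems.
std::uint64_t bits_of(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Hypothetical helper: same arithmetic as make_uniform, returning the n + 1 edges.
std::vector<double> uniform_edges(double xmin, double xmax, std::size_t n) {
    std::vector<double> range(n + 1);
    for (std::size_t i = 0; i <= n; i++) {
        double f1 = static_cast<double>(n - i) / static_cast<double>(n);
        double f2 = static_cast<double>(i) / static_cast<double>(n);
        range[i] = f1 * xmin + f2 * xmax;
    }
    return range;
}
```

Calling uniform_edges(241141.0, 241141.0000000001, 3) and printing bits_of of each element with std::hex gives four patterns that can be compared with the two lists.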

How can I achieve uniformity without resorting to long arithmetic? (I already have a long-arithmetic solution, but it is ugly and slow.) Bit twiddling and x86 assembler routines are acceptable (x86-64, so no extended precision).

UPDATE:

A general solution is needed, without assuming that xmin and xmax have the same exponent and sign:

  • xmin and xmax may be any value except infinity and NaN (possibly also excluding denormalized values for the sake of simplicity).
  • xmin < xmax
  • 2 <= N <= (1<<11)-1
  • I am ready for a major performance loss (2-3 orders of magnitude).

I see two choices: reordering the operations as xmin + (i * (xmax - xmin)) / n, or dealing directly with the binary representations. Here is an example of both.

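#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <limits>

int main() {
    double xmin = 241141;
    double xmax = 241141.0000000001;
    std::size_t n = 3, i;
    double range[4];

    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << std::fixed;

    // Choice 1: scale the small difference first, then add it to xmin.
    for (i = 0; i <= n; i++) {
        range[i] = xmin + (i * (xmax - xmin)) / n;

        std::cout << range[i] << "\n";
    }
    std::cout << "\n";

    // Choice 2: interpolate the raw bit patterns as integers.
    // This assumes both values are positive, so the patterns are ordered like the values.
    // memcpy does the type pun; reinterpret_cast on references breaks strict aliasing.
    std::uint64_t uxmin, uxmax;
    std::memcpy(&uxmin, &xmin, sizeof xmin);
    std::memcpy(&uxmax, &xmax, sizeof xmax);

    for (i = 0; i <= n; i++) {
        // Scale only the difference: (n-i) * uxmin + i * uxmax can overflow 64 bits.
        std::uint64_t rangei = uxmin + (i * (uxmax - uxmin)) / n;
        std::memcpy(&range[i], &rangei, sizeof rangei);

        std::cout << range[i] << "\n";
    }
}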

Live on Coliru
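
In the general case both variants still have limits: xmax - xmin is not always exact when the exponents differ, and interpolating bit patterns gives equal steps in ULPs rather than in value. One possible way to reduce the rounding, which is not part of the answer above, is to carry the rounding error of each step alongside the result (a compensated, "double-double" style computation built on std::fma). The function name uniform_edges_compensated is hypothetical. This is a sketch under stated assumptions: n >= 2, no overflow in (n - i) * xmin or i * xmax, no underflow, and no -ffast-math. It is not a proof of correct rounding.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: evaluates ((n - i) * xmin + i * xmax) / n while
// tracking the rounding error of each intermediate step.
std::vector<double> uniform_edges_compensated(double xmin, double xmax, std::size_t n) {
    std::vector<double> range(n + 1);
    const double dn = static_cast<double>(n);
    for (std::size_t i = 0; i <= n; i++) {
        const double a = static_cast<double>(n - i);
        const double b = static_cast<double>(i);

        // Products and their rounding errors. Every multiplication goes
        // through std::fma, so there is no bare a * x for the compiler to
        // fuse into a neighbouring addition.
        const double p1 = std::fma(a, xmin, 0.0);
        const double e1 = std::fma(a, xmin, -p1);
        const double p2 = std::fma(b, xmax, 0.0);
        const double e2 = std::fma(b, xmax, -p2);

        // Knuth's TwoSum: s + t equals p1 + p2 when nothing overflows.
        const double s = p1 + p2;
        const double bb = s - p1;
        const double t = (p1 - (s - bb)) + (p2 - bb);
        const double low = t + (e1 + e2);

        // Divide the (s, low) pair by n with one correction step.
        const double q = s / dn;
        const double r = std::fma(-q, dn, s);  // remainder s - q * n
        range[i] = q + (r + low) / dn;
    }
    return range;
}
```

Worked by hand for xmin = 241141, xmax = 241141.0000000001 and n = 3, this gives four consecutive doubles from xmin to xmax. The cost is a handful of extra floating-point operations per edge, which is well inside the performance budget stated in the question.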

x87 still exists in x86-64, and 64-bit kernels for mainstream OSes correctly save/restore the x87 state for 64-bit processes. Despite what you may have read, x87 is fully usable in 64-bit code.

Outside of Windows (i.e. in the x86-64 System V ABI used everywhere else), long double is the 80-bit native x87 format. This will probably solve your precision problem, but on x86 / x86-64 only, so it fits if you don't care about portability to ARM / PowerPC / anything else that only has 64-bit precision in hardware.

It is probably best to use long double only for temporaries inside the function.
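
As a minimal sketch of that idea, the GSL-style loop can keep f1, f2 and the weighted sum in long double and round to double only on the final store. The names uniform_edges_ld and long_double_is_wider are hypothetical. It only helps where long double really is wider than double (for example the 80-bit x87 format under the System V ABI); where long double is the same as double, it behaves like the original.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// True when long double has more significand bits than double
// (64 for the 80-bit x87 format, 53 when it is the same type as double).
constexpr bool long_double_is_wider =
    std::numeric_limits<long double>::digits > std::numeric_limits<double>::digits;

// Hypothetical helper: make_uniform arithmetic with long double temporaries.
std::vector<double> uniform_edges_ld(double xmin, double xmax, std::size_t n) {
    std::vector<double> range(n + 1);
    for (std::size_t i = 0; i <= n; i++) {
        const long double f1 = static_cast<long double>(n - i) / static_cast<long double>(n);
        const long double f2 = static_cast<long double>(i) / static_cast<long double>(n);
        // The sum is evaluated in long double and rounded to double once.
        range[i] = static_cast<double>(f1 * xmin + f2 * xmax);
    }
    return range;
}
```

Checking long_double_is_wider at startup (or in a static_assert) is one way to notice a platform where this gives no extra precision.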

I'm not sure what you have to do on Windows to get a compiler to emit 80-bit extended FP math. It's certainly possible in asm, and supported by the kernel, but the toolchain and ABI make it inconvenient to use.


x87 is only somewhat slower than scalar SSE math on current CPUs. 80-bit load/store is extra slow, though: about 4 uops on Skylake instead of 1 (https://agner.org/optimize/), plus a few cycles of extra latency for fld m80.

For your loop, which has to convert int to FP by storing and using x87 fild, it might be at most about a factor of 2 slower than what a good compiler could do with SSE2 for 64-bit double.

And of course long double will prevent auto-vectorization.
