Map range of IEEE 32-bit float [1:2) to some arbitrary [a:b)
I've got a fast uniform pseudo-random number generator that creates uniform float32 numbers in the range [1:2), i.e. u : 1 <= u <= 2-eps. Unfortunately, mapping the endpoints of [1:2) to those of an arbitrary range [a:b) is non-trivial in floating-point math. I'd like to match the endpoints exactly with a simple affine calculation.
I want to make an IEEE-754 32-bit floating-point affine function f(x,a,b) for 1<=x<2 and arbitrary a,b that exactly maps 1 -> a and nextlower(2) -> nextlower(b), where nextlower(q) is the next lower FP representable number (e.g. in C++, std::nextafter(float(q), float(q-1))).
The simple mapping f(x,a,b) = (x-1)*(b-a) + a always achieves the f(1) condition but sometimes fails the f(2) condition due to floating-point rounding.
I've tried replacing the 1 with a free design parameter to cancel FP errors, in the spirit of Kahan summation, i.e. with f(x,c0,c1,c2) = (x-c0)*c1 + c2. One mathematical solution is c0=1, c1=(b-a), c2=a (the simple mapping above), but the extra parameter lets me play around with the constants c0, c1, c2 to match the endpoints. I'm not sure I understand the principles behind Kahan summation well enough to apply them to determine the parameters, or even to be confident a solution exists. It feels like I'm bumping around in the dark where others might've found the light already.
Aside: I'm fine assuming the following
Simple lerping based on fused multiply-add can reliably hit the endpoints for interpolation factors 0 and 1. For x in [1, 2) the interpolation factor x - 1 does not reach unity, which can be fixed by stretching it slightly: multiply x - 1 by (2.0f / nextlower(2.0f)). Obviously the upper endpoint also needs to be adjusted to nextlower(b). For the C code below I have used the definition of nextlower() provided in the question, which may not be what the asker desires, since for floating-point q sufficiently large in magnitude, q == (q - 1).
The asker stated in comments that it is understood that this kind of mapping will not result in an exactly uniform distribution of the pseudo-random numbers in the interval [a, b), only approximately so, and that pathological mappings may occur when a and b are extremely close together. I have not mathematically proved that the implementation of map() below guarantees the desired behavior, but it seems to do so for a large number of random test cases.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
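
// next lower representable float, using the definition from the question
float nextlowerf (float q)
{
    return nextafterf (q, q - 1);
}

// map x in [1, 2) to [a, b): 1 -> a, nextlower(2) -> nextlower(b)
float map (float a, float b, float x)
{
    // stretch the interpolation factor so that it reaches 1 at nextlower(2)
    float t = (x - 1.0f) * (2.0f / nextlowerf (2.0f));
    // FMA-based lerp: a*(1-t) + nextlower(b)*t
    return fmaf (t, nextlowerf (b), fmaf (-t, a, a));
}

// reinterpret the bits of a 32-bit unsigned integer as a float
float uint32_as_float (uint32_t a)
{
    float r;
    memcpy (&r, &a, sizeof(r));
    return r;
}
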
// George Marsaglia's KISS PRNG, period 2**123. Newsgroup sci.math, 21 Jan 1999
// Bug fix: Greg Rose, "KISS: A Bit Too Simple" http://eprint.iacr.org/2011/007
static uint32_t kiss_z=362436069, kiss_w=521288629;
static uint32_t kiss_jsr=123456789, kiss_jcong=380116160;
#define znew (kiss_z=36969*(kiss_z&65535)+(kiss_z>>16))
#define wnew (kiss_w=18000*(kiss_w&65535)+(kiss_w>>16))
#define MWC ((znew<<16)+wnew )
#define SHR3 (kiss_jsr^=(kiss_jsr<<13),kiss_jsr^=(kiss_jsr>>17), \
kiss_jsr^=(kiss_jsr<<5))
#define CONG (kiss_jcong=69069*kiss_jcong+1234567)
#define KISS ((MWC^CONG)+SHR3)

int main (void)
{
    float a, b, x, r;
    float FP32_MIN_NORM = 0x1.000000p-126f;
    float FP32_MAX_NORM = 0x1.fffffep+127f;

    // runs until a failing case is found
    do {
        // pick random finite, normal a and b with a <= b
        do {
            a = uint32_as_float (KISS);
        } while ((fabsf (a) < FP32_MIN_NORM) || (fabsf (a) > FP32_MAX_NORM) || isnan (a));
        do {
            b = uint32_as_float (KISS);
        } while ((fabsf (b) < FP32_MIN_NORM) || (fabsf (b) > FP32_MAX_NORM) || isnan (b) || (b < a));

        // lower endpoint: 1 must map to a
        x = 1.0f;
        r = map (a, b, x);
        if (r != a) {
            printf ("lower bound failed: a=%12.6a b=%12.6a map=%12.6a\n", a, b, r);
            return EXIT_FAILURE;
        }

        // upper endpoint: nextlower(2) must map to nextlower(b)
        x = nextlowerf (2.0f);
        r = map (a, b, x);
        if (r != nextlowerf (b)) {
            printf ("upper bound failed: a=%12.6a b=%12.6a map=%12.6a\n", a, b, r);
            return EXIT_FAILURE;
        }
    } while (1);
    return EXIT_SUCCESS;
}
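As a small check of the stretching step, here is a sketch that isolates the interpolation factor. The helper name stretch is an assumption for illustration, and round-to-nearest float arithmetic is assumed. The quotient 2.0f / nextlower(2.0f) rounds to 1 + 2^-23, and multiplying it by the largest x - 1, which is 1 - 2^-23, gives 1 - 2^-46, which rounds to exactly 1.0f.

```c
#include <math.h>

/* Stretched interpolation factor: maps x in [1, 2) to t in [0, 1],
   with t == 1.0f exactly at x == nextlower(2). */
float stretch (float x)
{
    float s = 2.0f / nextafterf (2.0f, 1.0f);  /* rounds to 1 + 2^-23 */
    return (x - 1.0f) * s;                     /* x - 1.0f is exact on [1, 2) */
}
```

With t exactly 0 the two fused multiply-adds return a, and with t exactly 1 the inner one returns 0 and the outer one returns nextlower(b), for finite a and b.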
OP's goal
"I want to make an IEEE-754 32 bit floating point affine function f(x,a,b) for 1<=x<2 and arbitrary a,b that exactly maps 1 -> a and nextlower(2) -> nextlower(b)"
This differs slightly from "map range of IEEE 32bit float [1:2) to some arbitrary [a:b)".
General case
Map x0 to y0, x1 to y1 and various x in-between to y:
m = (y1 - y0)/(x1 - x0);
y = m*(x - x0) + y0;
OP's case
// x0 = 1.0f;
// x1 = nextafterf(2.0f, 1.0f);
// y0 = a;
// y1 = nextafterf(b, a);
#include <math.h> // for nextafterf()
float x = random_number_1_to_almost_2();
float m = (nextafterf(b, a) - a)/(nextafterf(2.0f, 1.0f) - 1.0f);
float y = m*(x - 1.0f) + a;
nextafterf(2.0f, 1.0f) - 1.0f, x - 1.0f and nextafterf(b, a) are exact, incurring no calculation error.
nextafterf(2.0f, 1.0f) - 1.0f is a value a little less than 1.0f.
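The exactness claim can be spot-checked with a short sketch. The helper names sub_one_is_exact and input_width are assumptions for illustration. The check compares the float difference against the same difference computed in double, where it is certainly exact; for x in [1, 2) the two agree, as Sterbenz's lemma predicts.

```c
#include <math.h>

/* Returns 1 if the float subtraction x - 1.0f equals the same
   difference computed in double, i.e. no rounding occurred. */
int sub_one_is_exact (float x)
{
    float  d32 = x - 1.0f;
    double d64 = (double) x - 1.0;
    return (double) d32 == d64;
}

/* Width of the input interval: nextlower(2) - 1 = 1 - 2^-23. */
float input_width (void)
{
    return nextafterf (2.0f, 1.0f) - 1.0f;
}
```

Here input_width() evaluates to 0x1.fffffcp-1f, which is the "little less than 1.0f" value referred to above.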
Recommendation
Other reformulations are possible, with better symmetry and numerical stability at the end-points.
float x = random_number_1_to_almost_2();
float afactor = nextafterf(2.0f, 1.0f) - x; // exact
float bfactor = x - 1.0f; // exact
float xwidth = nextafterf(2.0f, 1.0f) - 1.0f; // exact
// Do not re-order next line of code, perform 2 divisions
float y = (afactor/xwidth)*a + (bfactor/xwidth)*nextafterf(b, a);
Notice afactor/xwidth and bfactor/xwidth are both exactly 0.0 or 1.0 at the end-points, thus meeting "maps 1 -> a and nextlower(2) -> nextlower(b)". Extended precision not needed.
OP's (x-c0)*c1 + c2 has trouble as it divides (x-c0)*c1 by (2.0 - 1.0) or 1.0 (implied), when it should divide by nextafterf(2.0f, 1.0f) - 1.0f.