Multiplication of floating point numbers gives different results in Numpy and R
I am doing data analysis in Python (Numpy) and R. My data is a 795067 x 3 array, and computing the mean, median, standard deviation, and IQR on this data yields different results depending on whether I use Numpy or R. I cross-checked the values, and it looks like R gives the "correct" value.
Median:
Numpy: 14.948499999999999
R: 14.9632
Mean:
Numpy: 13.097945407088607
R: 13.10936
Standard Deviation:
Numpy: 7.3927612774052083
R: 7.390328
IQR:
Numpy: 12.358700000000002
R: 12.3468
Max and min of the data are the same on both platforms. I ran a quick test to better understand what is going on here.
In Numpy, the numbers are of the float64 datatype, and they are double in R. What is going on here? Why are Numpy and R giving different results? I know R uses IEEE 754 double precision, but I don't know what precision Numpy uses. How can I change Numpy to give me the "correct" answer?
The print statement/function in Python will print floats with reduced precision (in Python 2, print rounds to 12 significant digits), so a value can look like a single-precision float even when it is not. Calculations will actually be done in the precision specified. Python/numpy uses double-precision floats by default (at least on my 64-bit machine):
import numpy
single = numpy.float32(1.222) * numpy.float32(1.222)
double = numpy.float64(1.222) * numpy.float64(1.222)
pyfloat = 1.222 * 1.222
print single, double, pyfloat
# 1.49328 1.493284 1.493284
print "%.16f, %.16f, %.16f"%(single, double, pyfloat)
# 1.4932839870452881, 1.4932839999999998, 1.4932839999999998
In an interactive Python/IPython shell, the shell prints the full double-precision value when echoing the result of an expression:
>>> 1.222 * 1.222
1.4932839999999998
In [1]: 1.222 * 1.222
Out[1]: 1.4932839999999998
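To make the point that only the display differs, not the stored value, here is a minimal sketch. It assumes Python 3 and uses only the standard library; the `"%.12g"` format is used here to imitate what Python 2's print shows, and the variable names are illustrative.

```python
# The product is always computed and stored as a double.
value = 1.222 * 1.222

# Rounded display, imitating Python 2's print (12 significant digits).
short = "%.12g" % value

# Full display, as echoed by the interactive shell.
full = repr(value)

# Fixed-point display with 16 decimals, as in the answer above.
fixed = "%.16f" % value

print(short)  # 1.493284
print(full)   # 1.4932839999999998
print(fixed)  # 1.4932839999999998

# Parsing the short form does not give back the stored value,
# while parsing the repr does.
print(float(short) == value)  # False
print(float(full) == value)   # True
```

The rounded string is a lossy view of the number, so comparing printed output across tools says little about the precision used in the calculation itself.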
It looks like R is doing the same as Python when using print and sprintf:
print(1.222 * 1.222)
# 1.493284
sprintf("%.16f", 1.222 * 1.222)
# "1.4932839999999998"
In contrast to interactive Python shells, the interactive R shell also prints with reduced precision (7 significant digits by default) when printing the results of statements:
> 1.222 * 1.222
[1] 1.493284
The differences in your results could come from using single-precision values in numpy. Calculations with a lot of additions/subtractions will eventually make the problem surface:
In [1]: import numpy
In [2]: a = numpy.float32(1.222)
In [3]: a*6
Out[3]: 7.3320000171661377
In [4]: a+a+a+a+a+a
Out[4]: 7.3320003
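The effect grows with the number of additions. The sketch below is a rough illustration rather than what numpy does internally: it assumes Python 3 and, so that it runs without numpy installed, uses a hypothetical helper `to_float32` (built on the standard `struct` module) to mimic rounding to single precision after every addition.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_float32(value, count):
    """Add `value` to a running total `count` times, rounding to float32 each step."""
    v32 = to_float32(value)
    total = 0.0
    for _ in range(count):
        total = to_float32(total + v32)
    return total


count = 1000000
single_total = naive_sum_float32(0.1, count)  # single-precision accumulator
double_total = sum([0.1] * count)             # double-precision accumulator

print("float32 accumulator:", single_total)
print("float64 accumulator:", double_total)
print("float32 error:", abs(single_total - 100000.0))
print("float64 error:", abs(double_total - 100000.0))
```

The single-precision total drifts visibly away from 100000, while the double-precision total stays very close to it. Statistics computed over several hundred thousand rows are exposed to the same kind of drift if the data or the accumulator is single precision.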
As suggested in the comments on your question, make sure to use double-precision floats in your numpy calculations.
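One way to check this is to inspect the array's dtype and convert to float64 before computing the statistics. This is a minimal sketch, assuming Python 3 with numpy installed; the random array `data` is only a hypothetical stand-in for the real 795067 x 3 data set, and the name `stats` is illustrative.

```python
import numpy as np

# Hypothetical stand-in for the real data, deliberately stored as float32.
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 30.0, size=(1000, 3)).astype(np.float32)

# Check which precision the array actually holds.
print(data.dtype)  # float32

# Convert to double precision before doing any statistics.
data64 = data.astype(np.float64)
print(data64.dtype)  # float64

# Compute the four statistics from the question on the float64 copy.
q75, q25 = np.percentile(data64, [75, 25])
stats = {
    "mean": data64.mean(),
    "median": np.median(data64),
    "std": data64.std(),
    "iqr": q75 - q25,
}

for name, result in stats.items():
    print(name, repr(float(result)))
```

If `data.dtype` already reports float64, precision is probably not the cause, and it is worth checking that both tools are reading exactly the same values.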