
Multiplication of floating point numbers gives different results in Numpy and R

I am doing data analysis in Python (Numpy) and R. My data is a 795067 x 3 array, and computing the mean, median, standard deviation, and IQR on it yields different results depending on whether I use Numpy or R. I cross-checked the values, and it looks like R gives the "correct" value.

Median: 
Numpy: 14.948499999999999
R: 14.9632

Mean: 
Numpy: 13.097945407088607
R: 13.10936

Standard Deviation: 
Numpy: 7.3927612774052083
R: 7.390328

IQR: 
Numpy: 12.358700000000002
R: 12.3468

The max and min of the data are the same on both platforms. I ran a quick test to better understand what is going on here.

  • Multiplying 1.2*1.2 in Numpy gives 1.44 (same with R).
  • Multiplying 1.22*1.22 gives 1.4884 in Numpy and the same with R.
  • However, multiplying 1.222*1.222 in Numpy gives 1.4932839999999998, which is clearly wrong! Doing the multiplication in R gives the correct answer of 1.493284.
  • Multiplying 1.2222*1.2222 gives 1.4937728399999999 in Numpy and 1.493773 in R. Once more, R is correct.

In Numpy, the numbers are of float64 datatype, and they are double in R. What is going on here? Why are Numpy and R giving different results? I know R uses IEEE 754 double precision, but I don't know what precision Numpy uses. How can I change Numpy to give me the "correct" answer?

Python

The print statement in Python 2 does not show every digit of a float: it formats the value with str(), which rounds to 12 significant digits, so the output looks less precise than the stored value. The calculations themselves are done in the precision specified. Python and numpy use double-precision floats by default (at least on my 64-bit machine):

import numpy

single = numpy.float32(1.222) * numpy.float32(1.222)
double = numpy.float64(1.222) * numpy.float64(1.222)
pyfloat = 1.222 * 1.222

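# Python 2 syntax: the print statement formats each value with str(),
# which rounds the displayed digits.
print single, double, pyfloat
# 1.49328 1.493284 1.493284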

print "%.16f, %.16f, %.16f"%(single, double, pyfloat)
# 1.4932839870452881, 1.4932839999999998, 1.4932839999999998

In an interactive Python/IPython shell, the shell shows the result of a statement with repr(), which prints enough digits to identify the double uniquely:

>>> 1.222 * 1.222
1.4932839999999998

In [1]: 1.222 * 1.222
Out[1]: 1.4932839999999998
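
For readers on Python 3, where print is a function and str() no longer rounds floats to 12 digits, the same distinction between the stored value and the displayed value can be sketched as follows. This is only an illustration: the variable name is made up here, and the full Decimal expansion is printed without stating its digits.

```python
from decimal import Decimal
import math

product = 1.222 * 1.222  # a Python float is an IEEE 754 double

# repr() shows the shortest string that round-trips to the same double.
print(repr(product))            # 1.4932839999999998

# Rounding only the displayed text to 7 or 12 significant digits
# hides the trailing digits; the stored value is unchanged.
print(format(product, ".7g"))   # 1.493284
print(format(product, ".12g"))  # 1.493284

# Decimal(float) converts the stored binary value exactly, which shows
# that it is close to, but not equal to, the decimal number 1.493284.
print(Decimal(product))
print(Decimal(product) == Decimal("1.493284"))  # False
print(math.isclose(product, 1.493284))          # True
```

The last two lines are the practical takeaway: compare floats with a tolerance rather than by their printed digits.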

R

It looks like R behaves the same way as Python when using print and sprintf:

print(1.222 * 1.222)
# 1.493284

sprintf("%.16f", 1.222 * 1.222)
# "1.4932839999999998"

In contrast to the interactive Python shells, the interactive R shell also shows the rounded value (7 significant digits by default) when printing the results of statements:

> 1.222 * 1.222
[1] 1.493284
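
R's console display is governed by options(digits), which defaults to 7 significant digits. As a rough sketch (the helper name r_style is hypothetical, and Python's "g" format only approximates R's printing rules), applying that rounding to the Numpy outputs quoted in the question reproduces the R products but not the R statistics:

```python
def r_style(value, digits=7):
    """Format a float roughly the way R's console shows it by default.

    Hypothetical helper: it rounds to `digits` significant digits with
    Python's "g" format, which only approximates R's printing rules.
    """
    return format(value, f".{digits}g")

# Products from the question: the Numpy values match R once rounded.
print(r_style(1.4932839999999998))  # 1.493284
print(r_style(1.4937728399999999))  # 1.493773

# Statistics from the question: rounding does not give R's values
# (R reported 14.9632 and 13.10936), so display is not the cause there.
print(r_style(14.948499999999999))  # 14.9485
print(r_style(13.097945407088607))  # 13.09795
```

So the multiplication test only exposes a display difference, while the mean and median differ in their leading digits and need another explanation.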

Differences between Python and R

The differences in your results could come from using single-precision values in numpy. Calculations with a lot of additions/subtractions will ultimately make the problem surface:

In [1]: import numpy

In [2]: a = numpy.float32(1.222)

In [3]: a*6
Out[3]: 7.3320000171661377

In [4]: a+a+a+a+a+a
Out[4]: 7.3320003

As suggested in the comments on your question, make sure to use double-precision floats in your numpy calculations.
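
One way to act on this advice is to check the array's dtype and either convert the data once or request a float64 accumulator. The sketch below uses synthetic data (a constant float32 array standing in for the real 795067 x 3 data set, which is not available here), so the printed numbers say nothing about the original results.

```python
import numpy as np

# Synthetic stand-in for the real data; float32 is assumed here only
# to illustrate the single-precision case discussed above.
data32 = np.full((795067, 3), 1.222, dtype=np.float32)
print(data32.dtype)                  # float32

# Option 1: convert once, then compute everything in double precision.
data64 = data32.astype(np.float64)
print(data64.dtype)                  # float64

# Option 2: keep the float32 array but accumulate in float64.
mean32 = data32.mean()               # float32 accumulator and result
mean64 = data32.mean(dtype=np.float64)
print(mean32.dtype, mean64.dtype)    # float32 float64

std64 = data32.std(dtype=np.float64)
median64 = np.median(data64)
q75, q25 = np.percentile(data64, [75, 25])
print(mean64, std64, median64, q75 - q25)
```

If data32.dtype already reports float64 for the real data, precision is not the culprit, and the input itself (for example how the file was read on each platform) would be the next thing to compare.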
