Pandas DataFrame and numpy standard deviations differ

Question

Simply asking: why are these std values different?

>>> import numpy
>>> import pandas as pd
>>>
>>> arr = [10, 386, 479, 627, 20, 523, 482, 483, 542, 699, 535, 617, 577, 471, 615, 583, 441, 562,
...        563, 527, 453, 530, 433, 541, 585, 704, 443, 569, 430, 637, 331, 511, 552, 496, 484, 566,
...        554, 472, 335, 440, 579, 341, 545, 615, 548, 604, 439, 556, 442, 461, 624, 611, 444, 578,
...        405, 487, 490, 496, 398, 512, 422, 455, 449, 432, 607, 679, 434, 597, 639, 565, 415, 486,
...        668, 414, 665, 763, 557, 304, 404, 454, 689, 610, 483, 441, 657, 590, 492, 476, 437, 483,
...        529, 363, 711, 543]
>>> elements = numpy.asarray(arr)
>>> arr_D = {"A":arr}
>>> df = pd.DataFrame(arr_D)
>>>
>>> print(numpy.std(elements, axis=0))
118.51857760182034
>>> print(numpy.std(df['A']))
118.5185776018204
>>> print(df['A'].std(axis=0))
119.15407050904474

Is it a problem with my understanding of the topic? As far as I know, pandas uses numpy, so the DataFrame std and the numpy std of the same column should be the same.

Is it a bug?

Answer 1

Numpy uses the biased std by default and pandas the unbiased one. In other words, numpy divides by n (the number of elements) and pandas divides by n-1. Try the following to see that they match:

print(df['A'].std(axis=0) / numpy.sqrt(len(arr)) * numpy.sqrt(len(arr) - 1))
# 118.51857760182033
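
To make the n versus n-1 difference concrete, here is a minimal sketch that computes both versions by hand and compares them with the library calls. The `data` values and the variable names are made up for illustration and are not the data from the question.

```python
import math

import numpy
import pandas as pd

# Illustrative sample, not the data from the question.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

population_std = math.sqrt(sum_sq / n)       # divide by n, like numpy.std by default
sample_std = math.sqrt(sum_sq / (n - 1))     # divide by n-1, like pandas .std() by default

print(population_std, numpy.std(data))
print(sample_std, pd.Series(data).std())
```

Each printed pair should agree up to floating-point rounding, which shows that the two libraries differ only in the divisor.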

Answer 2

pandas uses the unbiased estimator by default and numpy does not, so neither of them is incorrect; they simply use different approaches to calculate the std.
To make numpy use the unbiased estimator, pass ddof=1 to std:

>>> import numpy
>>> import pandas

>>> df = pandas.DataFrame(numpy.random.rand(100))

>>> numpy.std(df[0])  # default: biased estimate (ddof=0)
0.2877601644414916

>>> numpy.std(df[0], ddof=1)  # with ddof=1, i.e. unbiased estimate
0.2892098469889083

>>> df[0].std()  # pandas default matches numpy std with ddof=1
0.2892098469889083
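
The adjustment also works in the other direction: the pandas `std` method accepts a `ddof` argument, so passing `ddof=0` should reproduce the numpy default. A short sketch, using the first five values from the question as a sample (the variable names are illustrative):

```python
import numpy
import pandas as pd

# First five values of the array from the question, used as a small sample.
s = pd.Series([10, 386, 479, 627, 20])

pandas_population = s.std(ddof=0)                # divide by n, like numpy's default
numpy_population = numpy.std(s.to_numpy())       # numpy default, ddof=0

pandas_sample = s.std()                          # pandas default, ddof=1
numpy_sample = numpy.std(s.to_numpy(), ddof=1)   # numpy with ddof=1

print(pandas_population, numpy_population)
print(pandas_sample, numpy_sample)
```

Whichever convention you choose, setting `ddof` explicitly on both sides avoids relying on the differing defaults.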
