Pandas DataFrame and numpy standard deviation are different

Simply asking: why are these standard deviations different?

>>> import numpy
>>> import pandas as pd
>>>
>>> arr = [10, 386, 479, 627, 20, 523, 482, 483, 542, 699, 535, 617, 577, 471, 615, 583, 441, 562,
...        563, 527, 453, 530, 433, 541, 585, 704, 443, 569, 430, 637, 331, 511, 552, 496, 484, 566, 554, 472,
...        335, 440, 579, 341, 545, 615, 548, 604, 439, 556, 442, 461, 624, 611, 444, 578, 405, 487, 490, 496,
...        398, 512, 422, 455, 449, 432, 607, 679, 434, 597, 639, 565, 415, 486, 668, 414, 665, 763, 557, 304,
...        404, 454, 689, 610, 483, 441, 657, 590, 492, 476, 437, 483, 529, 363, 711, 543]
>>> elements = numpy.asarray(arr)
>>> arr_D = {"A":arr}
>>> df = pd.DataFrame(arr_D)
>>>
>>> print(numpy.std(elements, axis=0))
118.51857760182034
>>> print(numpy.std(df['A']))
118.5185776018204
>>> print(df['A'].std(axis=0))
119.15407050904474

Is the problem with my understanding of the topic? As far as I know, pandas uses numpy, so the DataFrame std and the numpy std of the same column should be the same.

Is it a bug?

By default, numpy computes the biased std and pandas the unbiased one. In other words, numpy divides by n (the number of elements) and pandas divides by n-1. Try the following to see that they match:

# rescale the pandas (n-1) result to the numpy (n) convention
print(df['A'].std(axis=0) / numpy.sqrt(len(arr)) * numpy.sqrt(len(arr) - 1))
# 118.51857760182033
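
The n versus n-1 distinction can also be checked directly from the definition. This is only a minimal sketch: the helper `std_with_ddof` and the small sample list are made up for illustration and are not from the original post.

```python
import math

import numpy as np
import pandas as pd


def std_with_ddof(values, ddof):
    """Hypothetical helper: standard deviation with divisor (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


# Made-up sample, chosen so the arithmetic is easy to follow by hand.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_std = std_with_ddof(sample, ddof=0)  # divisor n, numpy's default
sample_std = std_with_ddof(sample, ddof=1)      # divisor n - 1, pandas' default

print(population_std, np.std(sample))
print(sample_std, pd.Series(sample).std())
```

Each printed pair should agree up to floating-point rounding: the first with numpy's default, the second with pandas' default.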

pandas uses the unbiased estimate (ddof=1) by default and numpy does not (its default is ddof=0). So neither of them is incorrect; they just use different conventions to calculate std.
To make numpy use the unbiased estimate, pass ddof=1 to std:

>>> import numpy
>>> import pandas

>>> df = pandas.DataFrame(numpy.random.rand(100))

>>> numpy.std(df[0])  # default ddof=0, i.e. biased estimate
0.2877601644414916

>>> numpy.std(df[0], ddof=1)  # ddof=1, i.e. unbiased estimate
0.2892098469889083

>>> df[0].std()  # pandas default matches numpy std with ddof=1
0.2892098469889083
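
The adjustment also works in the other direction: the pandas `std` method accepts `ddof` too, so passing `ddof=0` should reproduce the numpy default. A small sketch, assuming a seeded random Series (the seed and variable names are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs produce the same data.
rng = np.random.default_rng(0)
series = pd.Series(rng.random(100))

# Divisor n: numpy's default, requested explicitly from pandas.
pandas_population = series.std(ddof=0)
numpy_population = np.std(series.to_numpy())

# Divisor n - 1: pandas' default, requested explicitly from numpy.
pandas_sample = series.std()
numpy_sample = np.std(series.to_numpy(), ddof=1)

print(np.isclose(pandas_population, numpy_population))
print(np.isclose(pandas_sample, numpy_sample))
```

Which convention to use depends on whether the data is treated as the whole population (ddof=0) or as a sample from a larger one (ddof=1); the point here is only that both libraries can compute either.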

