Different standard deviation for same input from Wolfram and numpy

I am currently reimplementing in Python an algorithm that was originally written in Java. One step is to calculate the standard deviation of a list of values. The original implementation uses DescriptiveStatistics.getStandardDeviation from the Apache Math 1.1 library for this. I use the standard deviation function of numpy 1.5. The problem is that they give (very) different results for the same input. The sample I have is this:

[0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

I get the following results:

numpy           : 0.10932134388775223
Apache Math 1.1 : 0.12620366805397404
Wolfram Alpha   : 0.12620366805397404

I checked with Wolfram Alpha to get a third opinion. I do not think that such a difference can be explained by precision alone. Does anyone have any idea why this is happening, and what I could do about it?

Edit: Calculating it manually in Python gives the same result as numpy:

>>> from math import sqrt
>>> v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
>>> mu = sum(v) / 4
>>> sqrt(sum([(x - mu)**2 for x in v]) / 4)
0.10932134388775223

Also, regarding the suggestion that I am not using numpy correctly, this is the call:

>>> from numpy import std
>>> std([0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842])
0.10932134388775223

Apache and Wolfram divide by N-1 rather than N. This is a degrees-of-freedom adjustment, needed because the mean μ is itself estimated from the sample. Dividing by N-1 gives an unbiased estimate of the population variance, and its square root is the usual sample standard deviation. You can change NumPy's behavior using the ddof option.
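As a minimal sketch of the ddof option, the snippet below runs numpy.std on the sample from the question with the default divisor N and again with ddof=1 (divisor N-1). The variable names are illustrative only. The ddof=1 value should come out near 0.126, close to the Apache and Wolfram figure rather than the numpy default.

```python
import numpy as np

# Sample values from the question.
v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

# Default ddof=0: divide by N (population standard deviation).
population_std = np.std(v)

# ddof=1: divide by N - 1 (sample standard deviation).
sample_std = np.std(v, ddof=1)

print(population_std)  # about 0.1093, the numpy value from the question
print(sample_std)      # about 0.126
```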

This is described in the NumPy documentation:

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
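The divisor rule quoted above can also be written out by hand, without NumPy. The helper std_with_ddof below is a hypothetical name used only for illustration. It sums the squared deviations from the mean, divides by N - ddof, and takes the square root. Python's statistics module is used as a cross-check: pstdev divides by N and stdev divides by N-1.

```python
import math
import statistics


def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - ddof))


# Sample values from the question.
v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

print(std_with_ddof(v, ddof=0))  # divisor N, same convention as numpy's default
print(std_with_ddof(v, ddof=1))  # divisor N - 1

# Standard-library functions with the same two conventions.
print(statistics.pstdev(v))  # population standard deviation (divisor N)
print(statistics.stdev(v))   # sample standard deviation (divisor N - 1)
```

For N values the two results differ by a factor of sqrt(N / (N - 1)), which is about 1.155 for the four values here, so the gap is far too large to be a floating-point precision effect.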
