Different standard deviation for the same input from Wolfram and numpy

Question

I am currently reimplementing in Python an algorithm that was originally written in Java. One step is to calculate the standard deviation of a list of values. The original implementation uses DescriptiveStatistics.getStandardDeviation from the Apache Math 1.1 library for this. I use the standard deviation function from numpy 1.5. The problem is that they give (very) different results for the same input. The sample I have is this:

[0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

I get the following results:

numpy           : 0.10932134388775223
Apache Math 1.1 : 0.12620366805397404
Wolfram Alpha   : 0.12620366805397404

I checked with Wolfram Alpha to get a third opinion. I do not think that such a difference can be explained by floating-point precision alone. Does anyone have any idea why this is happening, and what I could do about it?

Edit: Calculating it manually in Python gives the same result as numpy:

>>> from math import sqrt
>>> v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
>>> mu = sum(v) / 4
>>> sqrt(sum([(x - mu)**2 for x in v]) / 4)
0.10932134388775223

Also, regarding whether I am using it correctly:

>>> from numpy import std
>>> std([0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842])
0.10932134388775223

Answer 1

Apache and Wolfram divide by N-1 rather than N. This is a degrees-of-freedom adjustment, needed because μ is itself estimated from the sample. Dividing by N-1 gives an unbiased estimate of the population variance, and its square root is the usual sample standard deviation. You can change NumPy's behavior using the ddof option.
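As a minimal sketch of the ddof option, assuming NumPy is installed, the snippet below computes both conventions on the sample from the question. The variable names are illustrative only. With ddof=1 the result should land at roughly 0.126, in line with the Apache and Wolfram figures, rather than the 0.109 that the default produces.

```python
import numpy as np

v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

# Default is ddof=0: the sum of squared deviations is divided by N.
population_sd = np.std(v)

# ddof=1 divides by N - 1 instead, the convention the answer
# attributes to Apache Math and Wolfram Alpha.
sample_sd = np.std(v, ddof=1)

print(population_sd)
print(sample_sd)
```

The two values differ only by the factor sqrt(N / (N - 1)), so the gap shrinks as the list gets longer.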

The ddof parameter is described in the NumPy documentation:

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
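The divisor rule quoted above can be sketched in plain Python, extending the manual calculation from the question. The helper name std_with_ddof is hypothetical and is not part of NumPy or any other library. It is only meant to show where N - ddof enters the formula.

```python
from math import sqrt


def std_with_ddof(values, ddof=0):
    """Standard deviation using the divisor N - ddof (illustrative helper)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mu = sum(values) / n
    squared_dev = [(x - mu) ** 2 for x in values]
    return sqrt(sum(squared_dev) / (n - ddof))


v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

print(std_with_ddof(v))          # divisor N, as in the question's manual calculation
print(std_with_ddof(v, ddof=1))  # divisor N - 1
```

With ddof=0 this is the same arithmetic as the manual calculation in the question, so it should reproduce the numpy figure reported there.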


Answer 1
23 · Accepted · 2011-01-01 20:39:47
