Python: Numpy standard deviation error

Question

Here is a simple test:

import numpy as np
data = np.array([-1, 0, 1])
print(data.std())

>> 0.816496580928

I don't understand how this result is produced. Obviously:

( (1^0.5 + 1^0.5 + 0^0.5)/(3-1) )^0.5 = 1

In Matlab it gives me std([-1,0,1]) = 1. Could you help me understand how numpy.std() works?

Answer 1

The crux of this problem is that numpy divides by N (3), not N-1 (2). As Iarsmans pointed out, numpy uses the population variance by default, not the sample variance.

So the real answer is sqrt(2/3), which is exactly that: 0.8164965...
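To make the arithmetic concrete, here is a minimal sketch that works through both divisors by hand for the data in the question. The variable names are illustrative only and are not part of NumPy.

```python
import math

import numpy as np

data = np.array([-1, 0, 1])

n = data.size                      # N = 3
mean = data.sum() / n              # 0.0
squared_dev = (data - mean) ** 2   # [1., 0., 1.]
sum_sq = squared_dev.sum()         # 2.0

population_std = math.sqrt(sum_sq / n)        # divide by N
sample_std = math.sqrt(sum_sq / (n - 1))      # divide by N - 1

print(population_std)   # about 0.8165, i.e. sqrt(2/3)
print(sample_std)       # 1.0
print(np.isclose(population_std, data.std()))  # True
```

The sum of squared deviations is 2 in both cases; only the divisor changes, which is the whole difference between the two results.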

If you deliberately want a different value for the degrees of freedom (the default is 0), use the keyword argument ddof with a positive value:

np.std(data, ddof=1)

...but doing so here brings back the value from your original question, because numpy will then divide by N - ddof.
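As a quick check, the following sketch compares the default call with the ddof=1 call on the data from the question. Only np.std and its ddof keyword are used; the variable names are illustrative.

```python
import numpy as np

data = np.array([-1, 0, 1])

std_default = np.std(data)         # ddof=0, divisor N = 3
std_ddof1 = np.std(data, ddof=1)   # divisor N - 1 = 2

print(std_default)  # about 0.8165
print(std_ddof1)    # 1.0
```

The same keyword is accepted by the array method, so data.std(ddof=1) should give the same value as the second call.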

Answer 2

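It is worth reading the help page for a function/method before suggesting it is incorrect. The method does exactly what the docstring says: it divides by 3, because ddof is zero by default: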

In [3]: numpy.std?

String form: <function std at 0x104222398>
File:        /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py
Definition:  numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Docstring:
Compute the standard deviation along the specified axis.

...

ddof : int, optional
    Means Delta Degrees of Freedom.  The divisor used in calculations
    is ``N - ddof``, where ``N`` represents the number of elements.
    By default `ddof` is zero.
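The divisor rule in the docstring can be sketched in a few lines. The helper std_with_ddof below is hypothetical, written only to illustrate the N - ddof divisor for 1-D input; it is not how NumPy implements std internally.

```python
import numpy as np


def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    arr = np.asarray(values, dtype=float)
    n = arr.size
    deviations = arr - arr.mean()
    return np.sqrt((deviations ** 2).sum() / (n - ddof))


data = [-1, 0, 1]

# Compare the sketch against np.std for the two common ddof values.
results = {
    ddof: (std_with_ddof(data, ddof), np.std(data, ddof=ddof))
    for ddof in (0, 1)
}
for ddof, (ours, numpy_value) in results.items():
    print(ddof, ours, numpy_value, np.isclose(ours, numpy_value))
```

With ddof=0 both values are about 0.8165, and with ddof=1 both are 1.0.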

Answer 3

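When coming to NumPy from Matlab, you'll probably want to keep the docs for both handy. They're similar but often differ in small but important details. Basically, they calculate the standard deviation differently. I would strongly recommend checking the documentation for anything you use that calculates a standard deviation, whether a pocket calculator or a programming language, since the default is not (sorry!) standardized.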

Numpy STD: http://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html

Matlab STD: http://www.mathworks.com/help/matlab/ref/std.html

The Numpy docs for std are a bit opaque, IMHO, especially considering that NumPy docs are generally fairly clear. If you read far enough: "The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population." (In English: the default is the population std dev; set ddof=1 for the sample std dev.)

OTOH, the Matlab docs make clear the difference that's tripping you up:

There are two common textbook definitions for the standard deviation s of a data vector X. [equations omitted] n is the number of elements in the sample. The two forms of the equation differ only in n – 1 versus n in the divisor.

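So, by default, Matlab calculates the sample standard deviation (N-1 in the divisor, so the result is larger to compensate for the fact that this is a sample) and Numpy calculates the population standard deviation (N in the divisor). You can use the ddof parameter to switch to the sample standard deviation, or to any other denominator you want (which goes beyond my statistics knowledge).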
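The same split between conventions shows up in Python's standard library, which may help when cross-checking results. The sketch below compares statistics.pstdev and statistics.stdev with the two NumPy calls; the variable names are illustrative.

```python
import statistics

import numpy as np

values = [-1, 0, 1]

# Standard library: the two conventions have separate functions.
population = statistics.pstdev(values)  # divisor N
sample = statistics.stdev(values)       # divisor N - 1

# NumPy: one function, convention selected through ddof.
numpy_population = np.std(values)        # ddof=0 by default
numpy_sample = np.std(values, ddof=1)    # the sample convention that Matlab uses by default

print(population, numpy_population)  # both about 0.8165
print(sample, numpy_sample)          # both 1.0
```

When porting Matlab code, passing ddof=1 explicitly at each call site is one way to keep the two sets of results comparable.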

Finally, it doesn't help with this problem, but you'll probably find this helpful at some point: http://wiki.scipy.org/NumPy_for_Matlab_Users

3 answers

Answer 1: 20, 2014-06-05 18:54:34
Answer 2: 4, 2014-06-05 19:00:33
Answer 3: 1, 2014-06-05 19:11:15
