More efficient way to calculate standard deviation of a large list in Python
Hello, I am trying to calculate the standard deviation of growing prefixes of a list of roughly 20,000 values. Here is a sample of my code:
from statistics import stdev

def main():
    a = [x for x in range(0, 20000)]
    b = []
    for x in range(2, len(a) + 2):
        b.append(stdev(a[:x]))
    print(b)

main()
This approach is very slow, and I am trying to find a way to make it more efficient. Any help is appreciated. Thanks.
[Done] exited with code=null in 820.376 seconds
It looks like you want an expanding standard deviation. I would use the pandas library and the pandas.Series.expanding method:
In [156]: main()[:5]
Out[156]:
[0.7071067811865476,
1.0,
1.2909944487358056,
1.5811388300841898,
1.8708286933869707]
In [157]: pd.Series(range(20000)).expanding().std()[:5]
Out[157]:
0 NaN
1 0.707107
2 1.000000
3 1.290994
4 1.581139
dtype: float64
You can easily slice off the first element and convert to a list if you need to:
In [158]: pd.Series(range(20000)).expanding().std()[1:6].tolist()
Out[158]:
[0.7071067811865476,
1.0,
1.2909944487358056,
1.5811388300841898,
1.8708286933869707]
That said, a Series is a much more useful data type than a list for time-series data, and it is definitely faster:
In [159]: %timeit pd.Series(range(20000)).expanding().std()
1.07 ms ± 30.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
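As a script rather than an IPython session, the same idea might look like the sketch below. The helper name expanding_stdev and the 200-value test input are made up for illustration; the only pandas calls used are Series.expanding() and the resulting std(), which defaults to ddof=1 and so should match statistics.stdev.

```python
from statistics import stdev

import pandas as pd


def expanding_stdev(values):
    """Return the sample standard deviation of values[:2], values[:3], ..."""
    series = pd.Series(values, dtype="float64")
    # expanding().std() uses ddof=1 by default, like statistics.stdev.
    # The first entry is NaN (one value has no sample deviation), so drop it.
    return series.expanding().std().iloc[1:].tolist()


values = list(range(200))  # small input so the slow reference stays quick
fast = expanding_stdev(values)
slow = [stdev(values[:k]) for k in range(2, len(values) + 1)]

# The two approaches should agree up to floating-point rounding.
max_diff = max(abs(f - s) for f, s in zip(fast, slow))
print(len(fast), max_diff < 1e-9)
```

The reference loop here recomputes each prefix from scratch, just as the code in the question does, so it is only practical for small inputs.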
You can keep track of the running sum of the values and the running sum of their squares:
from math import sqrt

a = range(0, 20000)

def sdevs(a):
    sds = [0]
    n = 1
    sum_x = a[0]
    sum_x_squared = a[0]**2
    for x in a[1:]:
        sum_x += x
        sum_x_squared += x**2
        n += 1
        # as noted by @Andrey Tyukin, statistics.stdev returns
        # the unbiased estimator, hence the n/(n-1)
        sd = sqrt(n/(n-1)*(sum_x_squared/n - (sum_x/n)**2))
        sds.append(sd)
    return sds

sds = sdevs(a)
print(sds[10000])
# 2887.184355042123
On a 10-year-old PC, this takes about 24 milliseconds.
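One caveat with the sum-of-squares formula is that it subtracts two large, nearly equal numbers, which can lose precision when the values are large relative to their spread. A commonly used alternative is Welford's online algorithm, sketched below. This is a different technique from the one in the answer above, and the function name expanding_stdev_welford is made up for illustration. Unlike sdevs, it returns no placeholder for the single-value prefix, so its index i corresponds to the first i + 2 values.

```python
from math import sqrt


def expanding_stdev_welford(values):
    """Sample standard deviation of every prefix with at least two values."""
    result = []
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        # Uses the deviation from the old mean times the deviation from the new one.
        m2 += delta * (x - mean)
        if n > 1:
            # Divide by n - 1 to match statistics.stdev (sample deviation).
            result.append(sqrt(m2 / (n - 1)))
    return result


sds = expanding_stdev_welford(range(20000))
print(sds[:3])
```

It still makes a single pass over the data, so the running time should be of the same order as the sum-of-squares version.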