How to calculate a moving average in Python 3?
Let's say I have a list:
y = ['1', '2', '3', '4','5','6','7','8','9','10']
I want to create a function that calculates the moving n-day average. So if n was 5, I would want my code to take the first window, 1-5, add it up and find the average, which would be 3.0, then go on to 2-6 and calculate the average, which would be 4.0, then 3-7, 4-8, 5-9, 6-10.
I don't want to calculate the first n-1 days, so starting from the nth day, it'll count the previous days.
def moving_average(x:'list of prices', n):
    for num in range(len(x)+1):
        print(x[num-n:num])
This seems to print out what I want:
[]
[]
[]
[]
[]
['1', '2', '3', '4', '5']
['2', '3', '4', '5', '6']
['3', '4', '5', '6', '7']
['4', '5', '6', '7', '8']
['5', '6', '7', '8', '9']
['6', '7', '8', '9', '10']
However, I don't know how to calculate the numbers inside those lists. Any ideas?
There is a great sliding window generator in an old version of the Python docs with itertools examples:
from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
Using that, computing your moving averages is trivial:
from __future__ import division  # For Python 2

def moving_averages(values, size):
    for selection in window(values, size):
        yield sum(selection) / size
Running this against your input (mapping the strings to integers) gives:
>>> y= ['1', '2', '3', '4','5','6','7','8','9','10']
>>> for avg in moving_averages(map(int, y), 5):
...     print(avg)
...
3.0
4.0
5.0
6.0
7.0
8.0
To return None for the first n - 1 iterations, where the sets are still 'incomplete', just expand the moving_averages function a little:
def moving_averages(values, size):
    for _ in range(size - 1):
        yield None
    for selection in window(values, size):
        yield sum(selection) / size
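To see how the None-padded variant lines up with the original input, here is a small self-contained sketch. It repeats window() and moving_averages() from above so it can run on its own; the names prices and padded are illustrative choices, not part of the answer.

```python
from itertools import islice

def window(seq, n=2):
    """Yield a sliding window (of width n) over data from the iterable."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def moving_averages(values, size):
    # Pad the first size - 1 positions with None, then yield real averages.
    for _ in range(size - 1):
        yield None
    for selection in window(values, size):
        yield sum(selection) / size

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
prices = [int(s) for s in y]  # illustrative name
padded = list(moving_averages(prices, 5))

# Each average now lines up with the day that closes its window.
for day, avg in zip(y, padded):
    print(day, avg)
```

With the padding in place the result has as many entries as the input, as long as the input holds at least size values. For shorter inputs this variant still yields size - 1 None values.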
While I like Martijn's answer on this, like george, I was wondering whether this wouldn't be faster with a running summation instead of applying sum() over and over again to mostly the same numbers.
Also, the idea of having None values as default during the ramp-up phase is interesting. In fact, there may be plenty of different scenarios one could conceive for moving averages. Let's split the calculation of averages into three phases:

Ramp up: the starting iterations, while fewer than window_size elements have been read.
Steady: exactly window_size elements are available, so a normal average := sum(x[iteration_counter-window_size:iteration_counter])/window_size can be calculated.
Ramp down: at the end of the input data, we could return another window_size - 1 "average" numbers.

Here's a function that accepts parameters to switch the values produced during the ramp-up and ramp-down phases on or off, or callback functions for those phases, which can return a constant default (e.g. None) or provide partial averages. Here's the code:
from collections import deque

def moving_averages(data, size, rampUp=True, rampDown=True):
    """Slide a window of <size> elements over <data> to calc an average

    First and last <size-1> iterations when window is not yet completely
    filled with data, or the window empties due to exhausted <data>, the
    average is computed with just the available data (but still divided
    by <size>).
    Set rampUp/rampDown to False in order to not provide any values during
    those start and end <size-1> iterations.
    Set rampUp/rampDown to functions to provide arbitrary partial average
    numbers during those phases. The callback will get the currently
    available input data in a deque. Do not modify that data.
    """
    d = deque()
    running_sum = 0.0

    data = iter(data)
    # rampUp
    for count in range(1, size):
        try:
            val = next(data)
        except StopIteration:
            break
        running_sum += val
        d.append(val)
        #print("up: running sum:" + str(running_sum) + "  count: " + str(count) + "  deque: " + str(d))
        if rampUp:
            if callable(rampUp):
                yield rampUp(d)
            else:
                yield running_sum / size

    # steady
    exhausted_early = True
    for val in data:
        exhausted_early = False
        running_sum += val
        #print("st: running sum:" + str(running_sum) + "  deque: " + str(d))
        yield running_sum / size
        d.append(val)
        running_sum -= d.popleft()

    # rampDown
    if rampDown:
        if exhausted_early:
            running_sum -= d.popleft()
        for count in range(min(len(d), size-1), 0, -1):
            #print("dn: running sum:" + str(running_sum) + "  deque: " + str(d))
            if callable(rampDown):
                yield rampDown(d)
            else:
                yield running_sum / size
            running_sum -= d.popleft()
It seems to be a bit faster than Martijn's version, which is far more elegant, though. Here's the test code:
print("")
print("Timeit")
print("-" * 80)

from itertools import islice
def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Martijn's version:
def moving_averages_SO(values, size):
    for selection in window(values, size):
        yield sum(selection) / size


import timeit
problems = [int(i) for i in (10, 100, 1000, 10000, 1e5, 1e6, 1e7)]
for problem_size in problems:
    print("{:12s}".format(str(problem_size)), end="")

    so = timeit.repeat("list(moving_averages_SO(range("+str(problem_size)+"), 5))", number=1*max(problems)//problem_size,
                       setup="from __main__ import moving_averages_SO")
    print("{:12.3f} ".format(min(so)), end="")

    my = timeit.repeat("list(moving_averages(range("+str(problem_size)+"), 5, False, False))", number=1*max(problems)//problem_size,
                       setup="from __main__ import moving_averages")
    print("{:12.3f} ".format(min(my)), end="")

    print("")
And the output:
Timeit
--------------------------------------------------------------------------------
10                 7.242        7.656
100                5.816        5.500
1000               5.787        5.244
10000              5.782        5.180
100000             5.746        5.137
1000000            5.745        5.198
10000000           5.764        5.186
The original question can now be solved with this function call:
print(list(moving_averages(range(1,11), 5,
                           rampUp=lambda _: None,
                           rampDown=False)))
The output:
[None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
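Stripped of the ramp-up and ramp-down handling, the running-sum idea can be sketched in a few lines. This is a simplified illustration, not a replacement for the function above; the name running_sum_averages is made up for this example, and it only yields averages for complete windows.

```python
from collections import deque

def running_sum_averages(data, size):
    """Yield the average of each full window of `size` items, using a running sum."""
    window = deque()
    total = 0.0
    for val in data:
        window.append(val)
        total += val
        if len(window) > size:
            # Drop the oldest value instead of re-summing the whole window.
            total -= window.popleft()
        if len(window) == size:
            yield total / size

averages = list(running_sum_averages(range(1, 11), 5))
print(averages)
```

Each step costs one addition and one subtraction regardless of the window size. One trade-off to keep in mind: with float inputs, a long-lived running sum can accumulate rounding error that re-summing each window would avoid.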
Use the sum and map functions.
print(sum(map(int, x[num-n:num])))
The map function in Python 3 is basically a lazy version of this:
[int(i) for i in x[num-n:num]]
I'm sure you can guess what the sum function does.
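Putting sum and map into the loop from the question gives something like the sketch below. It assumes the loop should start at n, so that every slice holds exactly n items and the empty leading slices disappear. Returning a list instead of printing is a choice made for this example.

```python
def moving_average(x, n):
    """Return the n-item moving averages of a list of numeric strings."""
    averages = []
    # Start at n so that every slice x[num-n:num] holds exactly n items.
    for num in range(n, len(x) + 1):
        window = x[num - n:num]
        averages.append(sum(map(int, window)) / n)
    return averages

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(moving_average(y, 5))
```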
An approach that avoids recomputing intermediate sums:
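values = list(range(0, 12))  # renamed from `list` to avoid shadowing the built-in
def runs(v):
    global runningsum
    runningsum += v
    return runningsum
runningsum = 0
# Prepend 0 so that runsumlist[k] == sum(values[:k])
runsumlist = [0] + [runs(v) for v in values]
# Start at k = 5 so every difference covers exactly five values
result = [(runsumlist[k] - runsumlist[k-5]) / 5 for k in range(5, len(values) + 1)]
print(result)
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]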
Make that runs(int(v)), and then repr((runsumlist[k] - runsumlist[k-5])/5), if you want to carry the numbers around as strings.
Alt without the global:
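values = [float(x) for x in range(0, 12)]
nave = 5
# Seed the result list with the average of the first window
movingave = [sum(values[:nave]) / nave]
for i in range(len(values) - nave):
    # Slide the window: add the entering value, drop the leaving one
    movingave.append(movingave[-1] + (values[i+nave] - values[i]) / nave)
print(movingave)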
Be sure to do floating-point math even if your input values are integers.
[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
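The same prefix-sum idea can also be written with itertools.accumulate from the standard library, which builds the running totals without a global or a hand-written loop. This is only a sketch of that alternative; the function name prefix_sum_averages is invented for illustration.

```python
from itertools import accumulate

def prefix_sum_averages(values, n):
    """Moving averages of width n computed from prefix sums."""
    # prefix[k] holds the sum of the first k values; the leading 0 covers the first window.
    prefix = [0] + list(accumulate(values))
    return [(prefix[k] - prefix[k - n]) / n for k in range(n, len(prefix))]

print(prefix_sum_averages(range(0, 12), 5))

# String input, as in the question, only needs a conversion first.
y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
print(prefix_sum_averages(map(int, y), 5))
```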
There is another solution extending the itertools recipe pairwise(). You can extend this to nwise(), which gives you the sliding window (and works if the iterable is a generator):
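import itertools as it

def nwise(iterable, n):
    # n independent iterators over the same data
    ts = it.tee(iterable, n)
    for c, t in enumerate(ts):
        # advance the c-th iterator by c items
        next(it.islice(t, c, c), None)
    return zip(*ts)

def moving_averages_nw(iterable, n):
    yield from (sum(x)/n for x in nwise(iterable, n))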
>>> list(moving_averages_nw(range(1, 11), 5))
[3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
While there is a relatively high setup cost for short iterables, its impact shrinks as the data set grows. This uses sum(), but the code is reasonably elegant:
Timeit                MP          cfi        *****
--------------------------------------------------------------------------------
10                 4.658        4.959        7.351
100                5.144        4.070        4.234
1000               5.312        4.020        3.977
10000              5.317        4.031        3.966
100000             5.508        4.115        4.087
1000000            5.526        4.263        4.202
10000000           5.632        4.326        4.242
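As a quick sanity check that the nwise() approach really does accept a generator, the sketch below feeds it one and compares the result against plain list slicing. The squares generator is hypothetical input chosen only for this check.

```python
import itertools as it

def nwise(iterable, n):
    # n independent iterators over the same data, the c-th advanced by c items.
    ts = it.tee(iterable, n)
    for c, t in enumerate(ts):
        next(it.islice(t, c, c), None)
    return zip(*ts)

def moving_averages_nw(iterable, n):
    yield from (sum(x) / n for x in nwise(iterable, n))

def squares(limit):
    # Hypothetical generator input, used only for this check.
    for i in range(limit):
        yield i * i

data = list(squares(10))
from_generator = list(moving_averages_nw(squares(10), 3))
from_slices = [sum(data[i:i + 3]) / 3 for i in range(len(data) - 3 + 1)]
print(from_generator == from_slices)
```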