
How to calculate moving average in Python 3?

Let's say I have a list:

y = ['1', '2', '3', '4','5','6','7','8','9','10']

I want to create a function that calculates the n-day moving average. If n is 5, the code should average items 1-5 (giving 3.0), then items 2-6 (giving 4.0), then 3-7, 4-8, 5-9, and 6-10.

I don't want to calculate anything for the first n-1 days; starting from the nth day, each average should cover that day and the n-1 days before it.

def moving_average(x:'list of prices', n):
    for num in range(len(x)+1):
        print(x[num-n:num])

This seems to print out what I want:

[]
[]
[]
[]
[]

['1', '2', '3', '4', '5']

['2', '3', '4', '5', '6']

['3', '4', '5', '6', '7']

['4', '5', '6', '7', '8']

['5', '6', '7', '8', '9']

['6', '7', '8', '9', '10']

However, I don't know how to calculate the numbers inside those lists. Any ideas?

There is a great sliding window generator in an old version of the Python docs, among the itertools examples:

from itertools import islice

def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable

    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

Using that, computing your moving averages is trivial:

from __future__ import division  # For Python 2

def moving_averages(values, size):
    for selection in window(values, size):
        yield sum(selection) / size

Running this against your input (mapping the strings to integers) gives:

>>> y= ['1', '2', '3', '4','5','6','7','8','9','10']
>>> for avg in moving_averages(map(int, y), 5):
...     print(avg)
... 
3.0
4.0
5.0
6.0
7.0
8.0

To return None for the first n - 1 iterations, where the window is still 'incomplete', just expand the moving_averages function a little:

def moving_averages(values, size):
    for _ in range(size - 1):
        yield None
    for selection in window(values, size):
        yield sum(selection) / size
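
As a quick check of the padded variant, here is a self-contained sketch that repeats the two functions above and runs them on the list from the question. The variable name padded is only for illustration. The result has one entry per input item, which is convenient if the averages need to line up with the original data.

```python
from itertools import islice

def window(seq, n=2):
    """Yield sliding windows (tuples of width n) over seq."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def moving_averages(values, size):
    # one None placeholder for each of the first size - 1 positions
    for _ in range(size - 1):
        yield None
    for selection in window(values, size):
        yield sum(selection) / size

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
padded = list(moving_averages(map(int, y), 5))
print(padded)
# [None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```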

While I like Martijn's answer, like george I wondered whether it would be faster to keep a running sum instead of applying sum() over and over again to mostly the same numbers.

Also the idea of having None values as default during the ramp up phase is interesting. In fact there may be plenty of different scenarios one could conceive for moving averages. Let's split the calculation of averages into three phases:

  1. Ramp Up: Starting iterations where the current iteration count < window size
  2. Steady Progress: We have exactly window size number of elements available to calculate a normal average := sum(x[iteration_counter-window_size:iteration_counter])/window_size
  3. Ramp Down: At the end of the input data, we could return another window_size - 1 "average" numbers.
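
To make the three phases concrete, here is a small sketch that only labels the windows and does not compute averages. The helper name labelled_windows is made up for this illustration, and it assumes a list with at least as many items as the window size.

```python
def labelled_windows(values, size):
    """Yield (phase, window) pairs; assumes a list with len(values) >= size."""
    for end in range(1, len(values) + size):
        # the window holds at most `size` items ending just before index `end`
        win = values[max(0, end - size):end]
        if end < size:
            phase = "ramp up"      # window not yet full
        elif end <= len(values):
            phase = "steady"       # window holds exactly `size` items
        else:
            phase = "ramp down"    # input exhausted, window is emptying
        yield phase, win

phases = list(labelled_windows([1, 2, 3, 4], 3))
for phase, win in phases:
    print(phase, win)
# ramp up [1]
# ramp up [1, 2]
# steady [1, 2, 3]
# steady [2, 3, 4]
# ramp down [3, 4]
# ramp down [4]
```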

Here's a function that accepts

  • Arbitrary iterables (generators are fine) as input for data
  • Arbitrary window sizes >= 1
  • Parameters to switch on/off production of values during the phases for Ramp Up/Down
  • Callback functions for those phases to control how values are produced. This can be used to constantly provide a default (e.g. None) or to provide partial averages

Here's the code:

from collections import deque 

def moving_averages(data, size, rampUp=True, rampDown=True):
    """Slide a window of <size> elements over <data> to calc an average

    First and last <size-1> iterations when window is not yet completely
    filled with data, or the window empties due to exhausted <data>, the
    average is computed with just the available data (but still divided
    by <size>).
    Set rampUp/rampDown to False in order to not provide any values during
    those start and end <size-1> iterations.
    Set rampUp/rampDown to functions to provide arbitrary partial average
    numbers during those phases. The callback will get the currently
    available input data in a deque. Do not modify that data.
    """
    d = deque()
    running_sum = 0.0

    data = iter(data)
    # rampUp
    for count in range(1, size):
        try:
            val = next(data)
        except StopIteration:
            break
        running_sum += val
        d.append(val)
        #print("up: running sum:" + str(running_sum) + "  count: " + str(count) + "  deque: " + str(d))
        if rampUp:
            if callable(rampUp):
                yield rampUp(d)
            else:
                yield running_sum / size

    # steady
    exhausted_early = True
    for val in data:
        exhausted_early = False
        running_sum += val
        #print("st: running sum:" + str(running_sum) + "  deque: " + str(d))
        yield running_sum / size
        d.append(val)
        running_sum -= d.popleft()

    # rampDown
    if rampDown:
        if exhausted_early and d:  # d is empty if <data> was empty
            running_sum -= d.popleft()
        for _ in range(min(len(d), size-1), 0, -1):
            #print("dn: running sum:" + str(running_sum) + "  deque: " + str(d))
            if callable(rampDown):
                yield rampDown(d)
            else:
                yield running_sum / size
            running_sum -= d.popleft()

It seems to be a bit faster than Martijn's version - which is far more elegant, though. Here's the test code:

print("")
print("Timeit")
print("-" * 80)

from itertools import islice
def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable

    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Martijn's version:
def moving_averages_SO(values, size):
    for selection in window(values, size):
        yield sum(selection) / size


import timeit
problems = [int(i) for i in (10, 100, 1000, 10000, 1e5, 1e6, 1e7)]
for problem_size in problems:
    print("{:12s}".format(str(problem_size)), end="")

    so = timeit.repeat("list(moving_averages_SO(range("+str(problem_size)+"), 5))", number=1*max(problems)//problem_size,
                       setup="from __main__ import moving_averages_SO")
    print("{:12.3f} ".format(min(so)), end="")

    my = timeit.repeat("list(moving_averages(range("+str(problem_size)+"), 5, False, False))", number=1*max(problems)//problem_size,
                       setup="from __main__ import moving_averages")
    print("{:12.3f} ".format(min(my)), end="")

    print("")

And the output (columns: problem size, Martijn's version, my version; times in seconds as reported by timeit):

Timeit
--------------------------------------------------------------------------------
10                 7.242        7.656 
100                5.816        5.500 
1000               5.787        5.244 
10000              5.782        5.180 
100000             5.746        5.137 
1000000            5.745        5.198 
10000000           5.764        5.186 

The original question can now be solved with this function call:

print(list(moving_averages(range(1,11), 5,
                           rampUp=lambda _: None,
                           rampDown=False)))

The output:

[None, None, None, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

Use the sum and map functions.

print(sum(map(int, x[num-n:num])))

The map function in Python 3 is basically a lazy version of this:

[int(i) for i in x[num-n:num]]

I'm sure you can guess what the sum function does.
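
Putting that together with the loop from the question, a minimal sketch could look like the following. Starting the range at n (so the empty slices never appear) and returning a list instead of printing are my own choices here, not something the question requires.

```python
def moving_average(x, n):
    """Return the n-item moving averages of x, a list of numeric strings."""
    averages = []
    # start at n so that every slice x[num-n:num] holds exactly n items
    for num in range(n, len(x) + 1):
        averages.append(sum(map(int, x[num - n:num])) / n)
    return averages

y = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
result = moving_average(y, 5)
print(result)
# [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```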

An approach that avoids recomputing intermediate sums:

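values = list(range(0, 12))
n = 5

def runs(v):
    # add v to the module-level running total and return the new total
    global runningsum
    runningsum += v
    return runningsum

runningsum = 0
# leading 0 so that runsumlist[k] is the sum of the first k values
runsumlist = [0] + [runs(v) for v in values]
result = [(runsumlist[k] - runsumlist[k - n]) / n for k in range(n, len(values) + 1)]

print(result)

[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]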

To carry the numbers around as strings, call runs(int(v)) when accumulating and wrap each resulting average in repr(...).


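Alternative without the global:

values = [float(x) for x in range(0, 12)]
nave = 5
# seed the result with the average of the first window
movingave = [sum(values[:nave]) / nave]
# slide the window: add the entering value, drop the leaving one
for i in range(len(values) - nave):
    movingave.append(movingave[-1] + (values[i + nave] - values[i]) / nave)
print(movingave)

Be sure to do floating-point math even if your input values are integers.

[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]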
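
The same running-sum idea can also be sketched with itertools.accumulate from the standard library, which builds the list of prefix sums directly. This is a variation rather than the code above; the function name moving_averages_prefix is made up for the example, and it assumes the input is a sequence with a length (not a one-shot generator).

```python
from itertools import accumulate

def moving_averages_prefix(values, n):
    """Moving averages via prefix sums; values must be a sequence of numbers."""
    # prefix[k] is the sum of the first k values, so prefix[0] == 0
    prefix = [0] + list(accumulate(values))
    # each window sum is the difference of two prefix sums
    return [(prefix[k] - prefix[k - n]) / n for k in range(n, len(values) + 1)]

result = moving_averages_prefix(range(0, 12), 5)
print(result)
# [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```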

There is another solution extending the itertools recipe pairwise(). You can extend this to nwise(), which gives you the sliding window (and works if the iterable is a generator):

import itertools as it

def nwise(iterable, n):
    ts = it.tee(iterable, n)
    for c, t in enumerate(ts):
        next(it.islice(t, c, c), None)
    return zip(*ts)

def moving_averages_nw(iterable, n):
    yield from (sum(x)/n for x in nwise(iterable, n))

>>> list(moving_averages_nw(range(1, 11), 5))
[3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
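
To see what nwise() itself produces, here is a small self-contained sketch that feeds it a generator. The names letters and triples are only for illustration. Each of the n tee copies is advanced by its own index, so zipping them yields the overlapping windows.

```python
import itertools as it

def nwise(iterable, n):
    # n independent copies of the same underlying iterator
    ts = it.tee(iterable, n)
    for c, t in enumerate(ts):
        # advance copy number c by c items (the "consume" idiom)
        next(it.islice(t, c, c), None)
    return zip(*ts)

letters = (ch for ch in "abcde")  # a one-shot generator
triples = list(nwise(letters, 3))
print(triples)
# [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
```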

This has a relatively high setup cost for short iterables, but that cost matters less the longer the data set gets. It still uses sum(), but the code is reasonably elegant:

Timeit              MP           cfi         *****
--------------------------------------------------------------------------------
10                 4.658        4.959        7.351 
100                5.144        4.070        4.234 
1000               5.312        4.020        3.977 
10000              5.317        4.031        3.966 
100000             5.508        4.115        4.087 
1000000            5.526        4.263        4.202 
10000000           5.632        4.326        4.242 
