How to dynamically calculate mean of Pandas Series?

Question

I have a Series that contains some keys and values, just like:

> first
x    0.167965
y    0.380518
z    0.443677
dtype: float64

And from time to time, I'll have another one, same structure but different numbers, like:

> second
x    0.242322
y    0.991292
z    0.850728
dtype: float64

I want to get their mean. For that, I can create a DataFrame, add them as rows, and grab the mean:

> both = pd.DataFrame()
> both = both.append(first, ignore_index=True)
> both = both.append(second, ignore_index=True)
> both
          x         y         z
0  0.167965  0.380518  0.443677
1  0.242322  0.991292  0.850728
> first_second_mean = both.mean()
> first_second_mean
x    0.205144
y    0.685905
z    0.647203
dtype: float64

And that's cool, it works and all.
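
Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the session above only runs on older versions. A minimal sketch of the same step on current pandas, assuming `first` and `second` are rebuilt from the printed values (so they are rounded to six decimals):

```python
import pandas as pd

# Values copied from the printed Series above (rounded, so an assumption).
first = pd.Series({"x": 0.167965, "y": 0.380518, "z": 0.443677})
second = pd.Series({"x": 0.242322, "y": 0.991292, "z": 0.850728})

# Each Series becomes one row; the Series index becomes the columns.
both = pd.DataFrame([first, second])

# Equivalent construction: concatenate as columns, then transpose.
both_concat = pd.concat([first, second], axis=1).T

# Column-wise mean, one value per key (x, y, z).
first_second_mean = both.mean()
print(first_second_mean)
```

Either construction gives one row per Series, so `mean()` averages down each column just as in the `append` version.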

Another thing I can do is add them directly, and then divide:

> added = first + second
> added
x    0.410287
y    1.371810
z    1.294405
dtype: float64
> first_second_mean = added / 2
> first_second_mean
x    0.205144
y    0.685905
z    0.647203
dtype: float64

If there's a third one, I can scale it up:

> third
x    0.252872
y    0.791024
z    0.809272
dtype: float64

If I do the DataFrame approach with all three:

> df = pd.DataFrame()
> df = df.append(first, ignore_index=True)
> df = df.append(second, ignore_index=True)
> df = df.append(third, ignore_index=True)
> df
          x         y         z
0  0.167965  0.380518  0.443677
1  0.242322  0.991292  0.850728
2  0.252872  0.791024  0.809272
> df.mean()
x    0.221053
y    0.720945
z    0.701226
dtype: float64

And if I manually add and divide:

> added = first + second + third
> added
x    0.663159
y    2.162834
z    2.103677
dtype: float64
> added / 3
x    0.221053
y    0.720945
z    0.701226
dtype: float64

And that works, but I have to keep track of every single one of the Series. What I need is a way to do it with only the previous average, something like this:

> df = pd.DataFrame()
> df = df.append(first_second_mean, ignore_index=True)
> df = df.append(third, ignore_index=True)
> df
          x         y         z
0  0.205144  0.685905  0.647203
1  0.252872  0.791024  0.809272
> df.mean()
x    0.229008
y    0.738464
z    0.728237
dtype: float64

And, well, the results don't match up. If I try the manual adding and dividing approach:

> added = first_second_mean + third
> added
x    0.458016
y    1.476929
z    1.456474
dtype: float64
> added / 2
x    0.229008
y    0.738464
z    0.728237
dtype: float64

Correct mean of all three:

x    0.221053
y    0.720945
z    0.701226
dtype: float64

Incorrect mean of all three:

x    0.229008
y    0.738464
z    0.728237
dtype: float64

So obviously, my math is wrong. How can I, using only the previous mean (first_second_mean) and the new Series (third), calculate the correct mean, as if I had calculated the mean of all the parts (first, second and third) directly?

I want to keep only the mean and update it with new values as they arrive. This might happen many times, not just three as in this example.

Answer 1

To update an average, you have to keep track of how many you've averaged over so far.

Assuming you've got one series avg, which is the average over N previous items, and a new item new, then just do:

avg = (N*avg + new)/(N+1)
N += 1
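
The formula works because `N*avg` recovers the sum of the items seen so far, so the new item gets weight 1/(N+1) rather than 1/2. A minimal sketch applying it to the three Series from the question, assuming they are rebuilt from the printed (rounded) values:

```python
import pandas as pd

# Values copied from the question's printed output (rounded, so an assumption).
first = pd.Series({"x": 0.167965, "y": 0.380518, "z": 0.443677})
second = pd.Series({"x": 0.242322, "y": 0.991292, "z": 0.850728})
third = pd.Series({"x": 0.252872, "y": 0.791024, "z": 0.809272})

avg = first.copy()  # the mean over a single item is the item itself
n = 1               # number of items averaged so far

for new in (second, third):
    # n * avg is the running sum; add the new item and divide by the new count.
    avg = (n * avg + new) / (n + 1)
    n += 1

# Reference value: the mean computed from all three Series at once.
direct = (first + second + third) / 3
print(avg)
print((avg - direct).abs().max())  # difference is at floating-point noise level
```

Only `avg` and `n` need to be kept between updates; the individual Series can be discarded.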

Answer 2

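Write a small self-defined function that tracks how many items have been averaged so far:

def ave_sum(l):
    prev = l[0]
    # n is the number of items already included in prev
    for n, cur in enumerate(l[1:], start=1):
        prev = (n * prev + cur) / (n + 1)
    return prev
ave_sum([first, second, third])
Out[242]: 
x    0.221053
y    0.720945
z    0.701226
dtype: float64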
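
When the Series arrive one at a time and no list of them exists, the count can be carried together with the mean. A minimal sketch, assuming a hypothetical `RunningMean` helper (not part of pandas) and input values rebuilt from the question's printed output:

```python
import pandas as pd


class RunningMean:
    """Hypothetical helper: stores only the current mean and the item count."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, new):
        """Fold one more Series into the mean and return the updated mean."""
        self.count += 1
        if self.mean is None:
            # First item: the mean is the item itself (copied as float).
            self.mean = new.astype(float)
        else:
            # Algebraically the same as (N*avg + new)/(N+1), with count == N+1.
            self.mean = self.mean + (new - self.mean) / self.count
        return self.mean


# Values copied from the question (rounded, so an assumption).
first = pd.Series({"x": 0.167965, "y": 0.380518, "z": 0.443677})
second = pd.Series({"x": 0.242322, "y": 0.991292, "z": 0.850728})
third = pd.Series({"x": 0.252872, "y": 0.791024, "z": 0.809272})

tracker = RunningMean()
for series in (first, second, third):
    tracker.update(series)

print(tracker.count)
print(tracker.mean)
```

This sketch assumes every incoming Series has the same index. pandas aligns Series by label during arithmetic, so a key missing from either side would produce NaN for that key.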

2 answers

solution1
1 ACCEPTED 2020-06-27 04:51:37

solution2
0 2020-06-26 21:58:27