
How to dynamically calculate mean of Pandas Series?

I have a Series that contains some keys and values, just like:

> first
x    0.167965
y    0.380518
z    0.443677
dtype: float64

And from time to time, I'll have another one, same structure but different numbers, like:

> second
x    0.242322
y    0.991292
z    0.850728
dtype: float64

I want to get their mean. For that, I can create a DataFrame, add them as rows, and grab the mean:

> df = pd.DataFrame()
> df = df.append(first, ignore_index=True)
> df = df.append(second, ignore_index=True)
> df
          x         y         z
0  0.167965  0.380518  0.443677
1  0.242322  0.991292  0.850728
> first_second_mean = df.mean()
> first_second_mean
x    0.205144
y    0.685905
z    0.647203
dtype: float64

And that's cool, it works and all.
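A note for anyone running this on a current install: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the snippet above only runs on older versions. The following is a minimal sketch of one way to build the same two-row table on newer versions, by passing the Series to the `DataFrame` constructor as a list of rows. The values are the ones shown for `first` and `second` above.

```python
import pandas as pd

# Values copied from the `first` and `second` Series shown above.
first = pd.Series({"x": 0.167965, "y": 0.380518, "z": 0.443677})
second = pd.Series({"x": 0.242322, "y": 0.991292, "z": 0.850728})

# Each Series becomes one row; the Series index supplies the column names.
df = pd.DataFrame([first, second])

# Column-wise mean, one value per key.
first_second_mean = df.mean()
print(first_second_mean)
```

This should give the same column-wise mean as the `append` version, without depending on the removed method.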

Another thing I can do is add them directly, and then divide:

> added = first + second
> added
x    0.410287
y    1.371810
z    1.294405
dtype: float64
> first_second_mean = added / 2
> first_second_mean
x    0.205144
y    0.685905
z    0.647203
dtype: float64

If there's a third one, I can scale it up:

> third
x    0.252872
y    0.791024
z    0.809272
dtype: float64

If I do the DataFrame approach with all three:

> df = pd.DataFrame()
> df = df.append(first, ignore_index=True)
> df = df.append(second, ignore_index=True)
> df = df.append(third, ignore_index=True)
> df
          x         y         z
0  0.167965  0.380518  0.443677
1  0.242322  0.991292  0.850728
2  0.252872  0.791024  0.809272
> df.mean()
x    0.221053
y    0.720945
z    0.701226
dtype: float64

And if I manually add and divide:

> added = first + second + third
> added
x    0.663159
y    2.162834
z    2.103677
dtype: float64
> added / 3
x    0.221053
y    0.720945
z    0.701226
dtype: float64

That works, but I have to keep every single Series around. What I need is a way to update the result using only the previous average, something like this:

> df = pd.DataFrame()
> df = df.append(first_second_mean, ignore_index=True)
> df = df.append(third, ignore_index=True)
> df
          x         y         z
0  0.205144  0.685905  0.647203
1  0.252872  0.791024  0.809272
> df.mean()
x    0.229008
y    0.738464
z    0.728237
dtype: float64

And, well, the results don't match up. If I try the manual adding and dividing approach:

> added = first_second_mean + third
> added
x    0.458016
y    1.476929
z    1.456474
dtype: float64
> added / 2
x    0.229008
y    0.738464
z    0.728237
dtype: float64

Correct mean of all three:

x    0.221053
y    0.720945
z    0.701226
dtype: float64

Incorrect mean of all three:

x    0.229008
y    0.738464
z    0.728237
dtype: float64

So obviously, my math is wrong. How can I, using only the previous mean (first_second_mean) and the new Series (third), calculate the correct mean, as if I had calculated the mean of all the parts (first, second and third) directly?

I want to keep only the mean and update it with new values as they arrive. This may happen many times, not just three as in this example.

To update an average, you have to keep track of how many you've averaged over so far.
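A short derivation may make it clearer why the count is needed. The notation is an assumption introduced here for illustration: $x_i$ is the i-th Series and $\bar{x}_N$ is the mean of the first $N$ of them.

```latex
% Mean of N+1 items, rewritten in terms of the previous mean and the new item
\bar{x}_{N+1}
  = \frac{1}{N+1}\sum_{i=1}^{N+1} x_i
  = \frac{N\,\bar{x}_N + x_{N+1}}{N+1}

% Averaging the previous mean with the new item uses the wrong weights.
% For N = 2 (the case in the question):
\frac{\bar{x}_2 + x_3}{2}
  = \frac{x_1}{4} + \frac{x_2}{4} + \frac{x_3}{2}
  \neq \frac{x_1 + x_2 + x_3}{3}
```

The plain two-way average matches the true mean only when $N = 1$. For larger $N$ it gives the newest item too much weight, which is why the question's second attempt produced different numbers.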

Assuming you have one Series avg, which is the average over the N previous items, and a new item new, then just do:

avg = (N*avg + new)/(N+1)
N += 1
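As a rough check, the update rule above can be run against the three Series from the question. This is a sketch rather than a definitive implementation: the helper name `update_mean` is hypothetical, and the values are copied from the question.

```python
import pandas as pd

# Values copied from the question.
first = pd.Series({"x": 0.167965, "y": 0.380518, "z": 0.443677})
second = pd.Series({"x": 0.242322, "y": 0.991292, "z": 0.850728})
third = pd.Series({"x": 0.252872, "y": 0.791024, "z": 0.809272})


def update_mean(avg, n, new):
    """Given the mean over n items, return the mean over n + 1 items and the new count."""
    return (n * avg + new) / (n + 1), n + 1


# Start with the first Series as the mean of one item, then fold in the rest.
avg, n = first, 1
for new in (second, third):
    avg, n = update_mean(avg, n, new)

# Reference value: the mean computed from all three Series at once.
expected = pd.DataFrame([first, second, third]).mean()

print(avg.round(6))
print(expected.round(6))
```

Only `avg` and `n` need to be stored between updates; the individual Series can be discarded once they have been folded in.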

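Alternatively, define a small helper function that weights the running mean by the number of items seen so far:

def ave_sum(l):
    prev = l[0]
    # n is the number of items already folded into prev
    for n, cur in enumerate(l[1:], start=1):
        prev = (n * prev + cur) / (n + 1)
    return prev
ave_sum([first, second, third])
Out[242]: 
x    0.221053
y    0.720945
z    0.701226
dtype: float64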

