简体   繁体   中英

Creating new column in DataFrame with conditional rolling().mean()

I am working on quite a large DataFrame and would like to speed things up. My data is in the following format:

Name Number
Condition 1 1
Condition 1 2
Condition 1 2.2
Condition 1 2
Condition 2 1
Condition 2 1.1
Condition 2 1.2
Condition 2 1.3

Now I want to apply a rolling mean over the numbers, but without mixing different conditions. My expected outcome would be:

Name Number Mean
Condition 1 1 nan
Condition 1 2 1.5
Condition 1 2.2 2.1
Condition 1 2 2.1
Condition 2 1 nan
Condition 2 1.1 1.05
Condition 2 1.2 1.15
Condition 2 1.3 1.25

What I am doing right now is to iterate over all conditions and save the calculated mean in a different Dataframe, like:

mean = {}
for con in data["Name"].drop_duplicates():
   mean[con] = data[data["Name" == con]].rolling(window=2).mean()

But this always takes several minutes, that's why I am looking for an alternative to create the column directly. I tried it with.apply but that doesn't work. At least not with what I tried so far.

Thank you very much!

Group by Name and then transform Number with the rolling mean:

df['Mean'] = df.groupby('Name')['Number'].transform(lambda x: x.rolling(window=2).mean())
print(df)

Output

          Name  Number  Mean
0  Condition 1     1.0   NaN
1  Condition 1     2.0  1.50
2  Condition 1     2.2  2.10
3  Condition 1     2.0  2.10
4  Condition 2     1.0   NaN
5  Condition 2     1.1  1.05
6  Condition 2     1.2  1.15
7  Condition 2     1.3  1.25

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM