
Pandas DataFrame Groupby two columns and add column for moving average

I have a DataFrame that I want to group by multiple columns and then add a calculated column (mean) based on the grouping. Can someone give me a hand?

I have tried the grouping and it works fine, but adding the calculated (rolling mean) column is proving to be a hassle.

import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16], list('AAAAAAAABBBBBBBB'), ['RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW','RED','BLUE','GREEN','YELLOW'], ['1','1','1','1','2','2','2','2','1','1','1','1','2','2','2','2'],[100,112,99,120,105,114,100,150,200,134,167,150,134,189,172,179]]).T
df.columns = ['id','Station','Train','month_code','total']
# Transposing mixed lists leaves every column as object dtype; make the numeric ones numeric
df = df.astype({'id': int, 'total': int})
df2 = df.groupby(['Station','Train','month_code','total']).size().reset_index().groupby(['Station','Train','month_code'])['total'].max()

I'm looking to get an outcome similar to the one below:

Station  Train   month_code total   average
A   BLUE        1       112 
                2       114       113
    GREEN       1       99        106.5
                2       100       99.5
    RED         1       100       100
                2       105       102.5
    YELLOW      1       120       112.5
                2       150       135
B   BLUE        1       134       142
                2       189       161.5
    GREEN       1       167       178
                2       172       169.5
    RED         1       200       186
                2       134       167
    YELLOW      1       150       142
                2       179       164.5

How about changing your initial groupby so that it keeps the column name 'total'?

df3 = df.groupby(['Station','Train','month_code']).sum()

>>> df3.head()
                          id  total
Station Train month_code           
A       BLUE  1            2    112
              2            6    114
        GREEN 1            3     99
              2            7    100
        RED   1            1    100

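Then compute a rolling mean on the total column. With a window of 2, each row is averaged with the row directly before it in the sorted index, so the first row is NaN and the window carries over from one group to the next.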

df3['average'] = df3['total'].rolling(2).mean()

>>> df3.head()
                          id  total  average
Station Train month_code                    
A       BLUE  1            2    112      NaN
              2            6    114    113.0
        GREEN 1            3     99    106.5
              2            7    100     99.5
        RED   1            1    100    100.0
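
For reference, the two steps above can be put together in one runnable sketch. It rebuilds the question's data from a dict (an assumption made here so that id and total get an integer dtype), and the names keys and average_per_station are illustrative only. The last step is an optional variant, not part of the answer: it restarts the window inside each Station, in case the average should not carry over from station A to station B.

```python
import pandas as pd

# Same values as in the question, built from a dict so numeric columns are numeric.
df = pd.DataFrame({
    'id': range(1, 17),
    'Station': list('AAAAAAAABBBBBBBB'),
    'Train': ['RED', 'BLUE', 'GREEN', 'YELLOW'] * 4,
    'month_code': ['1'] * 4 + ['2'] * 4 + ['1'] * 4 + ['2'] * 4,
    'total': [100, 112, 99, 120, 105, 114, 100, 150,
              200, 134, 167, 150, 134, 189, 172, 179],
})

keys = ['Station', 'Train', 'month_code']  # illustrative name

# Group and aggregate; the result is indexed (and sorted) by the three keys.
df3 = df.groupby(keys).sum()

# Rolling mean over consecutive rows of the whole frame (crosses group boundaries).
df3['average'] = df3['total'].rolling(2).mean()

# Optional variant (an assumption, not from the answer): restart the window per Station.
df3['average_per_station'] = (
    df3.groupby(level='Station')['total']
       .transform(lambda s: s.rolling(2).mean())
)

print(df3)
```

In this sketch the two average columns agree within station A; they differ on the first row of station B, where the per-station variant has no previous row to average with.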

You can still remove the id column afterwards if you don't want it.
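
A minimal sketch of two ways to leave id out, assuming the same data as in the question (rebuilt here from a dict so the block runs on its own; keys and df3_alt are illustrative names). Either drop the column after aggregating, or select only total before calling sum so that id is never summed in the first place.

```python
import pandas as pd

# Same values as in the question, built from a dict so numeric columns are numeric.
df = pd.DataFrame({
    'id': range(1, 17),
    'Station': list('AAAAAAAABBBBBBBB'),
    'Train': ['RED', 'BLUE', 'GREEN', 'YELLOW'] * 4,
    'month_code': ['1'] * 4 + ['2'] * 4 + ['1'] * 4 + ['2'] * 4,
    'total': [100, 112, 99, 120, 105, 114, 100, 150,
              200, 134, 167, 150, 134, 189, 172, 179],
})

keys = ['Station', 'Train', 'month_code']  # illustrative name

# Option 1: aggregate every column, then drop the one that is not needed.
df3 = df.groupby(keys).sum().drop(columns='id')

# Option 2: pick the column before aggregating (double brackets keep a DataFrame).
df3_alt = df.groupby(keys)[['total']].sum()

# Adding the rolling mean works the same way on either result.
df3['average'] = df3['total'].rolling(2).mean()
df3_alt['average'] = df3_alt['total'].rolling(2).mean()
```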


 