
New pandas version: how to groupby all columns with different aggregation statistics

I have a df that looks like this:

     time    volts1    volts2
0   0.000 -0.299072  0.427551
2   0.001 -0.299377  0.427551
4   0.002 -0.298767  0.427551
6   0.003 -0.298767  0.422974
8   0.004 -0.298767  0.422058
10  0.005 -0.298462  0.422363
12  0.006 -0.298767  0.422668
14  0.007 -0.298462  0.422363
16  0.008 -0.301208  0.420227
18  0.009 -0.303345  0.418091

In actuality, the df has >50 columns, but for simplicity, I'm just showing 3.

I want to group this df every n rows — let's say 5. I want to aggregate time with max and the rest of the columns with mean. Because there are so many columns, I'd love to be able to loop this and not have to do it manually.

I know I can do something like this where I go through and create all new columns manually:

df.groupby(df.index // 5).agg(time=('time', 'max'),
                           volts1=('volts1', 'mean'),
                           volts2=('volts2', 'mean'),
                           ...
                           )

but because there are so many columns, I want to do this in a loop, something like:

df.groupby(df.index // 5).agg(time=('time', 'max'),
                           # df.time is always the first column
                           [i for i in df.columns[1:]]=(i, 'mean'),
                           )

If useful:

print(pd.__version__)
1.0.5

You can use a dictionary:

d = {col: "max" if col == "time" else "mean" for col in df.columns}
# {'time': 'max', 'volts1': 'mean', 'volts2': 'mean'}
df.groupby(df.index // 5).agg(d)

    time    volts1    volts2
0  0.002 -0.299072  0.427551
1  0.004 -0.298767  0.422516
2  0.007 -0.298564  0.422465
3  0.009 -0.302276  0.419159
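If you prefer to keep the named-aggregation style from the question, note that `agg` also accepts `**`-unpacked keyword arguments, so the loop you were reaching for can be written as a dict comprehension. A minimal sketch, reconstructing the example frame from the question:

```python
import pandas as pd

# Rebuild the sample frame from the question, including its even-numbered index.
df = pd.DataFrame(
    {
        "time":   [0.000, 0.001, 0.002, 0.003, 0.004,
                   0.005, 0.006, 0.007, 0.008, 0.009],
        "volts1": [-0.299072, -0.299377, -0.298767, -0.298767, -0.298767,
                   -0.298462, -0.298767, -0.298462, -0.301208, -0.303345],
        "volts2": [0.427551, 0.427551, 0.427551, 0.422974, 0.422058,
                   0.422363, 0.422668, 0.422363, 0.420227, 0.418091],
    },
    index=range(0, 20, 2),
)

# Named aggregation: 'time' gets max explicitly; every remaining column
# (df.time is always the first column) gets mean via dict unpacking.
out = df.groupby(df.index // 5).agg(
    time=("time", "max"),
    **{c: (c, "mean") for c in df.columns[1:]},
)
print(out)
```

This produces the same result as the dictionary approach above; which one to use is mostly a matter of taste, though the plain dictionary is shorter when you don't need to rename any columns.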
