Pandas: how to write a groupby plus an aggregation that can group by one or many columns?

How can I use this groupby plus aggregation operation in such a way that it can flexibly handle one or more groupby columns?

import pandas as pd

# some data
df = pd.DataFrame({'col1': [1, 5, 1, 2, 2, 2], 'col2': [2, 2, 2, 3, 3, 3], 'col3': [999, 999, 999, 999, 999, 999],
                  'time': ['2020-01-25 12:24:33', '2020-01-25 14:24:33', '2020-01-25 18:24:33',
                           '2020-01-25 09:24:33', '2020-01-25 10:24:33', '2020-01-25 11:24:33']})

# convert time
df['time'] = pd.to_datetime(df['time'])

# groupby with one col, works
df.groupby(['col1', df['time'].dt.floor('d')]).tail(1)

# how to use this structure while being flexibly able to group by one or more cols?
two_cols = ['col1', 'col2']
df.groupby([two_cols, df['time'].dt.floor('d')]).tail(1)

The expected output is the same for both operations:

    col1  col2  col3  time
    5     2     999   2020-01-25 14:24:33
    1     2     999   2020-01-25 18:24:33
    2     3     999   2020-01-25 11:24:33

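groupby() expects a single flat list of keys, where each key is a column label or a Series. Wrapping two_cols inside another list produces a nested list, which is not a valid key, so the column labels and the Series have to be combined into one flat list. I believe this works: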

df.groupby(two_cols + [df['time'].dt.floor('d')]).tail(1)

The argument passed to groupby() is the list two_cols concatenated with a second, one-element list (the part in []) that contains just the df['time']... Series. The two lists are combined into one new list object, and that is what groupby() groups on.
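To make the same call handle either a single column name or a list of names, the list concatenation can be wrapped in a small helper. The following is a minimal sketch only: the function name last_per_day and its time_col parameter are hypothetical and not part of pandas, and it assumes the time column has already been converted with pd.to_datetime.

```python
import pandas as pd


def last_per_day(df, cols, time_col="time"):
    """Return the last row of each (cols, calendar day) group.

    Hypothetical helper: `cols` may be one column label or a list of labels.
    """
    # Normalise a single label to a one-element list so that the
    # concatenation below always yields one flat list of keys.
    keys = [cols] if isinstance(cols, str) else list(cols)
    # Flat list: column labels followed by the day-floored time Series.
    return df.groupby(keys + [df[time_col].dt.floor("D")]).tail(1)


df = pd.DataFrame({
    "col1": [1, 5, 1, 2, 2, 2],
    "col2": [2, 2, 2, 3, 3, 3],
    "col3": [999, 999, 999, 999, 999, 999],
    "time": ["2020-01-25 12:24:33", "2020-01-25 14:24:33", "2020-01-25 18:24:33",
             "2020-01-25 09:24:33", "2020-01-25 10:24:33", "2020-01-25 11:24:33"],
})
df["time"] = pd.to_datetime(df["time"])

one = last_per_day(df, "col1")            # single grouping column
two = last_per_day(df, ["col1", "col2"])  # several grouping columns
```

With the sample data from the question, both calls should select the same three rows, because col2 is constant within each col1 value.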
