
How to apply pandas groupby in Python on multiple columns and aggregate columns in list of tuples?

I have a pandas DataFrame, let's say:

import pandas as pd

data = {"action"  : ["create_ticket", "create_ticket", "create_ticket"],
        "start"   : ["2016-01-02", "2016-01-02", "2016-01-21"],
        "end"     : ["2016-01-04", "2016-01-05", "2016-01-28"],
        "duration": [2, 3, 7]
       }

df = pd.DataFrame(data, columns=["action", "start", "end", "duration"])

which looks like:

    action          start       end         duration
0   create_ticket   2016-01-02  2016-01-04  2
1   create_ticket   2016-01-02  2016-01-05  3
2   create_ticket   2016-01-21  2016-01-28  7

Now I want to group by the first two columns (action and start) and aggregate the other two columns, end and duration, into a list of tuples. My desired output would look like:

    action          start       endpoints
0   create_ticket   2016-01-02  [(2016-01-04, 2), (2016-01-05, 3)]
2   create_ticket   2016-01-21  [(2016-01-28, 7)]

I tried executing:

df = df.groupby(['action', 'start'])['end', 'duration'].apply(list).to_frame()
df.reset_index(inplace=True)

But this gives:

    action          start       0
0   create_ticket   2016-01-02  [end, duration]
1   create_ticket   2016-01-21  [end, duration]

How can I solve this?

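Use groupby.apply and build a tuple from each group's .values. Note that each group then becomes a tuple of row arrays rather than a list of tuples: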

In [43]: df.groupby(['action', 'start'])[['end', 'duration']].apply(lambda x: tuple(x.values))
Out[43]: 
action         start     
create_ticket  2016-01-02    ([2016-01-04, 2], [2016-01-05, 3])
               2016-01-21                    ([2016-01-28, 7],)
dtype: object
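If the exact shape from the question is needed (a list of tuples in a column named endpoints), the same groupby.apply idea can be adapted. The following is only a sketch under the assumption that plain Python tuples are wanted; using itertuples here is my choice, not part of the answer above. It also shows why the original attempt returned [end, duration]: calling list() on a DataFrame iterates over its column labels, not its rows.

```python
import pandas as pd

data = {"action"  : ["create_ticket", "create_ticket", "create_ticket"],
        "start"   : ["2016-01-02", "2016-01-02", "2016-01-21"],
        "end"     : ["2016-01-04", "2016-01-05", "2016-01-28"],
        "duration": [2, 3, 7]}
df = pd.DataFrame(data)

# list() on a DataFrame yields its column labels, which explains
# the [end, duration] result of the attempt in the question.
labels = list(df[["end", "duration"]])

# Turn each group's rows into plain tuples, collect them in a list,
# then name the resulting column "endpoints".
result = (
    df.groupby(["action", "start"])[["end", "duration"]]
      .apply(lambda g: list(g.itertuples(index=False, name=None)))
      .reset_index(name="endpoints")
)
print(result)
```

Passing name=None to itertuples makes it return ordinary tuples instead of namedtuples, and reset_index(name=...) moves the group keys back into regular columns.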

You can also zip the columns you want into tuples first, and then collect them per group with groupby.agg (or groupby.apply):

(df.assign(New=pd.Series(zip(df['end'],df['duration'])))
.groupby(['action','start'],as_index=False)['New'].agg(list))

          action       start                                 New
0  create_ticket  2016-01-02  [(2016-01-04, 2), (2016-01-05, 3)]
1  create_ticket  2016-01-21                   [(2016-01-28, 7)]
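One caveat with the zip approach: a new pd.Series gets a default 0..n-1 index, and assign aligns it to the frame by index label. If the frame's index is not the default one, the tuples may not line up. A hedged variant is to pass a plain list, which is assigned by position. The index values 10, 20, 30 and the column name endpoints below are made up for illustration.

```python
import pandas as pd

data = {"action"  : ["create_ticket", "create_ticket", "create_ticket"],
        "start"   : ["2016-01-02", "2016-01-02", "2016-01-21"],
        "end"     : ["2016-01-04", "2016-01-05", "2016-01-28"],
        "duration": [2, 3, 7]}
# Hypothetical non-default index, to show that position-based assignment still works.
df = pd.DataFrame(data, index=[10, 20, 30])

# A plain list of tuples is assigned row by row, regardless of index labels.
pairs = list(zip(df["end"], df["duration"]))

result = (
    df.assign(endpoints=pairs)
      .groupby(["action", "start"], as_index=False)["endpoints"]
      .agg(list)
)
print(result)
```

With the default RangeIndex used in the question, both forms should give the same result; the list form simply avoids depending on the index.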
