I have a pandas DataFrame, let's say:
import pandas as pd

data = {"action": ["create_ticket", "create_ticket", "create_ticket"],
        "start": ["2016-01-02", "2016-01-02", "2016-01-21"],
        "end": ["2016-01-04", "2016-01-05", "2016-01-28"],
        "duration": [2, 3, 7]}
df = pd.DataFrame(data, columns=["action", "start", "end", "duration"])
which looks like:
action start end duration
0 create_ticket 2016-01-02 2016-01-04 2
1 create_ticket 2016-01-02 2016-01-05 3
2 create_ticket 2016-01-21 2016-01-28 7
Now I want to group by the first two columns (action and start) and aggregate the other two columns (end and duration) into a list of tuples. My desired output would look like:
action start endpoints
0 create_ticket 2016-01-02 [(2016-01-04, 2), (2016-01-05, 3)]
2 create_ticket 2016-01-21 [(2016-01-28, 7)]
I tried executing:
df = df.groupby(['action', 'start'])['end', 'duration'].apply(list).to_frame()
df.reset_index(inplace=True)
But this gives:
action start 0
0 create_ticket 2016-01-02 [end, duration]
1 create_ticket 2016-01-21 [end, duration]
How can I solve this?
Use groupby.apply on each group's .values (this gives a tuple of row arrays per group rather than a list of tuples):
In [43]: df.groupby(['action', 'start'])[['end', 'duration']].apply(lambda x: tuple(x.values))
Out[43]:
action start
create_ticket 2016-01-02 ([2016-01-04, 2], [2016-01-05, 3])
2016-01-21 ([2016-01-28, 7],)
dtype: object
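If an actual list of tuples is needed, as in the desired output, the same groupby.apply approach can be adapted. The following is only a sketch, assuming the example frame from the question; it uses DataFrame.itertuples with index=False and name=None to get plain tuples, and the column name endpoints is taken from the desired output in the question.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "action": ["create_ticket", "create_ticket", "create_ticket"],
    "start": ["2016-01-02", "2016-01-02", "2016-01-21"],
    "end": ["2016-01-04", "2016-01-05", "2016-01-28"],
    "duration": [2, 3, 7],
})

# For each (action, start) group, turn the rows of the selected
# columns into plain tuples and collect them in a list.
endpoints = (
    df.groupby(["action", "start"])[["end", "duration"]]
      .apply(lambda g: list(g.itertuples(index=False, name=None)))
      .reset_index(name="endpoints")
)

print(endpoints)
```

The reset_index(name=...) call moves the group keys back into ordinary columns and names the aggregated column in one step.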
You can try groupby.agg / groupby.apply after zipping the columns you want as tuples:
(df.assign(New=pd.Series(zip(df['end'],df['duration'])))
.groupby(['action','start'],as_index=False)['New'].agg(list))
action start New
0 create_ticket 2016-01-02 [(2016-01-04, 2), (2016-01-05, 3)]
1 create_ticket 2016-01-21 [(2016-01-28, 7)]
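As for why the original attempt returned [end, duration]: iterating over a DataFrame yields its column labels, not its rows, so calling list on each group produces the column names. A small sketch of the difference, assuming the same example data (the variable names here are only for illustration):

```python
import pandas as pd

# Example data from the question, restricted to the two aggregated columns.
group = pd.DataFrame({
    "end": ["2016-01-04", "2016-01-05", "2016-01-28"],
    "duration": [2, 3, 7],
})

# list() on a DataFrame iterates over column labels.
column_labels = list(group)

# Zipping the columns pairs up the values row by row instead.
row_tuples = list(zip(group["end"], group["duration"]))

print(column_labels)
print(row_tuples)
```

This is why the answers above build the tuples explicitly, either from the values of each group or by zipping the columns before grouping.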