I have a dataframe like this,
ID  time        text
1   8:43:43 PM  one day
1   8:43:51 PM  this code
1   8:44:07 PM  will help
1   8:44:17 PM  someone.
2   8:45:56 AM  yes
2   8:46:09 AM  I'm feeling
2   8:46:25 AM  good.
I want to group by ID and calculate the time duration of each group, i.e. the last time minus the first time, in seconds. I know we can use join to concatenate the text within each ID.
The expected output is:
ID  time-duration  text
1   34             one day this code will help someone.
2   29             yes I'm feeling good.
Use GroupBy.agg with named aggregations (best practice from pandas >= 0.25.0). The advantage of named aggregations is that they aggregate and rename the output column in one step; see time_duration in the output.
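import pandas as pd

# Parse the time strings so they can be subtracted
df['time'] = pd.to_datetime(df['time'])
dfg = df.groupby('ID').agg(
    time_duration=('time', lambda x: x.max() - x.min()),  # last time minus first time
    text=('text', ' '.join)                               # concatenate text within each ID
).reset_index()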
   ID time_duration                                  text
0   1      00:00:34  one day this code will help someone.
1   2      00:00:29                 yes I'm feeling good.
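For a copy-pasteable check, the sketch below rebuilds the sample frame from the question and applies the same named aggregation. The explicit format string passed to pd.to_datetime and the final conversion to whole seconds are additions assumed here to match the question's expected output; they are not part of the answer itself.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2],
    "time": ["8:43:43 PM", "8:43:51 PM", "8:44:07 PM", "8:44:17 PM",
             "8:45:56 AM", "8:46:09 AM", "8:46:25 AM"],
    "text": ["one day", "this code", "will help", "someone.",
             "yes", "I'm feeling", "good."],
})

# Assumption: an explicit 12-hour format, so parsing does not rely on inference
df["time"] = pd.to_datetime(df["time"], format="%I:%M:%S %p")

result = df.groupby("ID").agg(
    time_duration=("time", lambda x: x.max() - x.min()),  # last minus first time
    text=("text", " ".join),                              # concatenate the text
).reset_index()

# Assumption: the question wants whole seconds rather than a Timedelta
result["time_duration"] = (
    pd.to_timedelta(result["time_duration"]).dt.total_seconds().astype(int)
)

print(result)
```

Only the time of day is parsed here, so the durations are meaningful as long as each group stays within a single day.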
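Alternatively, pass np.ptp ("peak to peak", i.e. maximum minus minimum) for the time column. This assumes numpy is imported as np and that time has already been converted with pd.to_datetime:
df.groupby('ID').agg({'time': np.ptp, 'text': ' '.join})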
Out[49]:
        time                                  text
ID
1   00:00:34  one day this code will help someone.
2   00:00:29                 yes I'm feeling good.
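To make clear what np.ptp computes, here is a minimal sketch on the times of the first ID only. It applies np.ptp to a plain NumPy datetime64 array and compares it with max minus min; the variable names and the explicit format string are illustrative assumptions, not part of the answer.

```python
import numpy as np
import pandas as pd

# Times for ID 1 from the question, parsed with an assumed 12-hour format
times = pd.to_datetime(
    pd.Series(["8:43:43 PM", "8:43:51 PM", "8:44:07 PM", "8:44:17 PM"]),
    format="%I:%M:%S %p",
)

arr = times.to_numpy()          # NumPy datetime64 array
spread = np.ptp(arr)            # peak to peak: maximum minus minimum
manual = arr.max() - arr.min()  # the same quantity written out by hand

print(spread == manual)
```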
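Group by ID and aggregate, converting each duration to seconds with total_seconds() (the time column must already be a datetime column):
(df.groupby('ID', as_index=False)
   .agg({'time': lambda x: (x.max() - x.min()).total_seconds(),
         'text': ' '.join})
)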
Output:
   ID  time                                  text
0   1  34.0  one day this code will help someone.
1   2  29.0                 yes I'm feeling good.