How to group text into one row and calculate the time duration in Python pandas?
I have a DataFrame like this:
ID time text
1 8:43:43 PM one day
1 8:43:51 PM this code
1 8:44:07 PM will help
1 8:44:17 PM someone.
2 8:45:56 AM yes
2 8:46:09 AM I'm feeling
2 8:46:25 AM good.
I want to group by ID and calculate the time duration of each group. I know we can use join to concatenate the text within each ID group.
The desired output (duration in seconds) is:
ID time-duration text
1 34 one day this code will help someone.
2 29 yes I'm feeling good.
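To reproduce the answers below, the sample frame can be built like this. This is a minimal sketch that assumes the time column holds plain 12-hour clock strings such as "8:43:43 PM", exactly as shown in the question.

```python
import pandas as pd

# Sample data from the question; 'time' is assumed to hold strings, not datetimes
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2],
    'time': ['8:43:43 PM', '8:43:51 PM', '8:44:07 PM', '8:44:17 PM',
             '8:45:56 AM', '8:46:09 AM', '8:46:25 AM'],
    'text': ['one day', 'this code', 'will help', 'someone.',
             'yes', "I'm feeling", 'good.'],
})
print(df)
```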
Use GroupBy.agg with named aggregations (best practice since pandas 0.25.0):
The advantage of named aggregations is that we aggregate and rename the column at the same time; see time_duration in the output.
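import pandas as pd

# Parse 12-hour clock strings such as "8:43:43 PM" into datetimes
df['time'] = pd.to_datetime(df['time'], format='%I:%M:%S %p')
dfg = df.groupby('ID').agg(
    time_duration=('time', lambda x: x.max() - x.min()),  # span of each group
    text=('text', ' '.join),                              # concatenate the text
).reset_index()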
ID time_duration text
0 1 00:00:34 one day this code will help someone.
1 2 00:00:29 yes I'm feeling good.
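The named-aggregation result holds Timedelta values, while the question asks for plain seconds. A self-contained sketch that converts the column with .dt.total_seconds() could look like this; it assumes the same string format as the sample data, and casting to int assumes whole-second durations.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2],
    'time': ['8:43:43 PM', '8:43:51 PM', '8:44:07 PM', '8:44:17 PM',
             '8:45:56 AM', '8:46:09 AM', '8:46:25 AM'],
    'text': ['one day', 'this code', 'will help', 'someone.',
             'yes', "I'm feeling", 'good.'],
})

# Parse "8:43:43 PM"-style strings; the date part defaults to 1900-01-01
df['time'] = pd.to_datetime(df['time'], format='%I:%M:%S %p')

dfg = df.groupby('ID').agg(
    time_duration=('time', lambda x: x.max() - x.min()),  # Timedelta per group
    text=('text', ' '.join),                              # one row of text per ID
).reset_index()

# Convert the Timedelta column to whole seconds to match the desired output
dfg['time_duration'] = dfg['time_duration'].dt.total_seconds().astype(int)
print(dfg)
```

Keeping the Timedelta until the last step means the conversion unit is explicit; .dt.total_seconds() returns floats, hence the final cast.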
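We can also do this (assuming df['time'] has already been converted with pd.to_datetime as above):
import numpy as np

# np.ptp ("peak to peak") computes max - min of each group's times
df.groupby('ID').agg({'time': np.ptp, 'text': ' '.join})
Output: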
time text
ID
1 00:00:34 one day this code will help someone.
2 00:00:29 yes I'm feeling good.
Groupby and aggregation, returning the duration in seconds (again assuming df['time'] is already datetime):
(df.groupby('ID', as_index=False)
.agg({'time': lambda x: (x.max() - x.min()).total_seconds(),
'text': ' '.join})
)
Output:
ID time text
0 1 34.0 one day this code will help someone.
1 2 29.0 yes I'm feeling good.
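A variant worth considering when there are many groups: use the built-in 'min' and 'max' aggregations and subtract afterwards, instead of calling a Python lambda once per group. This is a sketch under the same data assumptions as above; the intermediate start and end column names are hypothetical and are dropped at the end.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2],
    'time': ['8:43:43 PM', '8:43:51 PM', '8:44:07 PM', '8:44:17 PM',
             '8:45:56 AM', '8:46:09 AM', '8:46:25 AM'],
    'text': ['one day', 'this code', 'will help', 'someone.',
             'yes', "I'm feeling", 'good.'],
})
df['time'] = pd.to_datetime(df['time'], format='%I:%M:%S %p')

# 'min'/'max' dispatch to pandas' built-in groupby reductions
out = df.groupby('ID').agg(
    start=('time', 'min'),
    end=('time', 'max'),
    text=('text', ' '.join),
)
# Duration in seconds, computed column-wise after the aggregation
out['time_duration'] = (out['end'] - out['start']).dt.total_seconds()
out = out.drop(columns=['start', 'end']).reset_index()
print(out)
```

The text join still runs in Python per group, since string concatenation has no built-in reduction.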