
Find time difference between groups in a dataframe with Python

I'm using Python's pandas.

I have the following orders dataframe, where each order has an order ID, an order time, and the IDs of the items in the order. In this example there are three groups (orders): A, B, and C:

  order_id                 time  item_id
0        A  2022-11-10 08:43:07        1
1        A  2022-11-10 08:43:07        2
2        A  2022-11-10 08:43:07        3
3        B  2022-11-10 08:46:27        1
4        B  2022-11-10 08:46:27        2
5        C  2022-11-10 08:58:45        3

I want to calculate the time difference between group A and group B, and then between group B and group C, in chronological order, and save the result in a new column.

Wanted result:

  order_id                 time  item_id        time_diff
0        A  2022-11-10 08:43:07        1                 
1        A  2022-11-10 08:43:07        2                 
2        A  2022-11-10 08:43:07        3                 
3        B  2022-11-10 08:46:27        1  0 days 00:03:20
4        B  2022-11-10 08:46:27        2  0 days 00:03:20
5        C  2022-11-10 08:58:45        3  0 days 00:12:18

How can I calculate the time difference between groups when every row in a group shares the same time?

I tried using .diff(), but I got only the difference inside each group:

df['time_diff'] = df.groupby('order_id')['time'].diff()

df
Out[141]: 
  order_id                time  item_id time_diff
0        A 2022-11-10 08:43:07        1       NaT
1        A 2022-11-10 08:43:07        2    0 days
2        A 2022-11-10 08:43:07        3    0 days
3        B 2022-11-10 08:46:27        1       NaT
4        B 2022-11-10 08:46:27        2    0 days
5        C 2022-11-10 08:58:45        3       NaT

I want the difference between the groups, not within them. I can calculate it with .last().diff(), but I don't know how to save it back to the dataframe as a column:

df.groupby('order_id')['time'].last().diff().to_frame('time_diff')
Out[]: 
               time_diff
order_id                
A                    NaT
B        0 days 00:03:20
C        0 days 00:12:18

Thanks.

You were on the right track. Compute the per-order difference as you did, then merge it back onto the original dataframe by order_id:

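# Time gap between consecutive orders, one row per order_id
diff = df.groupby('order_id')['time'].last().diff().to_frame('time_diff').reset_index()
# Left merge so every row of an order receives that order's gap
df = df.merge(diff, on='order_id', how='left')
df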
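
For reference, here is a self-contained sketch of the same idea using Series.map instead of merge. It rebuilds the sample dataframe from the question. The names order_times and gaps are only illustrative, and the sort_values() call is an added assumption: groupby sorts by order_id, so sorting by time is meant to cover the case where the alphabetical order of the IDs does not match the chronological order.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "order_id": ["A", "A", "A", "B", "B", "C"],
    "time": pd.to_datetime([
        "2022-11-10 08:43:07", "2022-11-10 08:43:07", "2022-11-10 08:43:07",
        "2022-11-10 08:46:27", "2022-11-10 08:46:27", "2022-11-10 08:58:45",
    ]),
    "item_id": [1, 2, 3, 1, 2, 3],
})

# One timestamp per order, sorted chronologically (assumption: orders
# should be compared in time order, not in order_id order)
order_times = df.groupby("order_id")["time"].last().sort_values()

# Gap to the previous order; the earliest order has no predecessor (NaT)
gaps = order_times.diff()

# Look up each row's order_id in gaps to broadcast the gap to all its rows
df["time_diff"] = df["order_id"].map(gaps)
print(df)
```

Because this assigns the column directly, it simply overwrites a time_diff column left over from an earlier attempt. With merge, an existing time_diff column would instead produce time_diff_x and time_diff_y, so drop the old column first if you go that way.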
