Calculate the time difference between two consecutive rows in pandas

I have a pandas DataFrame as follows:

Dev_id     Time
88345      13:40:31
87556      13:20:33
88955      13:05:00
.....      ........
85678      12:15:28

The above DataFrame has 83,000 rows. I want to take the time difference between each pair of consecutive rows and store it in a separate column. The desired result would be:

Dev_id    Time          Time_diff(in min)
88345      13:40:31      20
87556      13:20:33      15
88955      13:05:00      15

I have tried df['Time_diff'] = df['Time'].diff(-1) but I get the error shown below:

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

How can I solve this?

The problem is that pandas needs datetimes or timedeltas for the diff function, and the Time column holds plain datetime.time objects. So first convert with to_timedelta, then take total_seconds and divide by 60 to get minutes:

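# Convert the time values to timedeltas, diff against the next row, then express the result in minutes
df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
# Alternative: parse as datetimes instead; every row gets the same date, so it cancels out in the difference
# df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print(df)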
   Dev_id      Time  Time_diff
0   88345  13:40:31  19.966667
1   87556  13:20:33  15.550000
2   88955  13:05:00  49.533333
3   85678  12:15:28        NaN
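
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the Time column holds datetime.time objects (which is what the error message suggests) and rebuilds only the four rows visible in the question; the variable names as_timedelta and time_subtraction_failed are just for illustration.

```python
import datetime as dt

import pandas as pd

# Sample rows from the question; Time is assumed to hold datetime.time objects.
df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": [dt.time(13, 40, 31), dt.time(13, 20, 33),
             dt.time(13, 5, 0), dt.time(12, 15, 28)],
})

# Plain datetime.time objects do not support subtraction, hence the TypeError.
try:
    dt.time(13, 40, 31) - dt.time(13, 20, 33)
    time_subtraction_failed = False
except TypeError:
    time_subtraction_failed = True

# str(time) gives "HH:MM:SS", which to_timedelta can parse.
as_timedelta = pd.to_timedelta(df["Time"].astype(str))

# diff(-1) subtracts the next row from the current one;
# the last row has no successor, so its result is NaN.
df["Time_diff"] = as_timedelta.diff(-1).dt.total_seconds().div(60)
print(df)
```

Going through strings is the simplest route here because to_timedelta does not accept datetime.time objects directly.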

If you want to floor or round to whole minutes:

df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
                     .diff(-1)
                     .dt.floor('min')  # 'min' is the minute alias; 'T' is deprecated in recent pandas
                     .dt.total_seconds()
                     .div(60))
print(df)
   Dev_id      Time  Time_diff
0   88345  13:40:31       19.0
1   87556  13:20:33       15.0
2   88955  13:05:00       49.0
3   85678  12:15:28        NaN
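
The snippet above only shows the floor variant. As a rough sketch of how rounding compares, assuming the same four sample times stored as strings (the names times, diff, floored and rounded are illustrative):

```python
import pandas as pd

# The four sample times from the question, as "HH:MM:SS" strings.
times = pd.Series(["13:40:31", "13:20:33", "13:05:00", "12:15:28"])

# Difference between each row and the next one, as timedeltas.
diff = pd.to_timedelta(times).diff(-1)

# floor drops the leftover seconds; round goes to the nearest whole minute.
floored = diff.dt.floor("min").dt.total_seconds().div(60)
rounded = diff.dt.round("min").dt.total_seconds().div(60)

print(pd.DataFrame({"floored": floored, "rounded": rounded}))
```

With this data, rounding gives 20, 16 and 50 minutes where flooring gives 19, 15 and 49, so the choice matters if the result has to match figures like the 20 in the desired output.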

You should first convert the df['Time'] column to pd.Timedelta and then do the subtraction.
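
One possible reading of this suggestion, converting the column to timedeltas and then subtracting explicitly, is sketched below. It assumes the times are available as "HH:MM:SS" strings (or anything astype(str) turns into that form) and uses shift(-1) to line each row up with the one after it; td is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": ["13:40:31", "13:20:33", "13:05:00", "12:15:28"],
})

# Convert the column to timedeltas first.
td = pd.to_timedelta(df["Time"].astype(str))

# shift(-1) moves each following row's value up by one position,
# so td - td.shift(-1) is "this row minus the next row".
df["Time_diff"] = (td - td.shift(-1)).dt.total_seconds() / 60
print(df)
```

This should give the same values as diff(-1); the explicit subtraction just makes the direction of the difference easier to see.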
