Calculate the time difference between two consecutive rows in pandas
I have a pandas DataFrame as follows:
Dev_id Time
88345 13:40:31
87556 13:20:33
88955 13:05:00
..... ........
85678 12:15:28
The above DataFrame has 83,000 rows. I want to compute the time difference between each pair of consecutive rows and store it in a separate column. The desired result would be:
Dev_id Time Time_diff (min)
88345 13:40:31 20
87556 13:20:33 15
88955 13:05:00 15
I have tried
df['Time_diff'] = df['Time'].diff(-1)
but I get the following error:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
How can I solve this?
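A minimal reproduction of the error, as a sketch. It assumes the Time column holds datetime.time objects (the TypeError message points that way; the question does not say how the frame was built) and uses only the four visible sample rows.

```python
import datetime

import pandas as pd

# Visible sample rows from the question; Time holds datetime.time objects
# (an assumption based on the TypeError message)
df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": [datetime.time(13, 40, 31), datetime.time(13, 20, 33),
             datetime.time(13, 5, 0), datetime.time(12, 15, 28)],
})

print(df["Time"].dtype)  # object: plain Python objects, not a datetime dtype

error_message = None
try:
    df["Time"].diff(-1)
except TypeError as exc:
    # datetime.time defines no subtraction, so diff cannot compute row - row
    error_message = str(exc)
    print(error_message)
```

Because the column has object dtype, diff falls back to plain Python subtraction element by element, which is exactly what datetime.time does not support.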
The problem is that pandas needs datetimes or timedeltas for the diff function, but the Time column holds datetime.time objects, which cannot be subtracted. So first convert with to_timedelta, then get total_seconds and divide by 60:
df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
# alternative: parse with to_datetime instead of to_timedelta
#df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print (df)
Dev_id Time Time_diff
0 88345 13:40:31 19.966667
1 87556 13:20:33 15.550000
2 88955 13:05:00 49.533333
3 85678 12:15:28 NaN
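The one-liner above can be unpacked step by step. This sketch uses only the four visible rows, stored as strings for simplicity (astype(str) turns datetime.time objects into the same HH:MM:SS form); the intermediate names td and delta are illustrative.

```python
import pandas as pd

# The four visible rows; the rows hidden behind "....." are not reproduced
df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": ["13:40:31", "13:20:33", "13:05:00", "12:15:28"],
})

# Step 1: "13:40:31" -> Timedelta('0 days 13:40:31')
td = pd.to_timedelta(df["Time"].astype(str))

# Step 2: diff(-1) computes row i minus row i+1, so descending times give
# positive differences; the last row has no successor and becomes NaT
delta = td.diff(-1)

# Step 3: Timedelta -> float seconds -> float minutes (NaT -> NaN)
df["Time_diff"] = delta.dt.total_seconds().div(60)
print(df)
```

On these rows it yields 19.966667, 15.55, 49.533333 and NaN, matching the output shown above.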
If you want to floor (or round) to whole minutes:
df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
                     .diff(-1)
                     .dt.floor('min')  # 'min' = minute; the older 'T' alias is deprecated
                     .dt.total_seconds()
                     .div(60))
print (df)
Dev_id Time Time_diff
0 88345 13:40:31 19.0
1 87556 13:20:33 15.0
2 88955 13:05:00 49.0
3 85678 12:15:28 NaN
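The rounding variant is the same chain with dt.round in place of dt.floor. A sketch on the four visible rows, stored as strings for simplicity:

```python
import pandas as pd

df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": ["13:40:31", "13:20:33", "13:05:00", "12:15:28"],
})

# Round each difference to the nearest whole minute instead of flooring it;
# NaT in the last row stays NaT and becomes NaN after total_seconds()
df["Time_diff"] = (pd.to_timedelta(df["Time"].astype(str))
                     .diff(-1)
                     .dt.round("min")
                     .dt.total_seconds()
                     .div(60))
print(df)
```

This gives 20.0, 16.0, 50.0 and NaN. Note that the two modes disagree on 15 min 33 s (floor gives 15, round gives 16), so pick the one that matches the intended meaning of "minutes".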
You should first convert (cast) the df['Time'] column to pd.Timedelta, then do the subtraction.
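A sketch of the convert-then-subtract suggestion using an explicit subtraction instead of diff. shift(-1) aligns each row with its successor, so td - td.shift(-1) is equivalent to td.diff(-1). It again uses the four visible rows as strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": ["13:40:31", "13:20:33", "13:05:00", "12:15:28"],
})

# Convert once to Timedelta so that subtraction is defined
td = pd.to_timedelta(df["Time"].astype(str))

# shift(-1) moves the next row up; the last row gets NaT, so its result is NaN
df["Time_diff"] = (td - td.shift(-1)).dt.total_seconds() / 60
print(df)
```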