Iterate over unique dates on a Pandas DataFrame
I have a pandas DataFrame like this:
```
id  date        time      dif
01  2020-04-02  09:44:00
02  2020-04-02  09:50:23
03  2020-04-02  09:54:56
04  2020-04-03  10:24:42
05  2020-04-03  10:32:12
06  2020-04-03  11:12:21
...
```
What I'm trying to do is calculate the time difference, in minutes, between consecutive rows on the same day (the value goes on the earlier row of each pair). So the result should look like this:
```
id  date        time      dif
01  2020-04-02  09:44:00  6
02  2020-04-02  09:50:23  4
03  2020-04-02  09:54:56
04  2020-04-03  10:24:42  7
05  2020-04-03  10:32:12  40
06  2020-04-03  11:12:21
...
```
My first thought was to create a list of the unique values in the `date` column and loop over it:
```python
import pandas as pd
import numpy as np
...
dates = df.date.unique()
for d in dates:
    # d is never used, so every pass recomputes the same diff over the whole column
    df['dif'] = round(df['time'].diff(-1).dt.total_seconds().div(60), 0) * -1
```
But I think it isn't that easy...
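For comparison, the loop idea can work if each iteration is restricted to the rows of one date. This is a minimal sketch, assuming the sample rows are typed in by hand as strings and that rows are already sorted by time within each date; the grouped answers below are the more idiomatic route.

```python
import pandas as pd

# Sample rows from the question, typed in by hand for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "date": ["2020-04-02"] * 3 + ["2020-04-03"] * 3,
    "time": ["09:44:00", "09:50:23", "09:54:56",
             "10:24:42", "10:32:12", "11:12:21"],
})
df["time"] = pd.to_timedelta(df["time"])  # "hh:mm:ss" strings -> timedeltas

df["dif"] = float("nan")  # the last row of each day has no later row, so it stays NaN
for d in df["date"].unique():
    mask = df["date"] == d          # only this day's rows
    day_times = df.loc[mask, "time"]
    # next time minus current time, in minutes, computed within this day only
    minutes = (day_times.shift(-1) - day_times).dt.total_seconds() / 60
    df.loc[mask, "dif"] = minutes.round()

print(df)
```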
Use `DataFrameGroupBy.diff` with `Series.dt.total_seconds` and `Series.round`:
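```python
import pandas as pd

# parse "hh:mm:ss" strings into timedeltas so they can be subtracted
df['time'] = pd.to_timedelta(df['time'])
# diff(-1) within each date gives current minus next (negative); convert to minutes, round, flip the sign
df['dif'] = df.groupby('date')['time'].diff(-1).dt.total_seconds().div(60).round().mul(-1)
```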
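To see why the grouping matters, here is a small sketch on a hypothetical four-row sample (two rows per day) comparing a plain `diff(-1)` with the grouped one, before the sign flip. The plain version compares the last row of one day with the first row of the next day, while the grouped version leaves NaN at each day boundary.

```python
import pandas as pd

# Hypothetical two-day sample mirroring the question's layout.
df = pd.DataFrame({
    "date": ["2020-04-02", "2020-04-02", "2020-04-03", "2020-04-03"],
    "time": pd.to_timedelta(["09:50:23", "09:54:56", "10:24:42", "10:32:12"]),
})

# Plain diff(-1): row 1 is compared with row 2, crossing the day boundary.
plain = df["time"].diff(-1).dt.total_seconds().div(60)
# Grouped diff(-1): each date is handled separately, so the last row of a day gets NaN.
grouped = df.groupby("date")["time"].diff(-1).dt.total_seconds().div(60)

print(pd.DataFrame({"plain": plain, "grouped": grouped}))
```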
Or use `DataFrameGroupBy.shift` and subtract:
```python
df['dif'] = (df.groupby('date')['time'].shift(-1)
               .sub(df['time'])
               .dt.total_seconds()
               .div(60)
               .round())
print(df)
```

```
   id        date      time   dif
0   1  2020-04-02  09:44:00   6.0
1   2  2020-04-02  09:50:23   5.0
2   3  2020-04-02  09:54:56   NaN
3   4  2020-04-03  10:24:42   8.0
4   5  2020-04-03  10:32:12  40.0
5   6  2020-04-03  11:12:21   NaN
```
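The printed result differs from the expected output in the question on two rows (5 vs 4, 8 vs 7): `.round()` rounds to the nearest minute (4.55 becomes 5, and 7.5 becomes 8 because ties go to the even value), while the question's expected values appear to count whole completed minutes. If truncation is what is wanted, a minimal sketch swapping `.round()` for `np.floor`, assuming the same sample rows typed in by hand:

```python
import numpy as np
import pandas as pd

# Sample rows from the question, typed in by hand for illustration.
df = pd.DataFrame({
    "date": ["2020-04-02"] * 3 + ["2020-04-03"] * 3,
    "time": pd.to_timedelta(["09:44:00", "09:50:23", "09:54:56",
                             "10:24:42", "10:32:12", "11:12:21"]),
})

# next time minus current time within each date, in fractional minutes
minutes = (df.groupby("date")["time"].shift(-1)
             .sub(df["time"])
             .dt.total_seconds()
             .div(60))

df["dif_round"] = minutes.round()    # nearest minute, ties to even (as in the answer)
df["dif_floor"] = np.floor(minutes)  # whole completed minutes; NaN stays NaN
print(df)
```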