
Iterate over unique dates on a Pandas dataframe

I have a pandas dataframe like this:

id        date      time    dif
01  2020-04-02  09:44:00
02  2020-04-02  09:50:23
03  2020-04-02  09:54:56
04  2020-04-03  10:24:42
05  2020-04-03  10:32:12
06  2020-04-03  11:12:21
...

What I'm trying to do is calculate the time difference, in minutes, between each row and the next one within the same day. So the result should be like this:

id        date      time    dif
01  2020-04-02  09:44:00      6
02  2020-04-02  09:50:23      4
03  2020-04-02  09:54:56
04  2020-04-03  10:24:42      7
05  2020-04-03  10:32:12     40
06  2020-04-03  11:12:21
...

My first thought was to create a list of the unique values in the date column, and I tried this:

import pandas as pd
import numpy as np

...

dates = df.date.unique()

for d in dates:
  df['dif'] = round(df['time'].diff(-1).dt.total_seconds().div(60),0) * -1

But I think it isn't so easy...

Use DataFrameGroupBy.diff with Series.dt.total_seconds and Series.round:

df['time'] = pd.to_timedelta(df['time'])

df['dif'] = df.groupby('date')['time'].diff(-1).dt.total_seconds().div(60).round().mul(-1)
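For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the time column holds HH:MM:SS strings and rebuilds the six sample rows from the question; the DataFrame construction is my assumption about how the data is stored, not something the question shows.

```python
import pandas as pd

# Sample rows copied from the question; storing them as strings is an assumption.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "date": ["2020-04-02"] * 3 + ["2020-04-03"] * 3,
    "time": ["09:44:00", "09:50:23", "09:54:56",
             "10:24:42", "10:32:12", "11:12:21"],
})

# Convert the clock strings to timedeltas so they can be subtracted.
df["time"] = pd.to_timedelta(df["time"])

# diff(-1) gives (current row - next row) inside each date group,
# so the result is negative; mul(-1) flips the sign after rounding.
df["dif"] = (df.groupby("date")["time"].diff(-1)
               .dt.total_seconds()
               .div(60)
               .round()
               .mul(-1))

print(df)
```

The last row of each date has no following row in its group, so its dif stays NaN.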

Or use DataFrameGroupBy.shift and subtract:

df['dif'] = (df.groupby('date')['time'].shift(-1)
               .sub(df['time'])
               .dt.total_seconds()
               .div(60)
               .round())
print (df)
   id        date     time   dif
0   1  2020-04-02 09:44:00   6.0
1   2  2020-04-02 09:50:23   5.0
2   3  2020-04-02 09:54:56   NaN
3   4  2020-04-03 10:24:42   8.0
4   5  2020-04-03 10:32:12  40.0
5   6  2020-04-03 11:12:21   NaN
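If you really do want to iterate over the unique dates as in the question, the loop has to restrict each assignment to the rows of that date, otherwise every pass overwrites the whole column. Below is a rough sketch of that idea, again assuming HH:MM:SS strings in the time column. The helper minutes_to_next and the dif_floor column are hypothetical names introduced only for illustration. The truncating variant is included because the expected output in the question (4 and 7) looks like whole elapsed minutes rather than rounded ones.

```python
import pandas as pd

# Sample rows copied from the question; storing them as strings is an assumption.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "date": ["2020-04-02"] * 3 + ["2020-04-03"] * 3,
    "time": ["09:44:00", "09:50:23", "09:54:56",
             "10:24:42", "10:32:12", "11:12:21"],
})
df["time"] = pd.to_timedelta(df["time"])


def minutes_to_next(times, truncate=False):
    """Hypothetical helper: minutes from each row to the following row."""
    seconds = times.shift(-1).sub(times).dt.total_seconds()
    # floordiv drops the partial minute; round() rounds to the nearest minute.
    return seconds.floordiv(60) if truncate else seconds.div(60).round()


# Start with empty float columns, then fill them one date at a time.
df["dif"] = float("nan")
df["dif_floor"] = float("nan")
for d in df["date"].unique():
    mask = df["date"] == d              # rows belonging to this date only
    day_times = df.loc[mask, "time"]
    df.loc[mask, "dif"] = minutes_to_next(day_times)
    df.loc[mask, "dif_floor"] = minutes_to_next(day_times, truncate=True)

print(df)
```

On this sample the dif column matches the groupby output above, and dif_floor gives 6, 4, 7 and 40 for the rows that have a following row. The groupby versions are still the better choice on large frames, since the loop scans the date column once per unique date.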
