Calculate difference between successive date column with groupby on another column in pandas?
I have a pandas DataFrame:
import pandas as pd

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
['Train','2019-01-06T19:44:09Z'],
['Train','2019-01-02T19:44:09Z'],
['Car','2019-01-08T06:44:09Z'],
['Car','2019-01-06T18:44:09Z'],
['Train','2019-01-04T19:44:09Z'],
['Car','2019-01-05T16:34:09Z'],
['Train','2019-01-08T19:44:09Z'],
['Car','2019-01-07T14:44:09Z'],
['Car','2019-01-06T11:44:09Z'],
['Train','2019-01-10T19:44:09Z'],
],
columns=['Type', 'Date'])
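Because the Date column holds ISO-8601 strings, it has to be parsed before any date arithmetic is possible. This is a minimal sketch that assumes the question's `data` frame and a pandas version where the trailing `Z` produces timezone-aware UTC timestamps. The names `dates` and `first_seen` are illustrative. It parses the column and takes the earliest timestamp per Type, which is the zero point the expected `diff` column is measured from.

```python
import pandas as pd

# The question's sample data (dates are ISO-8601 strings in UTC).
data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'], ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'], ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'], ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'], ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'], ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'],
)

# The trailing 'Z' marks UTC, so the parsed values are timezone-aware.
dates = pd.to_datetime(data['Date'])

# Earliest timestamp per Type: the zero point of the expected 'diff' column.
first_seen = dates.groupby(data['Type']).min()
print(first_seen)
```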
I need to find the difference between successive dates for each Type, after sorting them by date.
The final data should look like this:
data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
['Train','2019-01-06T19:44:09Z',4],
['Train','2019-01-02T19:44:09Z',0],
['Car','2019-01-08T06:44:09Z',3],
['Car','2019-01-06T18:44:09Z',1],
['Train','2019-01-04T19:44:09Z',2],
['Car','2019-01-05T16:34:09Z',0],
['Train','2019-01-08T19:44:09Z',6],
['Car','2019-01-07T14:44:09Z',2],
['Car','2019-01-06T11:44:09Z',1],
['Train','2019-01-10T19:44:09Z',8],
],
columns=['Type', 'Date','diff'])
Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so the diff starts at 0. The next dates, 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, fall on the following day, so their diff is 1 day (I'm not sure whether the time of day should be included here), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so its diff is 0; then 2019-01-04T19:44:09Z gives a 2-day diff.
I tried groupby, but I'm not sure how to include the sort on date:
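On whether the time of day should count: the expected table appears to count calendar days, not elapsed 24-hour periods. The sketch below computes both from each Type's minimum so they can be compared; the column names `elapsed_days` and `whole_days` are made up for illustration. For row 9 (Car, 2019-01-06T11:44:09Z) only 19 hours 10 minutes have passed since the Car minimum, so the whole-day count is 0, while the expected table shows 1.

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'], ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'], ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'], ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'], ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'], ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'],
)

dates = pd.to_datetime(data['Date'])
# Each row's group minimum, broadcast back to the original rows.
start = dates.groupby(data['Type']).transform('min')
elapsed = dates - start  # timedelta64 Series, time of day included

# Option 1: keep the time of day, expressed as fractional days.
data['elapsed_days'] = elapsed / pd.Timedelta(days=1)
# Option 2: count complete 24-hour periods only (.dt.days truncates).
data['whole_days'] = elapsed.dt.days
print(data)
```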
data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
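If the goal is the literal reading of the title, i.e. the gap between each date and the previous date within the same Type, the sort can be done before the groupby. This is a minimal sketch assuming the question's data; the column name `gap_days` is hypothetical. It divides by `pd.Timedelta(days=1)` where the attempt above uses `np.timedelta64(1, 'D')`, which should be equivalent. Note that this yields successive gaps (NaN for the first row of each Type), not the distance from the first date that the expected table shows.

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'], ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'], ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'], ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'], ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'], ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'],
)

# Parse the strings, then sort so each group's rows are in chronological order.
tmp = data.assign(Date=pd.to_datetime(data['Date'])).sort_values(['Type', 'Date'])

# Within each Type, diff() is now the gap to the previous (earlier) date.
tmp['gap_days'] = tmp.groupby('Type')['Date'].diff() / pd.Timedelta(days=1)

# Column assignment aligns on the index, so data keeps its original row order.
data['gap_days'] = tmp['gap_days']
print(data)
```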
Use pandas.DataFrame.groupby with dt.date:
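df = data.assign(Date=pd.to_datetime(data['Date']))  # parse the ISO-8601 strings into UTC datetimes
df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date())  # group_keys=False keeps the original row index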
Output:
Type Date diff
0 Car 2019-01-06 21:44:09+00:00 1 days
1 Train 2019-01-06 19:44:09+00:00 4 days
2 Train 2019-01-02 19:44:09+00:00 0 days
3 Car 2019-01-08 06:44:09+00:00 3 days
4 Car 2019-01-06 18:44:09+00:00 1 days
5 Train 2019-01-04 19:44:09+00:00 2 days
6 Car 2019-01-05 16:34:09+00:00 0 days
7 Train 2019-01-08 19:44:09+00:00 6 days
8 Car 2019-01-07 14:44:09+00:00 2 days
9 Car 2019-01-06 11:44:09+00:00 1 days
10 Train 2019-01-10 19:44:09+00:00 8 days
If you want them to be int, add dt.days:
df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days
Output:
Type Date diff
0 Car 2019-01-06 21:44:09+00:00 1
1 Train 2019-01-06 19:44:09+00:00 4
2 Train 2019-01-02 19:44:09+00:00 0
3 Car 2019-01-08 06:44:09+00:00 3
4 Car 2019-01-06 18:44:09+00:00 1
5 Train 2019-01-04 19:44:09+00:00 2
6 Car 2019-01-05 16:34:09+00:00 0
7 Train 2019-01-08 19:44:09+00:00 6
8 Car 2019-01-07 14:44:09+00:00 2
9 Car 2019-01-06 11:44:09+00:00 1
10 Train 2019-01-10 19:44:09+00:00 8
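The approaches here go through `dt.date`, which turns the column into Python `datetime.date` objects with object dtype; whether subtracting those produces a timedelta64 column that supports `.dt.days` may depend on the pandas version. Below is a sketch of a variant that stays in datetime64 throughout. It swaps `dt.date` for `dt.normalize()`, assuming that only the calendar day matters, and uses `transform('min')` instead of `apply`.

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'], ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'], ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'], ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'], ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'], ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'],
)

dates = pd.to_datetime(data['Date'])

# normalize() floors each timestamp to midnight but keeps the datetime64 dtype,
# so the subtraction below yields timedelta64 and .dt.days is available.
day = dates.dt.normalize()
data['diff'] = (day - day.groupby(data['Type']).transform('min')).dt.days
print(data)
```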
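data['Date_date'] = pd.to_datetime(data['Date']).dt.date  # calendar date only, time of day dropped
# group_keys=False keeps the original row index so the result aligns with data
data['diff'] = data.groupby(['Type'], group_keys=False)['Date_date'].apply(lambda x: (x - x.min()).dt.days)
data.drop(['Date_date'], axis=1, inplace=True, errors='ignore')  # remove the helper column
print(data)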
Type Date diff
0 Car 2019-01-06T21:44:09Z 1
1 Train 2019-01-06T19:44:09Z 4
2 Train 2019-01-02T19:44:09Z 0
3 Car 2019-01-08T06:44:09Z 3
4 Car 2019-01-06T18:44:09Z 1
5 Train 2019-01-04T19:44:09Z 2
6 Car 2019-01-05T16:34:09Z 0
7 Train 2019-01-08T19:44:09Z 6
8 Car 2019-01-07T14:44:09Z 2
9 Car 2019-01-06T11:44:09Z 1
10 Train 2019-01-10T19:44:09Z 8
Direct subtraction from transform:
s = pd.to_datetime(data['Date']).dt.date
data['diff'] = (s - s.groupby(data.Type).transform('min')).dt.days
Output:
Type Date diff
0 Car 2019-01-06T21:44:09Z 1
1 Train 2019-01-06T19:44:09Z 4
2 Train 2019-01-02T19:44:09Z 0
3 Car 2019-01-08T06:44:09Z 3
4 Car 2019-01-06T18:44:09Z 1
5 Train 2019-01-04T19:44:09Z 2
6 Car 2019-01-05T16:34:09Z 0
7 Train 2019-01-08T19:44:09Z 6
8 Car 2019-01-07T14:44:09Z 2
9 Car 2019-01-06T11:44:09Z 1
10 Train 2019-01-10T19:44:09Z 8
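The one-liner works because `transform('min')` returns one value for every row, aligned with `s`, so the subtraction needs no merge. For comparison, the sketch below shows that a per-group `min()`, broadcast back with `Series.map`, gives the same per-row values. The variable names `per_row`, `per_group` and `via_map` are illustrative, and it uses `dt.normalize()` in place of `dt.date` to keep a datetime64 dtype.

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'], ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'], ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'], ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'], ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'], ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'],
)

s = pd.to_datetime(data['Date']).dt.normalize()

# transform('min') returns one value per row, aligned with s ...
per_row = s.groupby(data['Type']).transform('min')
# ... whereas min() returns one value per group, which map() can broadcast back.
per_group = s.groupby(data['Type']).min()
via_map = data['Type'].map(per_group)

data['diff'] = (s - per_row).dt.days
print(per_group)
```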
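Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.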