Calculate difference between successive date column with groupby on another column in pandas?
I have a pandas DataFrame:
data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
['Train','2019-01-06T19:44:09Z'],
['Train','2019-01-02T19:44:09Z'],
['Car','2019-01-08T06:44:09Z'],
['Car','2019-01-06T18:44:09Z'],
['Train','2019-01-04T19:44:09Z'],
['Car','2019-01-05T16:34:09Z'],
['Train','2019-01-08T19:44:09Z'],
['Car','2019-01-07T14:44:09Z'],
['Car','2019-01-06T11:44:09Z'],
['Train','2019-01-10T19:44:09Z'],
],
columns=['Type', 'Date'])
After sorting by date, I need to find the difference between consecutive dates for each Type.
The final data should look like this:
data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
['Train','2019-01-06T19:44:09Z',4],
['Train','2019-01-02T19:44:09Z',0],
['Car','2019-01-08T06:44:09Z',3],
['Car','2019-01-06T18:44:09Z',1],
['Train','2019-01-04T19:44:09Z',2],
['Car','2019-01-05T16:34:09Z',0],
['Train','2019-01-08T19:44:09Z',6],
['Car','2019-01-07T14:44:09Z',2],
['Car','2019-01-06T11:44:09Z',1],
['Train','2019-01-10T19:44:09Z',8],
],
columns=['Type', 'Date','diff'])
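Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so diff starts at 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so diff is 1 day (I am not sure whether the time of day can be included here), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so diff is 0; the next date is 2019-01-04T19:44:09Z, so the difference is 2 days.
I tried groupby, but I am not sure how to include the sorting by date: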
data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
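One likely reason the attempt above does not produce the expected column is that Date still holds strings and the rows are not in date order within each Type. Below is a minimal sketch of the sort-then-diff step on a shortened copy of the sample data. The column name step_days is hypothetical and used only for illustration. Note that this gives the gap to the previous row of the same Type (with time of day included), not the gap to the earliest date of the Type:

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z']],
    columns=['Type', 'Date'])

# Parse the ISO strings, then order the rows by Type and date.
ordered = (data.assign(Date=pd.to_datetime(data['Date']))
               .sort_values(['Type', 'Date']))

# Gap to the previous row within each Type; the first row of each Type is NaT.
step = ordered.groupby('Type')['Date'].diff()

# Assignment aligns on the original index, so the row order of `data` is unchanged.
data['step_days'] = step / pd.Timedelta(days=1)
print(data)
```

Because the result is aligned on the index, the sorted frame is only needed temporarily; the original row order is preserved.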
Use pandas.DataFrame.groupby together with dt.date (this assumes Date has already been converted with pd.to_datetime):
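# group_keys=False keeps the original row index on the result (needed on pandas >= 2.0)
df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date())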
Output:
Type Date diff
0 Car 2019-01-06 21:44:09+00:00 1 days
1 Train 2019-01-06 19:44:09+00:00 4 days
2 Train 2019-01-02 19:44:09+00:00 0 days
3 Car 2019-01-08 06:44:09+00:00 3 days
4 Car 2019-01-06 18:44:09+00:00 1 days
5 Train 2019-01-04 19:44:09+00:00 2 days
6 Car 2019-01-05 16:34:09+00:00 0 days
7 Train 2019-01-08 19:44:09+00:00 6 days
8 Car 2019-01-07 14:44:09+00:00 2 days
9 Car 2019-01-06 11:44:09+00:00 1 days
10 Train 2019-01-10 19:44:09+00:00 8 days
If you want them as int, add dt.days:
df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days
Output:
Type Date diff
0 Car 2019-01-06 21:44:09+00:00 1
1 Train 2019-01-06 19:44:09+00:00 4
2 Train 2019-01-02 19:44:09+00:00 0
3 Car 2019-01-08 06:44:09+00:00 3
4 Car 2019-01-06 18:44:09+00:00 1
5 Train 2019-01-04 19:44:09+00:00 2
6 Car 2019-01-05 16:34:09+00:00 0
7 Train 2019-01-08 19:44:09+00:00 6
8 Car 2019-01-07 14:44:09+00:00 2
9 Car 2019-01-06 11:44:09+00:00 1
10 Train 2019-01-10 19:44:09+00:00 8
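The question also asks whether the time of day can be included. The answer above compares calendar dates only. As a rough sketch of the alternative, the snippet below subtracts the full timestamps instead, using three of the Car rows from the sample data. The column names elapsed_whole_days and elapsed_fractional_days are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [['Car', '2019-01-05T16:34:09Z'],
     ['Car', '2019-01-06T11:44:09Z'],
     ['Car', '2019-01-08T06:44:09Z']],
    columns=['Type', 'Date'])
df['Date'] = pd.to_datetime(df['Date'])

# Elapsed time since the earliest timestamp of each Type, time of day included.
elapsed = df['Date'] - df.groupby('Type')['Date'].transform('min')

# Whole days only (the hours are dropped, not rounded).
df['elapsed_whole_days'] = elapsed.dt.days
# Days as a float, keeping the fractional part.
df['elapsed_fractional_days'] = elapsed / pd.Timedelta(days=1)
print(df)
```

With the time included, the 2019-01-06 11:44:09 row is less than 24 hours after the first Car row, so its whole-day count is 0 rather than the 1 shown in the expected output. The calendar-date comparison is therefore the variant that matches what the question asks for.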
data['Date_date'] = pd.to_datetime(data['Date']).dt.date
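# group_keys=False keeps the original row index on the result (needed on pandas >= 2.0)
data['diff'] = data.groupby(['Type'], group_keys=False)['Date_date'].apply(lambda x:(x-x.min()).dt.days)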
data.drop(['Date_date'],axis=1,inplace=True,errors='ignore')
print(data)
Type Date diff
0 Car 2019-01-06T21:44:09Z 1
1 Train 2019-01-06T19:44:09Z 4
2 Train 2019-01-02T19:44:09Z 0
3 Car 2019-01-08T06:44:09Z 3
4 Car 2019-01-06T18:44:09Z 1
5 Train 2019-01-04T19:44:09Z 2
6 Car 2019-01-05T16:34:09Z 0
7 Train 2019-01-08T19:44:09Z 6
8 Car 2019-01-07T14:44:09Z 2
9 Car 2019-01-06T11:44:09Z 1
10 Train 2019-01-10T19:44:09Z 8
Subtract the result of transform directly:
s = pd.to_datetime(data['Date']).dt.date
data['diff'] = (s - s.groupby(data.Type).transform('min')).dt.days
Out[36]:
Type Date diff
0 Car 2019-01-06T21:44:09Z 1
1 Train 2019-01-06T19:44:09Z 4
2 Train 2019-01-02T19:44:09Z 0
3 Car 2019-01-08T06:44:09Z 3
4 Car 2019-01-06T18:44:09Z 1
5 Train 2019-01-04T19:44:09Z 2
6 Car 2019-01-05T16:34:09Z 0
7 Train 2019-01-08T19:44:09Z 6
8 Car 2019-01-07T14:44:09Z 2
9 Car 2019-01-06T11:44:09Z 1
10 Train 2019-01-10T19:44:09Z 8
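A variant of the same idea, sketched under the assumption that keeping a datetime64 dtype is preferable to the Python date objects that dt.date returns: dt.normalize() truncates each timestamp to midnight, so the subtraction stays a regular timedelta operation. The helper names days and first_day are hypothetical. On the full sample data it reproduces the diff column requested in the question:

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'],
     ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'],
     ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'],
     ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'])

# Parse the strings and truncate every timestamp to midnight (dtype stays datetime64).
days = pd.to_datetime(data['Date']).dt.normalize()

# Earliest day of each Type, broadcast back to every row of that Type.
first_day = days.groupby(data['Type']).transform('min')

# Calendar days since the first day of the Type, as integers.
data['diff'] = (days - first_day).dt.days
print(data)
```

No explicit sort is needed here, because the minimum per group does not depend on row order, and the original Date strings are left untouched.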