
Calculate difference between successive date column with groupby on another column in pandas?

I have a pandas DataFrame:

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
                     ['Train','2019-01-06T19:44:09Z'],
                     ['Train','2019-01-02T19:44:09Z'],
                     ['Car','2019-01-08T06:44:09Z'],
                     ['Car','2019-01-06T18:44:09Z'],
                     ['Train','2019-01-04T19:44:09Z'],
                     ['Car','2019-01-05T16:34:09Z'],
                     ['Train','2019-01-08T19:44:09Z'],
                     ['Car','2019-01-07T14:44:09Z'],
                     ['Car','2019-01-06T11:44:09Z'],
                     ['Train','2019-01-10T19:44:09Z'],
                     ], 
                    columns=['Type', 'Date'])

After sorting by date, I need to find the difference between successive dates for each Type.

The final data should look like this:

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
                     ['Train','2019-01-06T19:44:09Z',4],
                     ['Train','2019-01-02T19:44:09Z',0],
                     ['Car','2019-01-08T06:44:09Z',3],
                     ['Car','2019-01-06T18:44:09Z',1],
                     ['Train','2019-01-04T19:44:09Z',2],
                     ['Car','2019-01-05T16:34:09Z',0],
                     ['Train','2019-01-08T19:44:09Z',6],
                     ['Car','2019-01-07T14:44:09Z',2],
                     ['Car','2019-01-06T11:44:09Z',1],
                     ['Train','2019-01-10T19:44:09Z',8],
                     ], 
                    columns=['Type', 'Date','diff'])

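Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so diff starts at 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so diff is 1 day (I am not sure whether the time can be included here), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so diff is 0; the next date is 2019-01-04T19:44:09Z, so the difference is 2 days.

I tried groupby, but I am not sure how to include the sorting by date: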

data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
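
A note on the attempt above, as a minimal sketch rather than a definitive fix: it assumes the Date strings are first parsed with pd.to_datetime and that rows are sorted within each Type before calling diff(). The four-row sample and the column name gap_days are illustrative only. Also note that diff() gives the gap to the previous row in the group, whereas the expected output in the question counts days since the earliest date of each Type.

```python
import pandas as pd

# Small illustrative subset of the question's data
data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z']],
    columns=['Type', 'Date'])

# Parse the ISO 8601 strings into timezone-aware timestamps
data['Date'] = pd.to_datetime(data['Date'])

# Sort so that diff() compares each row with the previous date of the same Type
ordered = data.sort_values(['Type', 'Date'])
gap = ordered.groupby('Type')['Date'].diff()

# Assignment aligns on the index, so the original row order is preserved;
# the first row of each Type has no predecessor and stays NaN
data['gap_days'] = gap / pd.Timedelta(days=1)
print(data)
```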

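Use pandas.DataFrame.groupby together with dt.date (this assumes the DataFrame is named df and that Date has already been converted with pd.to_datetime, as the output shows):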

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date())

Output:

     Type                      Date   diff
0     Car 2019-01-06 21:44:09+00:00 1 days
1   Train 2019-01-06 19:44:09+00:00 4 days
2   Train 2019-01-02 19:44:09+00:00 0 days
3     Car 2019-01-08 06:44:09+00:00 3 days
4     Car 2019-01-06 18:44:09+00:00 1 days
5   Train 2019-01-04 19:44:09+00:00 2 days
6     Car 2019-01-05 16:34:09+00:00 0 days
7   Train 2019-01-08 19:44:09+00:00 6 days
8     Car 2019-01-07 14:44:09+00:00 2 days
9     Car 2019-01-06 11:44:09+00:00 1 days
10  Train 2019-01-10 19:44:09+00:00 8 days

If you want them as int, add dt.days:

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days

Output:

     Type                      Date  diff
0     Car 2019-01-06 21:44:09+00:00     1
1   Train 2019-01-06 19:44:09+00:00     4
2   Train 2019-01-02 19:44:09+00:00     0
3     Car 2019-01-08 06:44:09+00:00     3
4     Car 2019-01-06 18:44:09+00:00     1
5   Train 2019-01-04 19:44:09+00:00     2
6     Car 2019-01-05 16:34:09+00:00     0
7   Train 2019-01-08 19:44:09+00:00     6
8     Car 2019-01-07 14:44:09+00:00     2
9     Car 2019-01-06 11:44:09+00:00     1
10  Train 2019-01-10 19:44:09+00:00     8
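
The question leaves open whether the time of day may be included. A minimal sketch of that variant follows, assuming a small illustrative sample and hypothetical column names diff_days and diff_whole. It subtracts the earliest full timestamp of each Type instead of the earliest calendar date. Note that truncating elapsed time with dt.days is not the same as a calendar-day difference: a row 19 hours after the first one yields 0 here, while the date-based approach above yields 1.

```python
import pandas as pd

# Small illustrative subset of the question's data
df = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'],
     ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-04T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z']],
    columns=['Type', 'Date'])
df['Date'] = pd.to_datetime(df['Date'])

# Elapsed time since the earliest timestamp of the same Type
elapsed = df['Date'] - df.groupby('Type')['Date'].transform('min')

# Fractional days, time of day included
df['diff_days'] = elapsed / pd.Timedelta(days=1)

# Whole elapsed days (truncated), which can differ from a calendar-day count
df['diff_whole'] = elapsed.dt.days
print(df)
```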
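  • First, convert Date to a plain date and store it in a separate column
  • Use a lambda function to subtract the minimum date of each group, and use dt.days to get the number of days
  • Then drop the extra date column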
data['Date_date'] = pd.to_datetime(data['Date']).dt.date
data['diff'] = data.groupby(['Type'])['Date_date'].apply(lambda x:(x-x.min()).dt.days)
data.drop(['Date_date'],axis=1,inplace=True,errors='ignore')
print(data)
     Type                  Date  diff
0     Car  2019-01-06T21:44:09Z     1
1   Train  2019-01-06T19:44:09Z     4
2   Train  2019-01-02T19:44:09Z     0
3     Car  2019-01-08T06:44:09Z     3
4     Car  2019-01-06T18:44:09Z     1
5   Train  2019-01-04T19:44:09Z     2
6     Car  2019-01-05T16:34:09Z     0
7   Train  2019-01-08T19:44:09Z     6
8     Car  2019-01-07T14:44:09Z     2
9     Car  2019-01-06T11:44:09Z     1
10  Train  2019-01-10T19:44:09Z     8

Subtract the result of transform directly:

s = pd.to_datetime(data['Date']).dt.date
data['diff'] = (s - s.groupby(data.Type).transform('min')).dt.days

Out[36]:
     Type                  Date  diff
0     Car  2019-01-06T21:44:09Z     1
1   Train  2019-01-06T19:44:09Z     4
2   Train  2019-01-02T19:44:09Z     0
3     Car  2019-01-08T06:44:09Z     3
4     Car  2019-01-06T18:44:09Z     1
5   Train  2019-01-04T19:44:09Z     2
6     Car  2019-01-05T16:34:09Z     0
7   Train  2019-01-08T19:44:09Z     6
8     Car  2019-01-07T14:44:09Z     2
9     Car  2019-01-06T11:44:09Z     1
10  Train  2019-01-10T19:44:09Z     8
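
For reference, the same idea can be sketched end to end on the question's full sample. This is a variant under stated assumptions, not the answer's exact code: it uses dt.normalize() (midnight of each timestamp) instead of dt.date, so the intermediate values stay in a datetime dtype rather than Python date objects. The variable names day and first_day are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'], ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'], ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'], ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'], ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'], ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'])

# Parse the strings and drop the time of day (keeps a datetime dtype)
day = pd.to_datetime(data['Date']).dt.normalize()

# Earliest calendar day per Type, broadcast back to every row
first_day = day.groupby(data['Type']).transform('min')

# Whole calendar days since the first date of each Type
data['diff'] = (day - first_day).dt.days
print(data)
```

Because transform returns a result aligned with the original index, no explicit sort is needed and the row order of data is left unchanged.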

