Calculate the difference between successive dates in one column, grouped by another column, in pandas?

I have a pandas DataFrame:

import numpy as np
import pandas as pd

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
                     ['Train','2019-01-06T19:44:09Z'],
                     ['Train','2019-01-02T19:44:09Z'],
                     ['Car','2019-01-08T06:44:09Z'],
                     ['Car','2019-01-06T18:44:09Z'],
                     ['Train','2019-01-04T19:44:09Z'],
                     ['Car','2019-01-05T16:34:09Z'],
                     ['Train','2019-01-08T19:44:09Z'],
                     ['Car','2019-01-07T14:44:09Z'],
                     ['Car','2019-01-06T11:44:09Z'],
                     ['Train','2019-01-10T19:44:09Z'],
                     ],
                    columns=['Type', 'Date'])

I need to find the difference between successive dates for each type, after sorting the rows by date.

The final data should look like this:

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
                     ['Train','2019-01-06T19:44:09Z',4],
                     ['Train','2019-01-02T19:44:09Z',0],
                     ['Car','2019-01-08T06:44:09Z',3],
                     ['Car','2019-01-06T18:44:09Z',1],
                     ['Train','2019-01-04T19:44:09Z',2],
                     ['Car','2019-01-05T16:34:09Z',0],
                     ['Train','2019-01-08T19:44:09Z',6],
                     ['Car','2019-01-07T14:44:09Z',2],
                     ['Car','2019-01-06T11:44:09Z',1],
                     ['Train','2019-01-10T19:44:09Z',8],
                     ], 
                    columns=['Type', 'Date','diff'])

Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so the diff starts at 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so their diff is 1 day (I am not sure whether the time can be included), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so the diff is 0; the next date is 2019-01-04T19:44:09Z, so the diff is 2 days.

I tried groupby, but I am not sure how to include sorting by date:

data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
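One possible way to combine the sort with the attempt above, shown as a minimal sketch rather than a definitive answer: parse the strings, sort within each type, take the gap to the previous date, and accumulate the gaps. The helper names `day`, `order`, `step` and `since_first` are illustrative and not part of the question, and the sketch assumes that whole calendar days are what is wanted.

```python
import pandas as pd

# Small sample in the same shape as the question's data.
data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z']],
    columns=['Type', 'Date'])

# Parse the strings and keep only the calendar day (time set to midnight).
day = pd.to_datetime(data['Date']).dt.normalize()

# Sort within each type so that diff() compares neighbouring dates.
order = data.assign(day=day).sort_values(['Type', 'day'])

# Gap to the previous date of the same type. The first row of each type has
# no predecessor, so its missing value is replaced by 0.
step = order.groupby('Type')['day'].diff().dt.days.fillna(0).astype(int)

# Running total of the gaps, i.e. days since the earliest date of that type.
since_first = step.groupby(order['Type']).cumsum()

# Column assignment aligns on the index, so the original row order is kept.
data['step'] = step
data['diff'] = since_first
print(data)
```

Here `step` is the literal difference between successive dates, while `diff` matches the shape of the expected output in the question, which counts days from the earliest date of each type.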

Use pandas.DataFrame.groupby with dt.date (the Date column is assumed to have been parsed with pd.to_datetime already, as in the output shown):

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date())

Output:

     Type                      Date   diff
0     Car 2019-01-06 21:44:09+00:00 1 days
1   Train 2019-01-06 19:44:09+00:00 4 days
2   Train 2019-01-02 19:44:09+00:00 0 days
3     Car 2019-01-08 06:44:09+00:00 3 days
4     Car 2019-01-06 18:44:09+00:00 1 days
5   Train 2019-01-04 19:44:09+00:00 2 days
6     Car 2019-01-05 16:34:09+00:00 0 days
7   Train 2019-01-08 19:44:09+00:00 6 days
8     Car 2019-01-07 14:44:09+00:00 2 days
9     Car 2019-01-06 11:44:09+00:00 1 days
10  Train 2019-01-10 19:44:09+00:00 8 days

If you want them to be int, add dt.days:

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days

Output:

     Type                      Date  diff
0     Car 2019-01-06 21:44:09+00:00     1
1   Train 2019-01-06 19:44:09+00:00     4
2   Train 2019-01-02 19:44:09+00:00     0
3     Car 2019-01-08 06:44:09+00:00     3
4     Car 2019-01-06 18:44:09+00:00     1
5   Train 2019-01-04 19:44:09+00:00     2
6     Car 2019-01-05 16:34:09+00:00     0
7   Train 2019-01-08 19:44:09+00:00     6
8     Car 2019-01-07 14:44:09+00:00     2
9     Car 2019-01-06 11:44:09+00:00     1
10  Train 2019-01-10 19:44:09+00:00     8

  • First, convert Date to a plain date and store it in a separate column
  • Use a lambda function to subtract the minimum date of each group and get the number of days with dt.days
  • Then drop the extra date column
data['Date_date'] = pd.to_datetime(data['Date']).dt.date
data['diff'] = data.groupby(['Type'])['Date_date'].apply(lambda x:(x-x.min()).dt.days)
data.drop(['Date_date'],axis=1,inplace=True,errors='ignore')
print(data)
     Type                  Date  diff
0     Car  2019-01-06T21:44:09Z     1
1   Train  2019-01-06T19:44:09Z     4
2   Train  2019-01-02T19:44:09Z     0
3     Car  2019-01-08T06:44:09Z     3
4     Car  2019-01-06T18:44:09Z     1
5   Train  2019-01-04T19:44:09Z     2
6     Car  2019-01-05T16:34:09Z     0
7   Train  2019-01-08T19:44:09Z     6
8     Car  2019-01-07T14:44:09Z     2
9     Car  2019-01-06T11:44:09Z     1
10  Train  2019-01-10T19:44:09Z     8

Subtract the group minimum directly using transform:

s = pd.to_datetime(data['Date']).dt.date
data['diff'] = (s - s.groupby(data.Type).transform('min')).dt.days

Out[36]:
     Type                  Date  diff
0     Car  2019-01-06T21:44:09Z     1
1   Train  2019-01-06T19:44:09Z     4
2   Train  2019-01-02T19:44:09Z     0
3     Car  2019-01-08T06:44:09Z     3
4     Car  2019-01-06T18:44:09Z     1
5   Train  2019-01-04T19:44:09Z     2
6     Car  2019-01-05T16:34:09Z     0
7   Train  2019-01-08T19:44:09Z     6
8     Car  2019-01-07T14:44:09Z     2
9     Car  2019-01-06T11:44:09Z     1
10  Train  2019-01-10T19:44:09Z     8
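
As a hedged variation on the transform approach, the sketch below uses `dt.normalize()` instead of `dt.date`, so the values stay in a datetime dtype and the subtraction yields a timedelta Series. It also touches on the question's aside about including the time by computing fractional days. The column name `diff_exact` and the variables `ts` and `day` are made up for this illustration.

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'],
     ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'],
     ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'],
     ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'])

# Full timestamps (timezone-aware, UTC) and the same values cut to midnight.
ts = pd.to_datetime(data['Date'])
day = ts.dt.normalize()

# Whole calendar days since the earliest date of each type.
data['diff'] = (day - day.groupby(data['Type']).transform('min')).dt.days

# Same idea with the time of day included: elapsed time since the earliest
# timestamp of each type, expressed as a fractional number of days.
data['diff_exact'] = (ts - ts.groupby(data['Type']).transform('min')) / pd.Timedelta(days=1)

print(data)
```

Because `transform('min')` returns a Series aligned with the original rows, no sorting is needed and the row order is left untouched.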
