[英]Calculating row-wise time difference in python
I want to calculate the travel time of each passengers in my data frame based on the difference between the moment where they first get in the bus and the moment they leave.我想根据他们第一次上车的时刻和他们离开的时刻之间的差异来计算我的数据框中每个乘客的旅行时间。
Here is the data frame这是数据框
my_df = pd.DataFrame({
'id': ['a', 'b', 'b', 'b', 'b', 'b', 'c','d'],
'date': ['2020/02/03', '2020/04/05', '2020/04/05', '2020/04/05','2020/04/06', '2020/04/06', '2020/12/15', '2020/06/23'],
'arriving_time': ['14:36:06', '08:52:02', '08:53:02', '08:55:24', '18:58:03', '19:03:05', '17:04:28', '21:31:23'],
'leaving_time': ['14:40:05', '08:52:41', '08:54:33', '08:57:14', '19:01:07', '19:04:08', '17:09:48', '21:50:12']
})
print(my_df)
output:
id date arriving_time leaving_time
0 a 2020/02/03 14:36:06 14:40:05
1 b 2020/04/05 08:52:02 08:52:41
2 b 2020/04/05 08:53:02 08:54:33
3 b 2020/04/05 08:55:24 08:57:14
4 b 2020/04/06 18:58:03 19:01:07
5 b 2020/04/06 19:03:05 19:04:08
6 c 2020/12/15 17:04:28 17:09:48
7 d 2020/06/23 21:31:23 21:50:12
However there is two problems (that I don't manage to solve myself):但是有两个问题(我无法自己解决):
Here is the result I want to obtain这是我想要获得的结果
id date arriving_time leaving_time travelTime
0 a 2020/02/03 14:36:06 14:40:05 00:03:59
1 b 2020/04/05 08:52:02 08:52:41 00:05:12
2 b 2020/04/05 08:53:02 08:54:33 00:05:12
3 b 2020/04/05 08:55:24 08:57:14 00:05:12
4 b 2020/04/06 18:58:03 19:01:07 00:06:05
5 b 2020/04/06 19:03:05 19:04:08 00:06:05
6 c 2020/12/15 17:04:28 17:09:48 00:05:20
7 d 2020/06/23 21:31:23 21:50:12 00:18:49
As you can see, passenger b made two different travel on the same day, and I want to know compute how long each one of them last.如您所见,乘客 b 在同一天进行了两次不同的旅行,我想知道计算每一次旅行的持续时间。
I already tried the following code, which seems to work, but it is really slow (which I think is due to the large amount of rows of my_df)我已经尝试了下面的代码,它似乎有效,但它真的很慢(我认为这是由于 my_df 的行数很大)
for user_id in set(my_df.id):
for day in set(my_df.loc[my_df.id == user_id, 'date']):
my_df.loc[(my_df.id == user_id) & (my_df.date == day), 'travelTime'] = max(my_df.loc[(my_df.id == user_id) & (my_df.date == day), 'leaving_time'].apply(pd.to_datetime)) - min(my_df.loc[(my_df.id == user_id) & (my_df.date == day), 'arriving_time'].apply(pd.to_datetime))
I think for correct maximal and minimal values are converted columns to datetimes and then subtract Series
created by GroupBy.transform
:我认为正确最大和最小值转换列,日期时间,然后减去
Series
创建由GroupBy.transform
:
my_df['s'] = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])
my_df['e'] = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])
g = my_df.groupby(['id', 'date'])
my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))
print (my_df)
id date arriving_time leaving_time s \
0 a 2020/02/03 14:36:06 14:40:05 2020-02-03 14:36:06
1 b 2020/04/05 08:52:02 08:52:41 2020-04-05 08:52:02
2 b 2020/04/05 08:53:02 08:54:33 2020-04-05 08:53:02
3 b 2020/04/05 08:55:24 08:57:14 2020-04-05 08:55:24
4 b 2020/04/06 18:58:03 19:01:07 2020-04-06 18:58:03
5 b 2020/04/06 19:03:05 19:04:08 2020-04-06 19:03:05
6 c 2020/12/15 17:04:28 17:09:48 2020-12-15 17:04:28
7 d 2020/06/23 21:31:23 21:50:12 2020-06-23 21:31:23
e travelTime
0 2020-02-03 14:40:05 00:03:59
1 2020-04-05 08:52:41 00:05:12
2 2020-04-05 08:54:33 00:05:12
3 2020-04-05 08:57:14 00:05:12
4 2020-04-06 19:01:07 00:06:05
5 2020-04-06 19:04:08 00:06:05
6 2020-12-15 17:09:48 00:05:20
7 2020-06-23 21:50:12 00:18:49
For avoid new columns is possible use DataFrame.assign
Series with datetimes
:为了避免新列,可以使用
DataFrame.assign
Series with datetimes
:
s = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])
e = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])
g = my_df.assign(s=s, e=e).groupby(['id', 'date'])
my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))
print (my_df)
id date arriving_time leaving_time travelTime
0 a 2020/02/03 14:36:06 14:40:05 00:03:59
1 b 2020/04/05 08:52:02 08:52:41 00:05:12
2 b 2020/04/05 08:53:02 08:54:33 00:05:12
3 b 2020/04/05 08:55:24 08:57:14 00:05:12
4 b 2020/04/06 18:58:03 19:01:07 00:06:05
5 b 2020/04/06 19:03:05 19:04:08 00:06:05
6 c 2020/12/15 17:04:28 17:09:48 00:05:20
7 d 2020/06/23 21:31:23 21:50:12 00:18:49
IIUC we first groupby
id
& date
to get the max and min leave & arrival time. IIUC我们首先
groupby
id
和date
,以获得最大和最小休假及到达时间。
then a simple subtraction.然后是一个简单的减法。
df2 = df.groupby(['id','date']).agg(min_arrival=('arriving_time','min'),
max_leave=('leaving_time','max'))
df2['travelTime'] = pd.to_datetime(df2['max_leave']) - pd.to_datetime(df2['min_arrival'])
print(df2)
min_arrival max_leave travelTime
id date
a 2020-02-03 14:36:06 14:40:05 00:03:59
b 2020-04-05 08:52:02 08:57:14 00:05:12
2020-04-06 18:58:03 19:04:08 00:06:05
c 2020-12-15 17:04:28 17:09:48 00:05:20
d 2020-06-23 21:31:23 21:50:12 00:18:49
if you want this back on yout original df, you could use transform
or merge the values from the new delta onto your original :如果您希望将其恢复到原始 df 上,您可以使用
transform
或将新增量中的值合并到您的原始值上:
df_new = (pd.merge(df,df2[['travelTime']],on=['date','id'],how='left')
id date arriving_time leaving_time travelTime
0 a 2020-02-03 14:36:06 14:40:05 00:03:59
1 b 2020-04-05 08:52:02 08:52:41 00:05:12
2 b 2020-04-05 08:53:02 08:54:33 00:05:12
3 b 2020-04-05 08:55:24 08:57:14 00:05:12
4 b 2020-04-06 18:58:03 19:01:07 00:06:05
5 b 2020-04-06 19:03:05 19:04:08 00:06:05
6 c 2020-12-15 17:04:28 17:09:48 00:05:20
7 d 2020-06-23 21:31:23 21:50:12 00:18:49
You could try this -你可以试试这个——
my_df['arriving_time'] = pd.to_datetime(my_df['arriving_time'])
my_df['leaving_time'] = pd.to_datetime(my_df['leaving_time'])
my_df['travel_time'] = my_df.groupby(['id', 'date'])['leaving_time'].transform('max') - my_df.groupby(['id', 'date'])['arriving_time'].transform('min')
my_df
id date arriving_time leaving_time travel_time
0 a 2020/02/03 2020-03-19 14:36:06 2020-03-19 14:40:05 00:03:59
1 b 2020/04/05 2020-03-19 08:52:02 2020-03-19 08:52:41 00:05:12
2 b 2020/04/05 2020-03-19 08:53:02 2020-03-19 08:54:33 00:05:12
3 b 2020/04/05 2020-03-19 08:55:24 2020-03-19 08:57:14 00:05:12
4 b 2020/04/06 2020-03-19 18:58:03 2020-03-19 19:01:07 00:06:05
5 b 2020/04/06 2020-03-19 19:03:05 2020-03-19 19:04:08 00:06:05
6 c 2020/12/15 2020-03-19 17:04:28 2020-03-19 17:09:48 00:05:20
7 d 2020/06/23 2020-03-19 21:31:23 2020-03-19 21:50:12 00:18:49
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.