Subtracting between rows of different columns in an indexed DataFrame in Python
I have an indexed dataframe (indexed by type, then date) and would like to subtract the end time of each row from the start time of the next row, in hours:
type  date        start_time        end_time          code
A     01/01/2018  01/01/2018 9:00   01/01/2018 14:00  525
      01/02/2018  01/02/2018 5:00   01/02/2018 17:00  524
      01/04/2018  01/04/2018 8:00   01/04/2018 10:00  528
B     01/01/2018  01/01/2018 5:00   01/01/2018 14:00  525
      01/04/2018  01/04/2018 2:00   01/04/2018 17:00  524
      01/05/2018  01/05/2018 7:00   01/05/2018 10:00  528
I would like to get the resulting table with a new column ['interval']:
type  date        interval
A     01/01/2018  -
      01/02/2018  15
      01/04/2018  39
B     01/01/2018  -
      01/04/2018  60
      01/05/2018  14
The interval column is in hours.
You can convert start_time and end_time to datetime format, then use apply to subtract the end_time of the previous row in each group (using groupby). To convert to hours, divide by pd.Timedelta('1 hour'):
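import pandas as pd
# Parse the string columns as datetimes so they can be subtracted
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
# Per type: this row's start_time minus the previous row's end_time, in hours (assumes rows of each type are contiguous)
df['interval'] = (df.groupby(level=0, sort=False).apply(lambda x: x.start_time - x.end_time.shift(1)) / pd.Timedelta('1 hour')).values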
>>> df
                         start_time            end_time  code  interval
type date
A    01/01/2018 2018-01-01 09:00:00 2018-01-01 14:00:00   525       NaN
     01/02/2018 2018-01-02 05:00:00 2018-01-02 17:00:00   524      15.0
     01/04/2018 2018-01-04 08:00:00 2018-01-04 10:00:00   528      39.0
B    01/01/2018 2018-01-01 05:00:00 2018-01-01 14:00:00   525       NaN
     01/04/2018 2018-01-04 02:00:00 2018-01-04 17:00:00   524      60.0
     01/05/2018 2018-01-05 07:00:00 2018-01-05 10:00:00   528      14.0
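A possible variation, offered as a sketch rather than as part of the original answer: shifting end_time within each group directly avoids apply and keeps the result aligned on the index, so it does not rely on .values or on row order. The sample frame below is rebuilt from the question's table, and the explicit month/day/year format string is an assumption based on the expected intervals.

```python
import pandas as pd

# Sample data rebuilt from the question's table, indexed by type then date.
df = pd.DataFrame({
    "type": ["A", "A", "A", "B", "B", "B"],
    "date": ["01/01/2018", "01/02/2018", "01/04/2018",
             "01/01/2018", "01/04/2018", "01/05/2018"],
    "start_time": ["01/01/2018 9:00", "01/02/2018 5:00", "01/04/2018 8:00",
                   "01/01/2018 5:00", "01/04/2018 2:00", "01/05/2018 7:00"],
    "end_time": ["01/01/2018 14:00", "01/02/2018 17:00", "01/04/2018 10:00",
                 "01/01/2018 14:00", "01/04/2018 17:00", "01/05/2018 10:00"],
    "code": [525, 524, 528, 525, 524, 528],
}).set_index(["type", "date"])

# Assumed format: month/day/year, which matches the intervals in the question.
fmt = "%m/%d/%Y %H:%M"
df["start_time"] = pd.to_datetime(df["start_time"], format=fmt)
df["end_time"] = pd.to_datetime(df["end_time"], format=fmt)

# Previous row's end_time within each type; the first row of each type gets NaT.
prev_end = df.groupby(level="type", sort=False)["end_time"].shift(1)

# The subtraction aligns on the (type, date) index; dividing by one hour gives floats.
df["interval"] = (df["start_time"] - prev_end) / pd.Timedelta("1 hour")

print(df[["interval"]])
```

The first row of each type has no previous row, so its interval is NaN, which corresponds to the "-" entries in the desired table.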