Create new pandas dataframe based on a condition on Start Date and End Date Column in another pandas
I have start datetime and end datetime columns in a pandas dataframe, as shown below.
If the end date and time fall in a later hour or on a later day than the start, I need to create new rows: each new row starts at the beginning of the next hour and ends either at the end of that hour (if the original end time is later than that hour) or at the original end time (if the original end time falls within that hour), and so on. The expected resulting table is shown below.
Is this possible with pandas, given that my data is already in a dataframe?
Compute the number of clock hours spanned from Start_Time to End_Time (the difference between their hours, plus one; call it length), then repeat each row length times using df.reindex(df.index.repeat(...)). Then assign each row a counter from 0 to length-1, restarting for each original interval (in the sample data, these are the groups created by the starting date).
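A minimal sketch of this first step, on a two-row sample taken from the data used further down. The names span and expanded are only for illustration, and numbering the copies with groupby(level=0).cumcount() is one way to build the counter, chosen here because it does not depend on the start dates being distinct.

```python
import pandas as pd

df = pd.DataFrame({
    "Start_Time": pd.to_datetime(["2019-08-29 17:29:29", "2019-09-04 17:29:25"]),
    "End_Time": pd.to_datetime(["2019-08-29 17:32:18", "2019-09-04 18:14:41"]),
})

# Number of clock hours each interval touches (hour difference plus one).
span = df["End_Time"].dt.floor("h") - df["Start_Time"].dt.floor("h")
df["length"] = (span // pd.Timedelta(hours=1)).astype(int) + 1

# Repeat each row `length` times; the repeated rows keep their original label.
expanded = df.reindex(df.index.repeat(df["length"]))

# Number the copies of each original row 0, 1, ..., length-1.
expanded["counter"] = expanded.groupby(level=0).cumcount()
expanded = expanded.reset_index(drop=True)

print(expanded[["Start_Time", "length", "counter"]])
```

The first interval stays within one hour and is kept as a single row; the second crosses from 17:xx into 18:xx and is repeated twice.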
Then, for Start_Time, wherever counter is not zero (that is, the row is not the first row for its interval), round the time down to hh:00:00 and add counter hours.
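The Start_Time adjustment can be sketched in isolation as follows. The two-row frame is a hypothetical stand-in for an already repeated interval, and shifted is an illustrative name. The sketch uses dt.floor rather than dt.round, since rounding would push a start such as 17:45:00 up to 18:00:00.

```python
import pandas as pd

# Hypothetical expanded frame: one interval starting at 17:29:25, repeated twice.
df = pd.DataFrame({
    "Start_Time": pd.to_datetime(["2019-09-04 17:29:25"] * 2),
    "counter": [0, 1],
})

# Start of the original hour, moved forward by `counter` hours.
shifted = df["Start_Time"].dt.floor("h") + pd.to_timedelta(df["counter"], unit="h")

# Keep the original start on the first copy, use the shifted value elsewhere.
df["Start_Time"] = df["Start_Time"].where(df["counter"].eq(0), shifted)

print(df)
```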
For End_Time, wherever counter is not equal to length-1 (that is, the row is not the last row for its interval), set End_Time to the last second of the hour of that row's Start_Time, i.e. hh:59:59, where hh is the hour taken from Start_Time.
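The End_Time rule can be sketched the same way. The frame below is a hypothetical expanded interval whose Start_Time has already been adjusted; is_last and hour_end are illustrative names.

```python
import pandas as pd

# Hypothetical expanded frame with Start_Time already adjusted per row.
df = pd.DataFrame({
    "Start_Time": pd.to_datetime(["2019-09-04 17:29:25", "2019-09-04 18:00:00"]),
    "End_Time": pd.to_datetime(["2019-09-04 18:14:41"] * 2),
    "length": [2, 2],
    "counter": [0, 1],
})

# True only on the last copy of each interval.
is_last = df["counter"].eq(df["length"] - 1)

# hh:59:59 of the hour in which the row starts.
hour_end = (df["Start_Time"].dt.floor("h")
            + pd.Timedelta(hours=1) - pd.Timedelta(seconds=1))

# The last copy keeps the original end; every other copy ends at hh:59:59.
df["End_Time"] = df["End_Time"].where(is_last, hour_end)

print(df[["Start_Time", "End_Time"]])
```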
Use:
import pandas as pd

df = pd.DataFrame({
    'Start_Time': ['2019-08-29 17:29:29',
                   '2019-09-04 17:29:25', '2019-09-25 10:16:32'],
    'End_Time': ['2019-08-29 17:32:18',
                 '2019-09-04 18:14:41', '2019-09-26 13:01:26']})
df['Start_Time'] = pd.to_datetime(df['Start_Time'])
df['End_Time'] = pd.to_datetime(df['End_Time'])

# Number of clock hours each interval touches
timeDiff = df['End_Time'].dt.floor('h') - df['Start_Time'].dt.floor('h')
df['length'] = timeDiff.dt.days * 24 + timeDiff.dt.seconds // 3600 + 1

# Repeat each row `length` times and number the copies 0..length-1
df = df.reindex(df.index.repeat(df['length']))
df['counter'] = df.groupby(level=0).cumcount()
df = df.reset_index(drop=True)

# Every copy except the first starts on the hour, `counter` hours later
mask = df['counter'].eq(0)
df['Start_Time'] = df['Start_Time'].where(
    mask, df['Start_Time'].dt.floor('h') + pd.to_timedelta(df['counter'], unit='h'))

# Every copy except the last ends at hh:59:59 of its own hour
mask = df['length'].eq(df['counter'] + 1)
masked_val = (df['Start_Time'].dt.floor('h')
              + pd.to_timedelta(1, unit='h') - pd.to_timedelta(1, unit='s'))
df['End_Time'] = df['End_Time'].where(mask, masked_val)

# Drop the helper columns
df = df.drop(columns=['length', 'counter'])
Output:
>>> df
            Start_Time            End_Time
0  2019-08-29 17:29:29 2019-08-29 17:32:18
1  2019-09-04 17:29:25 2019-09-04 17:59:59
2  2019-09-04 18:00:00 2019-09-04 18:14:41
3  2019-09-25 10:16:32 2019-09-25 10:59:59
4  2019-09-25 11:00:00 2019-09-25 11:59:59
5  2019-09-25 12:00:00 2019-09-25 12:59:59
...
28 2019-09-26 11:00:00 2019-09-26 11:59:59
29 2019-09-26 12:00:00 2019-09-26 12:59:59
30 2019-09-26 13:00:00 2019-09-26 13:01:26
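One way to sanity-check a result of this shape is to confirm that no row crosses an hour boundary and that the first start and last end of the data are preserved. The sketch below wraps the steps described above in a hypothetical helper, split_by_hour (not part of pandas), and runs those checks on the same three sample intervals; it assumes both columns are already datetime64.

```python
import pandas as pd

def split_by_hour(df):
    """Hypothetical helper: split each Start_Time/End_Time row at hour boundaries."""
    out = df[["Start_Time", "End_Time"]].copy()
    span = out["End_Time"].dt.floor("h") - out["Start_Time"].dt.floor("h")
    out["length"] = (span // pd.Timedelta(hours=1)).astype(int) + 1

    # Repeat rows, number the copies, then switch to a clean 0..n-1 index.
    out = out.reindex(out.index.repeat(out["length"]))
    out["counter"] = out.groupby(level=0).cumcount()
    out = out.reset_index(drop=True)

    # First copy keeps its start; later copies start on the hour.
    hour_start = out["Start_Time"].dt.floor("h") + pd.to_timedelta(out["counter"], unit="h")
    out["Start_Time"] = out["Start_Time"].where(out["counter"].eq(0), hour_start)

    # Last copy keeps its end; earlier copies end at hh:59:59.
    hour_end = (out["Start_Time"].dt.floor("h")
                + pd.Timedelta(hours=1) - pd.Timedelta(seconds=1))
    is_last = out["counter"].eq(out["length"] - 1)
    out["End_Time"] = out["End_Time"].where(is_last, hour_end)
    return out[["Start_Time", "End_Time"]]

sample = pd.DataFrame({
    "Start_Time": pd.to_datetime(["2019-08-29 17:29:29", "2019-09-04 17:29:25",
                                  "2019-09-25 10:16:32"]),
    "End_Time": pd.to_datetime(["2019-08-29 17:32:18", "2019-09-04 18:14:41",
                                "2019-09-26 13:01:26"]),
})
result = split_by_hour(sample)

# Every row must start and end within the same clock hour.
same_hour = result["Start_Time"].dt.floor("h") == result["End_Time"].dt.floor("h")
print(len(result), same_hour.all())
```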