Create a new pandas dataframe based on conditions on the start date and end date columns of another pandas dataframe

Question

I have start datetime and end datetime columns in a pandas dataframe, as shown below.

If the end datetime falls in a different hour or on a different day than the start, I need to create a new row whose start time is the beginning of the next hour and whose end time is either the end of that hour (if the original end time is later than that hour) or the original end time (if the original end time falls within that hour), and so on for each following hour. The expected result is shown below.

Is this possible with pandas, given that my data is in a dataframe?

Answer 1

Compute the hour difference between Start_Time and End_Time (call it length), then repeat each row length times using df.reindex(df.index.repeat(...)). Then assign a counter from 0 to length-1 to the rows, separately within each group created by the starting date.
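The repeat-and-count step can be tried on its own with a toy frame. This is only a sketch: the `demo` frame and its `name` column are made up for illustration, and the counter here is taken per original index label (`groupby(level=0).cumcount()`) instead of per starting date as in the answer, which gives the same numbering when every start date occurs once.

```python
import pandas as pd

# Hypothetical toy frame; 'length' is the number of hourly rows each interval needs.
demo = pd.DataFrame({"name": ["a", "b", "c"], "length": [1, 2, 3]})

# Repeat each row 'length' times; the original index labels are kept for now.
repeated = demo.reindex(demo.index.repeat(demo["length"]))

# Number the copies of each original row 0..length-1 using the kept labels
# (a swapped-in alternative to grouping by the starting date).
repeated["counter"] = repeated.groupby(level=0).cumcount()
repeated = repeated.reset_index(drop=True)

print(repeated)
```

Grouping by the original index label avoids depending on start dates being unique, at the cost of having to compute the counter before `reset_index`.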

Then for Start_Time, wherever counter is not zero (that is, the row is not the starting row for that date), round the time down to hh:00:00 and increment the hour by counter.
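As a small illustration of the Start_Time rule, assuming one interval that has already been repeated twice (the `start` and `counter` series below are made-up inputs, not part of the original data pipeline). The sketch uses `dt.floor` so that a start time past the half hour is not pushed into the next hour.

```python
import pandas as pd

# Hypothetical input: one original start time repeated for two hourly rows.
start = pd.Series(pd.to_datetime(["2019-09-04 17:29:25"] * 2))
counter = pd.Series([0, 1])

# Top of the start hour, shifted forward by 'counter' hours.
shifted = start.dt.floor("h") + pd.to_timedelta(counter, unit="h")

# Keep the original start on the first row (counter == 0), use the shifted value elsewhere.
new_start = start.where(counter.eq(0), shifted)

print(new_start)
```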

For End_Time, wherever counter is not equal to length-1 (that is, the row is not the last row for that date), set End_Time to the row's Start_Time with minutes and seconds set to 59, i.e. hh:59:59, where the hour is taken from Start_Time.
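The End_Time rule can be sketched the same way. The `start`, `end` and `is_last` series are made-up inputs standing in for the already adjusted Start_Time, the repeated End_Time, and the `counter == length - 1` mask.

```python
import pandas as pd

# Hypothetical inputs: adjusted start times, the repeated original end time,
# and a flag marking the last row of the interval.
start = pd.Series(pd.to_datetime(["2019-09-04 17:29:25", "2019-09-04 18:00:00"]))
end = pd.Series(pd.to_datetime(["2019-09-04 18:14:41"] * 2))
is_last = pd.Series([False, True])

# hh:59:59 of each row's own hour: top of the hour + 1 hour - 1 second.
hour_end = start.dt.floor("h") + pd.Timedelta(hours=1) - pd.Timedelta(seconds=1)

# Keep the original end on the last row, use hh:59:59 on all earlier rows.
new_end = end.where(is_last, hour_end)

print(new_end)
```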

Use:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Start_Time': ['2019-08-29 17:29:29',
                   '2019-09-04 17:29:25', '2019-09-25 10:16:32'],
    'End_Time': ['2019-08-29 17:32:18',
                 '2019-09-04 18:14:41', '2019-09-26 13:01:26']})
df['Start_Time'] = pd.to_datetime(df['Start_Time'])
df['End_Time'] = pd.to_datetime(df['End_Time'])

# Number of clock hours each interval touches
timeDiff = df['End_Time'].dt.floor('h') - df['Start_Time'].dt.floor('h')
df['length'] = timeDiff.dt.days * 24 + timeDiff.dt.seconds // 3600 + 1

# Repeat each row 'length' times, then number the copies 0..length-1
# (grouping by start date assumes each start date occurs only once in the input)
df = df.reindex(df.index.repeat(df['length'])).reset_index(drop=True)
df['counter'] = (df.groupby(df['Start_Time'].dt.date)['length']
                   .transform(lambda x: np.arange(x.iloc[0])))

# Every row after the first starts on the hour: floor (not round, which could
# jump to the next hour) and add 'counter' hours
mask = df['counter'].eq(0)
df['Start_Time'] = df['Start_Time'].where(
    mask, df['Start_Time'].dt.floor('h') + pd.to_timedelta(df['counter'], unit='h'))

# Every row except the last ends at hh:59:59 of its own hour
mask = df['length'].eq(df['counter'] + 1)
masked_val = (df['Start_Time'].dt.floor('h')
              + pd.Timedelta(hours=1) - pd.Timedelta(seconds=1))
df['End_Time'] = df['End_Time'].where(mask, masked_val)

# Drop the helper columns
df = df.drop(columns=['length', 'counter'])

Output:

>>> df
              Start_Time            End_Time
0  2019-08-29 17:29:29 2019-08-29 17:32:18
1  2019-09-04 17:29:25 2019-09-04 17:59:59
2  2019-09-04 18:00:00 2019-09-04 18:14:41
3  2019-09-25 10:16:32 2019-09-25 10:59:59
4  2019-09-25 11:00:00 2019-09-25 11:59:59
5  2019-09-25 12:00:00 2019-09-25 12:59:59
...
28 2019-09-26 11:00:00 2019-09-26 11:59:59
29 2019-09-26 12:00:00 2019-09-26 12:59:59
30 2019-09-26 13:00:00 2019-09-26 13:01:26

Answer 1 (accepted), 2021-04-17 02:41:32
