
Create new pandas dataframe based on a condition on Start Date and End Date columns in another pandas dataframe

I have start datetime and end datetime columns in a pandas dataframe, as shown below.

[image: enter image description here]

If the end date and time fall in a different day or hour than the start, I need to create a new row whose start time is the beginning of the next hour and whose end time is either the end of that hour (if the original end time is later than that hour) or the original end time (if the original end time falls within that hour), and so on. The expected resulting table is shown below.

[image: enter image description here]

Is this possible with pandas, given that my data is in a dataframe?

Compute the hour difference between Start_Time and End_Time and add 1 (call the result length), then repeat each row length times using df.reindex(df.index.repeat(...)). Then assign a counter from 0 to length-1 to the rows, separately within each group of repeated rows (in the sample data, one group per starting date).
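As a minimal sketch of the repeat step on its own, assuming a two-row frame invented for illustration (the column names follow the answer; the values are made up, and the counter here is grouped on the repeated index rather than on the date):

```python
import pandas as pd

# Hypothetical two-row frame, only to illustrate the repeat step.
df = pd.DataFrame({
    "Start_Time": pd.to_datetime(["2019-09-04 17:29:25", "2019-09-05 08:10:00"]),
    "End_Time": pd.to_datetime(["2019-09-04 18:14:41", "2019-09-05 10:05:00"]),
})

# Number of clock hours each interval touches (hour difference + 1).
diff = df["End_Time"].dt.floor("h") - df["Start_Time"].dt.floor("h")
df["length"] = (diff // pd.Timedelta(hours=1)) + 1

# Repeat each row `length` times; the repeated index still names the source row.
repeated = df.reindex(df.index.repeat(df["length"]))

# Counter 0..length-1 within each group of repeated rows.
repeated["counter"] = repeated.groupby(level=0).cumcount().to_numpy()
repeated = repeated.reset_index(drop=True)

print(repeated[["length", "counter"]])
```

The first interval touches two clock hours and the second touches three, so the sketch produces five rows in total.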

Then for Start_Time, wherever counter is not zero (that is, the row is not the first row of its group), round the time down to hh:00:00 and increment the hour by counter.
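One detail worth checking at this step is that the time has to be rounded down, not to the nearest hour. A small sketch with made-up timestamps shows how dt.floor and dt.round diverge once the minutes pass 30:

```python
import pandas as pd

# Made-up start times: one before and one after the half hour.
start = pd.Series(pd.to_datetime(["2019-09-04 17:29:25", "2019-09-04 17:45:00"]))
counter = pd.Series([1, 1])  # pretend each is the second row of its group

# floor always lands on the current hour, so +1h is the start of the next hour.
floored = start.dt.floor("h") + pd.to_timedelta(counter, unit="h")

# round goes to the nearest hour, so 17:45 becomes 18:00 before the +1h.
rounded = start.dt.round("h") + pd.to_timedelta(counter, unit="h")

print(floored.tolist())
print(rounded.tolist())
```

With floor, both rows start at 18:00:00. With round, the 17:45:00 row would start at 19:00:00, one hour too late.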

For End_Time, wherever counter is not equal to length-1 (that is, the row is not the last row of its group), set End_Time to Start_Time with the minute and second set to 59, i.e. in the format hh:59:59, where the hour is taken from Start_Time.
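The end-of-hour step can be sketched in isolation too. The frame below is a hypothetical, already-expanded interval; its values and the names is_last and end_of_hour are made up for illustration:

```python
import pandas as pd

# Hypothetical expanded rows for one interval, 17:29:25 -> 19:05:10.
df = pd.DataFrame({
    "Start_Time": pd.to_datetime(
        ["2019-09-04 17:29:25", "2019-09-04 18:00:00", "2019-09-04 19:00:00"]),
    "End_Time": pd.to_datetime(["2019-09-04 19:05:10"] * 3),
    "length": [3, 3, 3],
    "counter": [0, 1, 2],
})

# True only on the last row of the group, which keeps the original End_Time.
is_last = df["length"].eq(df["counter"] + 1)

# hh:59:59 of the hour that each row's Start_Time falls in.
end_of_hour = df["Start_Time"].dt.floor("h") + pd.Timedelta(minutes=59, seconds=59)

df["End_Time"] = df["End_Time"].where(is_last, end_of_hour)
print(df[["Start_Time", "End_Time"]])
```

Only the last row keeps the original end time of 19:05:10; the other two rows end at 17:59:59 and 18:59:59.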

Use:

import pandas as pd

df = pd.DataFrame({
    'Start_Time': ['2019-08-29 17:29:29',
                   '2019-09-04 17:29:25', '2019-09-25 10:16:32'],
    'End_Time': ['2019-08-29 17:32:18',
                 '2019-09-04 18:14:41', '2019-09-26 13:01:26']})
df['Start_Time'] = pd.to_datetime(df['Start_Time'])
df['End_Time'] = pd.to_datetime(df['End_Time'])

# Number of clock hours each interval touches (hour difference + 1)
timeDiff = df['End_Time'].dt.floor('h') - df['Start_Time'].dt.floor('h')
df['length'] = timeDiff.dt.days * 24 + timeDiff.dt.seconds // 3600 + 1

# Repeat each row `length` times, then number the copies 0..length-1.
# Grouping on the repeated index is equivalent to grouping by start date
# for this data, and also stays correct if two rows share a start date.
df = df.reindex(df.index.repeat(df['length']))
df['counter'] = df.groupby(level=0).cumcount().to_numpy()
df = df.reset_index(drop=True)

# Start_Time: keep the original on the first copy; otherwise use the top of
# the hour plus `counter` hours (floor, not round, so 17:45 stays in hour 17)
mask = df['counter'].eq(0)
df['Start_Time'] = df['Start_Time'].where(
    mask,
    df['Start_Time'].dt.floor('h') + pd.to_timedelta(df['counter'], unit='h'))

# End_Time: keep the original on the last copy; otherwise hh:59:59 of the
# hour that Start_Time falls in
mask = df['length'].eq(df['counter'] + 1)
masked_val = (df['Start_Time'].dt.floor('h')
              + pd.Timedelta(minutes=59, seconds=59))
df['End_Time'] = df['End_Time'].where(mask, masked_val)

# Drop the helper columns
df = df.drop(columns=['length', 'counter'])

Output:

>>> df
              Start_Time            End_Time
0  2019-08-29 17:29:29 2019-08-29 17:32:18
1  2019-09-04 17:29:25 2019-09-04 17:59:59
2  2019-09-04 18:00:00 2019-09-04 18:14:41
3  2019-09-25 10:16:32 2019-09-25 10:59:59
4  2019-09-25 11:00:00 2019-09-25 11:59:59
5  2019-09-25 12:00:00 2019-09-25 12:59:59
...
28 2019-09-26 11:00:00 2019-09-26 11:59:59
29 2019-09-26 12:00:00 2019-09-26 12:59:59
30 2019-09-26 13:00:00 2019-09-26 13:01:26
