Check for date and time between two columns in a pandas DataFrame
I have two DataFrames.
The first DataFrame is:
import pandas as pd
df1 = pd.DataFrame({'serialNo':['aaaa','bbbb','cccc','ffff','aaaa','bbbb','aaaa'],
'Name':['Sayonti','Ruchi','Tony','Gowtam','Toffee','Tom','Sayonti'],
'testName': [4402, 3747 ,5555,8754,1234,9876,3602],
'moduleName': ['singing', 'dance','booze', 'vocals','drama','paint','singing'],
'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED','WARNING','FAILED','WARNING'],
'Date':['2018-10-5','2018-10-6','2018-10-7','2018-10-8','2018-10-9','2018-10-10','2018-10-8'],
'Time_df1':['23:26:39','22:50:31','22:15:28','21:40:19','21:04:15','20:29:11','19:54:03']})
The second DataFrame is:
df2 = pd.DataFrame({'serialNo':['aaaa','bbbb','aaaa','ffff','xyzy','aaaa'],
'Food':['Strawberry','Coke','Pepsi','Nuts','Apple','Candy'],
'Work': ['AP', 'TC','OD', 'PU','NO','PM'],
'Date':['2018-10-1','2018-10-6','2018-10-2','2018-10-3','2018-10-5','2018-10-10'],
'Time_df2':['09:00:00','10:00:00','11:00:00','12:00:00','13:00:00','14:00:00']
})
I join the two on serialNo:
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = pd.merge(df1,df2,on=['serialNo'],how='inner')
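Now I want Date_y to fall within 3 days of Date_x (counting from Date_x), i.e. Date_y should be Date_x + 1, 2, or 3 days. I can do that as shown below, but I also want to check the time range, and I don't know how to do that: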
result = result[result.Date_x.sub(result.Date_y).dt.days.between(0,3)]
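The line above computes `Date_x - Date_y`, so `between(0, 3)` keeps rows where Date_y is 0 to 3 days *before* Date_x. If the intent is the forward window described in words (Date_y on or up to 3 days after Date_x), a minimal sketch simply reverses the subtraction. The toy frame is hypothetical, and the inclusive 0 to 3 bounds just mirror the `between(0, 3)` call.

```python
import pandas as pd

# Hypothetical toy frame using the merged column names Date_x / Date_y
merged = pd.DataFrame({
    'Date_x': pd.to_datetime(['2018-10-05', '2018-10-05', '2018-10-06']),
    'Date_y': pd.to_datetime(['2018-10-07', '2018-10-01', '2018-10-10']),
})

# Date_y - Date_x in whole days: positive means Date_y is after Date_x
days_after = merged['Date_y'].sub(merged['Date_x']).dt.days

# Keep rows where Date_y is 0 to 3 days (inclusive) after Date_x
forward = merged[days_after.between(0, 3)]
print(forward)
```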
I also want to check whether Time_df2 is within 6 hours of Time_df1, with Time_df1 as the start time. Any help would be appreciated.
You could have a column in the DataFrame that combines the date and the time. Here is an example of combining them for one row:
import datetime

# Combine Date_x and Time_df1 for row label 0 of the merged result
# (label 0 exists before the 3-day filter above, which would drop this row)
value_1_x = datetime.datetime.combine(result['Date_x'][0].date(),
    datetime.datetime.strptime(result['Time_df1'][0], '%H:%M:%S').time())
# Combine Date_y and Time_df2 for the same row
value_2_y = datetime.datetime.combine(result['Date_y'][0].date(),
    datetime.datetime.strptime(result['Time_df2'][0], '%H:%M:%S').time())
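The same combination can be done for every row at once instead of one row at a time. A minimal sketch, assuming the `Date_*` columns are already datetimes at midnight (as after `pd.to_datetime`) and the `Time_*` columns are `'HH:MM:SS'` strings: parse the times with `pd.to_timedelta` and add them to the dates. The toy frame and the `DateTime_x` / `DateTime_y` column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame mirroring the merged columns (values from df1/df2)
result = pd.DataFrame({
    'Date_x': pd.to_datetime(['2018-10-05', '2018-10-06']),
    'Time_df1': ['23:26:39', '22:50:31'],
    'Date_y': pd.to_datetime(['2018-10-01', '2018-10-06']),
    'Time_df2': ['09:00:00', '10:00:00'],
})

# Dates sit at midnight, so adding the parsed 'HH:MM:SS' durations
# gives a full timestamp for every row in one vectorized step.
result['DateTime_x'] = result['Date_x'] + pd.to_timedelta(result['Time_df1'])
result['DateTime_y'] = result['Date_y'] + pd.to_timedelta(result['Time_df2'])
print(result[['DateTime_x', 'DateTime_y']])
```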
Then, given the two datetime objects, you can simply subtract one from the other to get the difference:
difference = value_1_x - value_2_y
print(difference)
This gives the output:
4 days, 14:26:39
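My understanding is that you want to check whether the two are within 3 days and 6 hours (78 hours in total) of each other. You can easily convert the difference to hours and then make the comparison you need: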
hours_difference = abs(value_1_x - value_2_y).total_seconds() / 3600.0
print(hours_difference)
This gives the output:
110.44416666666666
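Applied to the whole merged frame, the 78-hour check can become a vectorized filter. A minimal sketch under the same reading as above (a gap of at most 3 days 6 hours in either direction); the toy rows are a hypothetical subset built from serialNo pairs in df1/df2, and `start`, `other`, and `within` are illustrative names.

```python
import pandas as pd

# Hypothetical subset of the merge result (aaaa, bbbb, aaaa pairs)
result = pd.DataFrame({
    'Date_x': pd.to_datetime(['2018-10-05', '2018-10-06', '2018-10-08']),
    'Time_df1': ['23:26:39', '22:50:31', '19:54:03'],
    'Date_y': pd.to_datetime(['2018-10-01', '2018-10-06', '2018-10-10']),
    'Time_df2': ['09:00:00', '10:00:00', '14:00:00'],
})

# Full timestamps for every row
start = result['Date_x'] + pd.to_timedelta(result['Time_df1'])
other = result['Date_y'] + pd.to_timedelta(result['Time_df2'])

# Absolute gap in hours, the column-wise form of
# abs(value_1_x - value_2_y).total_seconds() / 3600.0
hours_difference = (start - other).abs().dt.total_seconds() / 3600.0

# Keep rows within 3 days + 6 hours = 78 hours
within = result[hours_difference <= 78]
print(hours_difference.round(2).tolist())
print(within)
```

Because `abs()` follows the answer, a row where the df2 timestamp comes before the df1 timestamp also passes. If only the forward window starting at Time_df1 should count, one option is to drop `abs()` and test the hours of `other - start` with `.between(0, 78)` instead.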
Hope this helps!