Compare two values for different rows in a Pandas DataFrame
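I have a dataset of submission records with different submission times, grouped by id and sub_id. Under each id there are multiple submissions with different sub_id values, which indicates that they are child events of the original event. For example: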
id  sub_id     submission_time       valuation_time        amend_time
G1  Original   2021-05-13T00:11:05Z  2021-05-13T00:12:05Z
G1  Valuation  2021-05-13T06:11:05Z  2021-05-13T06:12:10Z
G1  Amend      2021-05-14T08:09:01Z  2021-05-14T09:09:05Z  2021-05-18T19:19:15Z
G2  Original   2021-04-12T00:11:05Z  2021-04-12T00:12:05Z
G2  Valuation  2021-04-12T06:11:05Z  2021-04-12T06:12:10Z
...
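I want to go through the dataset and check whether the valuation_time of the row with sub_id == "Valuation" is later than the submission_time of the row with sub_id == "Original" under the same id. If it is, I want to add a new column and fill the sub_id == "Valuation" row with pass, otherwise with fail.
I would really appreciate your help with this, as I have no idea how to approach it. Thank you very much.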
Please try this:
import datetime
import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escapes
df = pd.read_excel(r'C:\MyCodes\samplepython.xlsx')
df['Status'] = ''
fmt = "%Y-%m-%dT%H:%M:%SZ"
for index, row in df.iterrows():
    if row['sub_id'] != 'Valuation':
        continue
    # Both timestamps are read from the same Valuation row
    sub_time = datetime.datetime.strptime(row['submission_time'], fmt)
    val_time = datetime.datetime.strptime(row['valuation_time'], fmt)
    # Write through df.at: assigning to `row` would only change a copy,
    # and DataFrame.append was removed in pandas 2.0
    df.at[index, 'Status'] = 'Pass' if val_time > sub_time else 'Fail'
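The loop above reads both timestamps from the same row, whereas the question asks for the Valuation row to be compared with the submission_time of the Original row under the same id. One possible way to do that without a Python loop is sketched below. The function name add_status is hypothetical, the sample rows are the five shown in the question (without amend_time), and treating a Valuation row whose id has no Original row as fail is an assumption, not something the question specifies.

```python
import pandas as pd


def add_status(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a Status column filled for Valuation rows.

    Hypothetical helper: compares each Valuation row's valuation_time with
    the submission_time of the Original row that shares its id.
    """
    out = df.copy()
    out["submission_time"] = pd.to_datetime(out["submission_time"])
    out["valuation_time"] = pd.to_datetime(out["valuation_time"])

    # One submission_time per id, taken from the Original row
    # (the first one is kept if an id has several Original rows).
    original = (
        out.loc[out["sub_id"] == "Original"]
        .drop_duplicates(subset="id")
        .set_index("id")["submission_time"]
    )
    # Broadcast that reference time onto every row of the same id.
    ref_time = out["id"].map(original)

    is_val = out["sub_id"] == "Valuation"
    # Comparing against a missing reference (NaT) yields False, so a
    # Valuation row without an Original row ends up as "fail" (assumption).
    passed = out["valuation_time"] > ref_time

    out["Status"] = ""
    out.loc[is_val & passed, "Status"] = "pass"
    out.loc[is_val & ~passed, "Status"] = "fail"
    return out


# The five rows shown in the question.
df = pd.DataFrame(
    {
        "id": ["G1", "G1", "G1", "G2", "G2"],
        "sub_id": ["Original", "Valuation", "Amend", "Original", "Valuation"],
        "submission_time": [
            "2021-05-13T00:11:05Z",
            "2021-05-13T06:11:05Z",
            "2021-05-14T08:09:01Z",
            "2021-04-12T00:11:05Z",
            "2021-04-12T06:11:05Z",
        ],
        "valuation_time": [
            "2021-05-13T00:12:05Z",
            "2021-05-13T06:12:10Z",
            "2021-05-14T09:09:05Z",
            "2021-04-12T00:12:05Z",
            "2021-04-12T06:12:10Z",
        ],
    }
)

result = add_status(df)
print(result["Status"].tolist())  # ['', 'pass', '', '', 'pass']
```

Because the reference time is looked up by id, the rows do not need to be sorted first, and rows with other sub_id values such as Amend are left blank.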
Code:
import datetime
import pandas as pd

fmt = "%Y-%m-%dT%H:%M:%SZ"
list_values = [
    ['G1', 'Original',
     datetime.datetime.strptime('2021-05-13T00:11:05Z', fmt),
     datetime.datetime.strptime('2021-05-13T00:12:05Z', fmt)],
    # <please load other values>
    ['G2', 'Valuation',
     datetime.datetime.strptime('2021-04-12T06:11:05Z', fmt),
     datetime.datetime.strptime('2021-04-12T06:12:10Z', fmt)],
]
df = pd.DataFrame(list_values, columns=['id', 'sub_id',
                                        'submission_time', 'valuation_time'])
# sort_values returns a new DataFrame, so the result has to be assigned back.
# With the sub_id values shown in the question, alphabetical order puts each
# Original row directly before the Valuation row of the same id.
df = df.sort_values(by=['id', 'sub_id'])
status = []
level = 0  # 1 while an Original row is waiting for its Valuation row
for index, row in df.iterrows():
    if level == 0 and row['sub_id'] == 'Original':
        orig_id = row['id']
        sub_time = row['submission_time']
        status.append('')
        level += 1
    elif level == 1 and row['sub_id'] == 'Valuation' and row['id'] == orig_id:
        # Only compare rows that belong to the same id
        val_time = row['valuation_time']
        if sub_time > val_time:
            status.append('Fail')
        else:
            status.append('Pass')
        level = 0
    else:
        level = 0
        status.append('')
df["Status"] = status
print(df)
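Typing every timestamp through strptime gets tedious for more than a few rows. As a rough sketch, the table from the question could instead be read as whitespace-separated text and the time columns parsed in bulk with pd.to_datetime. The variable names raw and time_cols are illustrative, and the "..." row from the question is left out because its contents are not known.

```python
import io

import pandas as pd

# The five rows shown in the question, as whitespace-separated text.
raw = """\
id sub_id submission_time valuation_time amend_time
G1 Original 2021-05-13T00:11:05Z 2021-05-13T00:12:05Z
G1 Valuation 2021-05-13T06:11:05Z 2021-05-13T06:12:10Z
G1 Amend 2021-05-14T08:09:01Z 2021-05-14T09:09:05Z 2021-05-18T19:19:15Z
G2 Original 2021-04-12T00:11:05Z 2021-04-12T00:12:05Z
G2 Valuation 2021-04-12T06:11:05Z 2021-04-12T06:12:10Z
"""

# Rows with only four fields get a missing value in amend_time.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Parse each time column in one call; missing entries become NaT.
time_cols = ["submission_time", "valuation_time", "amend_time"]
for col in time_cols:
    df[col] = pd.to_datetime(df[col], utc=True)

print(df.dtypes)
```

The parsed columns are timezone-aware (UTC), so they can be compared with each other directly, but not with the naive datetime objects that strptime returns.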