Define new column based on matching values between multiple columns in two dataframes
I am currently trying to define a class label for a dataset I am building. I need to consult two different datasets, where df_port_call is the one that will ultimately contain the class label.
The conditions in the if statements need to be satisfied for the row to receive a class label of 1. Basically, if a row exists in df_deficiency that matches the if statement conditions listed below, the Class column in df_port_call should get a label of 1. However, I am not sure how to vectorize this, and the loop runs very slowly (it would take about 8 days to finish). Any help here would be great!
from tqdm import tqdm

df_port_call["Class"] = 0
for index, row in tqdm(df_port_call.iterrows(), total=len(df_port_call)):
    for index_def, row_def in df_deficiency.iterrows():
        if row['MMSI'] == row_def['Primary VIN'] or row['IMO'] == row_def['Primary VIN'] or row['SHIP NAME'] == row_def['Vessel Name']:
            if row_def['Inspection Date'] >= row['ARRIVAL IN USA (UTC)'] and row_def['Inspection Date'] <= row['DEPARTURE (UTC)']:
                # iterrows() yields copies, so write to the DataFrame itself rather than to row
                df_port_call.loc[index, 'Class'] = 1
Without input data and the expected result, it is hard to answer. However, you could use something like this with np.where:
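import numpy as np

# Element-wise comparison: row i of one frame is checked against row i of the other,
# so this assumes both frames share the same index.
df_port_call['Class'] = \
    np.where((df_port_call['MMSI'].eq(df_deficiency['Primary VIN'])
              | df_port_call['IMO'].eq(df_deficiency['Primary VIN'])
              | df_port_call['SHIP NAME'].eq(df_deficiency['Vessel Name']))  # parenthesized: & binds tighter than |
             & df_deficiency['Inspection Date'].between(df_port_call['ARRIVAL IN USA (UTC)'],
                                                        df_port_call['DEPARTURE (UTC)']),
             1, 0)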
Adapt this to your code, but I think this is the right approach.
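If the two frames are not row-aligned (the nested loop in the question compares every port call against every deficiency row), a merge-based variant may be closer to what is wanted. The following is only a sketch under assumptions: the helper name label_port_calls, the temporary call_id column, and the toy data are made up for illustration, and it assumes the identifier columns being compared hold the same dtype and that the date columns are already datetimes.

```python
import pandas as pd

def label_port_calls(df_port_call, df_deficiency):
    """Return a copy of df_port_call with Class = 1 where any deficiency row matches."""
    calls = df_port_call.copy()
    calls["call_id"] = range(len(calls))  # hypothetical helper column, dropped at the end

    # One inner merge per `or` branch of the original loop condition.
    key_pairs = [("MMSI", "Primary VIN"), ("IMO", "Primary VIN"), ("SHIP NAME", "Vessel Name")]
    matched_ids = set()
    for call_key, def_key in key_pairs:
        # Drop missing keys first: merge would pair NaN with NaN, the loop's == would not.
        left = calls[["call_id", call_key, "ARRIVAL IN USA (UTC)", "DEPARTURE (UTC)"]].dropna(subset=[call_key])
        right = df_deficiency[[def_key, "Inspection Date"]].dropna(subset=[def_key])
        pairs = left.merge(right, left_on=call_key, right_on=def_key, how="inner")
        # between() is inclusive on both ends, like the >= / <= checks in the loop.
        in_window = pairs["Inspection Date"].between(pairs["ARRIVAL IN USA (UTC)"], pairs["DEPARTURE (UTC)"])
        matched_ids.update(pairs.loc[in_window, "call_id"].tolist())

    calls["Class"] = calls["call_id"].isin(matched_ids).astype(int)
    return calls.drop(columns="call_id")

# Toy data, invented purely to exercise the function.
port_calls = pd.DataFrame({
    "MMSI": ["111", "222", "333"],
    "IMO": ["900", "901", "902"],
    "SHIP NAME": ["ALPHA", "BRAVO", "CHARLIE"],
    "ARRIVAL IN USA (UTC)": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "DEPARTURE (UTC)": pd.to_datetime(["2020-01-10", "2020-02-05", "2020-03-03"]),
})
deficiencies = pd.DataFrame({
    "Primary VIN": ["111", "999", "902"],
    "Vessel Name": ["X", "BRAVO", "Y"],
    "Inspection Date": pd.to_datetime(["2020-01-05", "2020-03-01", "2020-03-03"]),
})

labeled = label_port_calls(port_calls, deficiencies)
print(labeled["Class"].tolist())  # [1, 0, 1]
```

The second port call matches on the vessel name but its inspection falls outside the arrival/departure window, so it stays 0. Note that each merge materializes every key match before the date filter is applied, so memory use grows with the number of inspections per vessel; on very large data it may be worth processing the port calls in chunks.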