Defining a new column based on matching values across multiple columns in two DataFrames

Question

I am currently trying to define a class label for a dataset I am building. I need to consult two different datasets, where df_port_call is the one that will ultimately contain the class label.

The conditions in the if statements need to be satisfied for the row to receive a class label of 1. Basically, if a row exists in df_deficiency that matches the if statement conditions listed below, the Class column in df_port_call should get a label of 1. However, I am not sure how to vectorize this, and the loop runs very slowly (it would take about 8 days to finish). Any help here would be great!

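from tqdm import tqdm

df_port_call["Class"] = 0

for index, row in tqdm(df_port_call.iterrows()):
    for index_def, row_def in df_deficiency.iterrows():
        # The vessel matches on any one of its identifiers
        if row['MMSI'] == row_def['Primary VIN'] or row['IMO'] == row_def['Primary VIN'] or row['SHIP NAME'] == row_def['Vessel Name']:
            # The inspection falls within the port call window
            if row_def['Inspection Date'] >= row['ARRIVAL IN USA (UTC)'] and row_def['Inspection Date'] <= row['DEPARTURE (UTC)']:
                # iterrows() yields copies, so write back through the DataFrame
                df_port_call.at[index, 'Class'] = 1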

Answer 1

It is hard to answer without input data and the expected result. However, you could use something like this with np.where:

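import numpy as np

# Note: this compares the two frames row by row, so it assumes they share the same index.
# The OR conditions are parenthesized because & binds more tightly than |.
df_port_call['Class'] = \
np.where((df_port_call['MMSI'].eq(df_deficiency['Primary VIN'])
          | df_port_call['IMO'].eq(df_deficiency['Primary VIN'])
          | df_port_call['SHIP NAME'].eq(df_deficiency['Vessel Name']))
         & df_deficiency['Inspection Date'].between(df_port_call['ARRIVAL IN USA (UTC)'],
                                                     df_port_call['DEPARTURE (UTC)']),
         1, 0)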

Adapt it to your code, but I think this is the right approach.
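If the two frames have different lengths or unrelated indexes, an element-wise comparison cannot express "some row in df_deficiency matches this port call". One possible alternative is a merge-based sketch: join on each identifier pair separately, keep the pairs whose inspection date falls inside the port call window, and flag the port calls that got at least one hit. The helper name label_port_calls, the temporary _row column, and the sample rows are made up for illustration. The sketch also assumes the key columns being compared share a dtype and that the date columns are already datetimes.

```python
import pandas as pd


def label_port_calls(df_port_call, df_deficiency):
    """Return a copy of df_port_call with Class = 1 where a matching deficiency exists."""
    calls = df_port_call.copy()
    calls["_row"] = range(len(calls))  # temporary id so hits can be traced back to a port call

    # One equi-join per identifier rule from the question's first if statement.
    key_pairs = [("MMSI", "Primary VIN"), ("IMO", "Primary VIN"), ("SHIP NAME", "Vessel Name")]
    hits = []
    for call_key, def_key in key_pairs:
        # Drop missing keys/dates so NaN never matches NaN in the merge.
        right = df_deficiency[[def_key, "Inspection Date"]].dropna()
        pairs = calls.merge(right, left_on=call_key, right_on=def_key, how="inner")
        # Inclusive on both ends, like the >= and <= checks in the loop.
        in_window = pairs["Inspection Date"].between(
            pairs["ARRIVAL IN USA (UTC)"], pairs["DEPARTURE (UTC)"]
        )
        hits.append(pairs.loc[in_window, "_row"])

    calls["Class"] = calls["_row"].isin(pd.concat(hits)).astype(int)
    return calls.drop(columns="_row")


# Made-up sample rows, only to show the call.
df_port_call = pd.DataFrame({
    "MMSI": [111, 222, 333],
    "IMO": [9001, 9002, 9003],
    "SHIP NAME": ["ALFA", "BRAVO", "CHARLIE"],
    "ARRIVAL IN USA (UTC)": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
    "DEPARTURE (UTC)": pd.to_datetime(["2021-01-10", "2021-02-10", "2021-03-10"]),
})
df_deficiency = pd.DataFrame({
    "Primary VIN": [111, 9002, 555],
    "Vessel Name": ["X", "Y", "CHARLIE"],
    "Inspection Date": pd.to_datetime(["2021-01-05", "2021-05-01", "2021-03-10"]),
})

labeled = label_port_calls(df_port_call, df_deficiency)
print(labeled["Class"].tolist())  # [1, 0, 1]
```

In the sample, the first port call matches on MMSI inside its window, the second matches on IMO but the inspection is outside the window, and the third matches on ship name with the inspection on the departure date. Each merge only materializes pairs that already share an identifier, which is usually far fewer than the full cross product the nested loop walks through, although many repeated keys can still make the intermediate frames large.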
