Comparing various (but not all) columns of two different sized dataframes and select only those rows from one dataframe where the conditions are true

I have two dataframes with different numbers of rows and different numbers of columns.

row_List1:

        date  team_home    team_away  goals_home  goals_away  shootout_win    competition
1 2018-06-04      India        Kenya           3           0           NaN  Friendly 2018
2 2018-06-06    Armenia      Moldova           0           0           NaN  Friendly 2018
3 2018-06-09      Italy  Netherlands           1           1           NaN  Friendly 2018

row_List2:

        date  team_home    team_away  goals_home  goals_away  shootout_win    competition    venue
1 2018-06-04      India        Kenya           3           0           NaN  Friendly 2018     Home
2 2018-06-05        USA     Pakistan           8           5           NaN  Friendly 2018  Nuetral
3 2018-06-06    Moldova      Armenia           0           0           NaN  Friendly 2018     Away
4 2018-06-07      India     Srilanka           2           0           NaN  Friendly 2018     Home
3 2018-06-09      Italy  Netherlands           1           1           NaN  Friendly 2018     Away
6 2018-06-04      India        Kenya           3           0           NaN  Friendly 2018     Home

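So row_List2 has more columns and more rows than row_List1.

row_List2 has the venue for every match. I need to add a venue column to row_List1: for each match in row_List1, if it also exists in row_List2, I need to take its venue and put it in the new column of row_List1.

I tried the following code: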

# row_list1['venue'] = np.where(((row_list1['date'] == row_list2['date']) and (row_list1['team_home'] == row_list2['team_home'] or row_list1['team_home'] == row_list2['team_away']) and (row_list1['team_away'] == row_list2['team_away'] or row_list1['team_away'] == row_list2['team_home']) and (row_list1['goals_home'] == row_list2['goals_home'] or row_list1['goals_home'] == row_list2['goals_away']) and (row_list1['goals_away'] == row_list2['goals_away'] or row_list1['goals_away'] == row_list2['goals_home'])), row_list2['venue'], np.NaN)

These are the conditions I need, but the code above gives me an error:

ValueError: Can only compare identically-labeled Series objects
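
For context, a minimal sketch of where this message comes from, assuming two small made-up Series (`a` and `b` are illustrative names, not part of the original data). pandas compares Series element by element only when both have the same index labels, so Series taken from dataframes with different row counts cannot be compared with `==`. Separately, `and` / `or` between Series would raise a different error, since element-wise logic needs `&` / `|`.

```python
import pandas as pd

# Two Series with different lengths, hence different index labels
a = pd.Series([1, 2, 3])       # index 0..2
b = pd.Series([1, 2, 3, 4])    # index 0..3

try:
    a == b
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```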

There is one more problem: team_home and team_away may be switched in row_List2. So I need to check:

if (row_list1['team_home'] == row_list2['team_home'] or row_list1['team_home'] == row_list2['team_away']) and (row_list1['team_away'] == row_list2['team_away'] or row_list1['team_away'] == row_list2['team_home']) and (row_list1['goals_home'] == row_list2['goals_home'] or row_list1['goals_home'] == row_list2['goals_away']) and (row_list1['goals_away'] == row_list2['goals_away'] or row_list1['goals_away'] == row_list2['goals_home'])

My desired output is:

row_List1:

        date  team_home    team_away  goals_home  goals_away  shootout_win    competition  venue
1 2018-06-04      India        Kenya           3           0           NaN  Friendly 2018   Home
2 2018-06-06    Armenia      Moldova           0           0           NaN  Friendly 2018   Away
3 2018-06-09      Italy  Netherlands           1           1           NaN  Friendly 2018   Away

Can anyone help?

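This is a bit hacky, but it works. Note that the Armenia-Moldova game doesn't actually match in your dataframes (home and away are flipped). I had to .fillna() before doing the comparison because np.nan does not == np.nan.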

>>> for df in [df1, df2]:
...    df.fillna(0, inplace=True)

>>> df1[[df2.drop('venue', axis=1).eq(r).all(axis=1).any() for r in df1.itertuples(index=False)]]

    date    team_home   team_away   goals_home  goals_away  shootout_win    competition year
0   2018-06-04  India   Kenya   3   0   0.0 Friendly    2018
2   2018-06-09  Italy   Netherlands 1   1   0.0 Friendly    2018
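
The NaN point in this answer can be checked with a small sketch. The Series `s1` and `s2` below are made up for illustration; the idea is only that NaN never compares equal to itself, so rows containing NaN fail an element-wise `.eq()` unless the NaNs are filled first.

```python
import numpy as np
import pandas as pd

nan_equal = np.nan == np.nan          # NaN is never equal to itself

s1 = pd.Series([np.nan, 1.0])
s2 = pd.Series([np.nan, 1.0])

# Element-wise comparison treats the NaN positions as unequal
elementwise = s1.eq(s2).tolist()

# After filling NaN with a placeholder value, the positions compare equal
after_fill = s1.fillna(0).eq(s2.fillna(0)).tolist()

print(nan_equal, elementwise, after_fill)
```

Filling with 0 is only safe if 0 cannot be confused with a real value in that column; that is an assumption the data has to satisfy.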

Is this what you want?

df = pd.merge(row_List1,row_List2.drop_duplicates(),how = 'left')

Output:

       date team_home    team_away  goals_home  goals_away  shootout_win  \
0  6/4/2018     India        Kenya           3           0           NaN   
1  6/6/2018   Armenia      Moldova           0           0           NaN   
2  6/9/2018     Italy  Netherlands           1           1           NaN   

  competition  year venue  
0    Friendly  2018  Home  
1    Friendly  2018   NaN  
2    Friendly  2018  Away
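
The Armenia-Moldova row stays NaN above because home and away are swapped in row_List2. One possible way around that, sketched here and not taken from either answer, is to build order-independent key columns before merging. The helper `add_match_key` and the columns `team_a`, `team_b`, `goals_a`, `goals_b` are hypothetical names introduced for this sketch, and the data is a reduced copy of the question's tables with dates kept as strings.

```python
import numpy as np
import pandas as pd

def add_match_key(df):
    """Hypothetical helper: add team/goal columns that ignore home/away order."""
    out = df.copy()
    # True where the home team sorts after the away team alphabetically
    swap = out["team_home"] > out["team_away"]
    out["team_a"] = np.where(swap, out["team_away"], out["team_home"])
    out["team_b"] = np.where(swap, out["team_home"], out["team_away"])
    # Keep each goal count attached to its own team
    out["goals_a"] = np.where(swap, out["goals_away"], out["goals_home"])
    out["goals_b"] = np.where(swap, out["goals_home"], out["goals_away"])
    return out

key = ["date", "team_a", "team_b", "goals_a", "goals_b"]

row_List1 = pd.DataFrame({
    "date": ["2018-06-04", "2018-06-06", "2018-06-09"],
    "team_home": ["India", "Armenia", "Italy"],
    "team_away": ["Kenya", "Moldova", "Netherlands"],
    "goals_home": [3, 0, 1],
    "goals_away": [0, 0, 1],
})
row_List2 = pd.DataFrame({
    "date": ["2018-06-04", "2018-06-05", "2018-06-06",
             "2018-06-07", "2018-06-09", "2018-06-04"],
    "team_home": ["India", "USA", "Moldova", "India", "Italy", "India"],
    "team_away": ["Kenya", "Pakistan", "Armenia", "Srilanka", "Netherlands", "Kenya"],
    "goals_home": [3, 8, 0, 2, 1, 3],
    "goals_away": [0, 5, 0, 0, 1, 0],
    "venue": ["Home", "Nuetral", "Away", "Home", "Away", "Home"],
})

# One venue per match key; repeated matches in row_List2 keep the first venue
venues = add_match_key(row_List2)[key + ["venue"]].drop_duplicates(subset=key)

result = (
    add_match_key(row_List1)
    .merge(venues, on=key, how="left")
    .drop(columns=["team_a", "team_b", "goals_a", "goals_b"])
)
print(result)
```

With this sample data the venue column comes out as Home, Away, Away, which is the output the question asks for. The venue is copied as stored in row_List2; whether it should be flipped when the teams are swapped is a separate decision this sketch does not make.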

