
Comparing various (but not all) columns of two different sized dataframes and select only those rows from one dataframe where the conditions are true

I have two dataframes that have different numbers of rows as well as different numbers of columns.

row_List1:

         date team_home    team_away  goals_home  goals_away  shootout_win    competition
1  2018-06-04     India        Kenya           3           0           NaN  Friendly 2018
2  2018-06-06   Armenia      Moldova           0           0           NaN  Friendly 2018
3  2018-06-09     Italy  Netherlands           1           1           NaN  Friendly 2018

row_List2:

         date team_home    team_away  goals_home  goals_away  shootout_win    competition    venue
1  2018-06-04     India        Kenya           3           0           NaN  Friendly 2018     Home
2  2018-06-05       USA     Pakistan           8           5           NaN  Friendly 2018  Nuetral
3  2018-06-06   Moldova      Armenia           0           0           NaN  Friendly 2018     Away
4  2018-06-07     India     Srilanka           2           0           NaN  Friendly 2018     Home
3  2018-06-09     Italy  Netherlands           1           1           NaN  Friendly 2018     Away
6  2018-06-04     India        Kenya           3           0           NaN  Friendly 2018     Home

So row_List2 has more columns and more rows than row_List1.

row_List2 has the venues of all matches. I need to add a venue column to row_List1: for each match in row_List1, if it also exists in row_List2, I need to extract its venue and put it in the new column in row_List1.

I tried the code below:

row_list1['venue'] = np.where(
    ((row_list1['date'] == row_list2['date'])
     and (row_list1['team_home'] == row_list2['team_home'] or row_list1['team_home'] == row_list2['team_away'])
     and (row_list1['team_away'] == row_list2['team_away'] or row_list1['team_away'] == row_list2['team_home'])
     and (row_list1['goals_home'] == row_list2['goals_home'] or row_list1['goals_home'] == row_list2['goals_away'])
     and (row_list1['goals_away'] == row_list2['goals_away'] or row_list1['goals_away'] == row_list2['goals_home'])),
    row_list2['venue'], np.NaN)

These are the conditions I need, but the above code gives me an error:

ValueError: Can only compare identically-labeled Series objects

One more problem is that team_home and team_away may be switched in row_List2, so I need to check:

if (row_list1['team_home'] == row_list2['team_home'] or row_list1['team_home'] == row_list2['team_away'])
   and (row_list1['team_away'] == row_list2['team_away'] or row_list1['team_away'] == row_list2['team_home'])
   and (row_list1['goals_home'] == row_list2['goals_home'] or row_list1['goals_home'] == row_list2['goals_away'])
   and (row_list1['goals_away'] == row_list2['goals_away'] or row_list1['goals_away'] == row_list2['goals_home'])

What I want as an output is:

row_List1:

         date team_home    team_away  goals_home  goals_away  shootout_win    competition  venue
1  2018-06-04     India        Kenya           3           0           NaN  Friendly 2018   Home
2  2018-06-06   Armenia      Moldova           0           0           NaN  Friendly 2018   Away
3  2018-06-09     Italy  Netherlands           1           1           NaN  Friendly 2018   Away

Can anyone please help?

This is kind of hacky, but it works. Note that the Armenia-Moldova games don't actually match in your dataframes (home and away are flipped), so that row is not selected. I had to call .fillna() before performing the comparison because np.nan == np.nan evaluates to False.

>>> for df in [df1, df2]:
...    df.fillna(0, inplace=True)

>>> df1[[df2.drop('venue', axis=1).eq(r).all(axis=1).any() for r in df1.itertuples(index=False)]]

         date team_home    team_away  goals_home  goals_away  shootout_win competition  year
0  2018-06-04     India        Kenya           3           0           0.0    Friendly  2018
2  2018-06-09     Italy  Netherlands           1           1           0.0    Friendly  2018
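
To illustrate why this answer fills missing values before comparing: NaN never compares equal to itself, so an element-wise equality check reports a mismatch wherever both sides are missing. A minimal sketch, where `left` and `right` are made-up Series used only for illustration:

```python
import numpy as np
import pandas as pd

# NaN is not equal to itself, so a plain comparison is False.
nan_equals_nan = np.nan == np.nan

# Two illustrative Series with a missing value in the same position.
left = pd.Series([1.0, np.nan])
right = pd.Series([1.0, np.nan])

# Element-wise equality: the NaN position compares as False.
elementwise = left.eq(right)

# After filling NaN with a sentinel (0, as in the answer), both positions match.
filled = left.fillna(0).eq(right.fillna(0))

# Series.equals treats NaNs in the same location as equal.
same = left.equals(right)

print(nan_equals_nan, elementwise.tolist(), filled.tolist(), same)
```

Filling with 0 is only safe when 0 cannot be confused with a real value in that column; for a column such as shootout_win that is entirely NaN here, that appears to hold, but it is an assumption worth checking on real data.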

Is this what you are looking for?

df = pd.merge(row_List1,row_List2.drop_duplicates(),how = 'left')

Output:

       date team_home    team_away  goals_home  goals_away  shootout_win  \
0  6/4/2018     India        Kenya           3           0           NaN   
1  6/6/2018   Armenia      Moldova           0           0           NaN   
2  6/9/2018     Italy  Netherlands           1           1           NaN   

  competition  year venue  
0    Friendly  2018  Home  
1    Friendly  2018   NaN  
2    Friendly  2018  Away
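
The plain merge leaves the Armenia-Moldova venue as NaN because home and away are swapped in row_List2. One possible way to cover that case, offered as a sketch rather than a definitive solution, is to build an order-independent key in both frames before merging. The helper `add_match_key` and the columns `team_a`, `team_b`, `goals_a`, `goals_b` are hypothetical names introduced here. The sample frames copy the question's data but leave out shootout_win and competition for brevity. It also assumes that the goals swap together with the teams, and that the venue is copied from row_List2 unchanged.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (shootout_win and competition omitted).
row_list1 = pd.DataFrame({
    "date": ["2018-06-04", "2018-06-06", "2018-06-09"],
    "team_home": ["India", "Armenia", "Italy"],
    "team_away": ["Kenya", "Moldova", "Netherlands"],
    "goals_home": [3, 0, 1],
    "goals_away": [0, 0, 1],
})
row_list2 = pd.DataFrame({
    "date": ["2018-06-04", "2018-06-05", "2018-06-06",
             "2018-06-07", "2018-06-09", "2018-06-04"],
    "team_home": ["India", "USA", "Moldova", "India", "Italy", "India"],
    "team_away": ["Kenya", "Pakistan", "Armenia", "Srilanka", "Netherlands", "Kenya"],
    "goals_home": [3, 8, 0, 2, 1, 3],
    "goals_away": [0, 5, 0, 0, 1, 0],
    "venue": ["Home", "Nuetral", "Away", "Home", "Away", "Home"],
})

def add_match_key(df):
    """Hypothetical helper: add key columns that ignore home/away order."""
    out = df.copy()
    # True where the pair must be flipped to put the teams in alphabetical order.
    swap = out["team_home"] > out["team_away"]
    out["team_a"] = np.where(swap, out["team_away"], out["team_home"])
    out["team_b"] = np.where(swap, out["team_home"], out["team_away"])
    # Goals are flipped together with the teams so they stay attached to them.
    out["goals_a"] = np.where(swap, out["goals_away"], out["goals_home"])
    out["goals_b"] = np.where(swap, out["goals_home"], out["goals_away"])
    return out

key = ["date", "team_a", "team_b", "goals_a", "goals_b"]

# One venue per match key; duplicates in row_list2 would otherwise multiply rows.
lookup = add_match_key(row_list2)[key + ["venue"]].drop_duplicates(subset=key)

result = (
    add_match_key(row_list1)
    .merge(lookup, on=key, how="left")
    .drop(columns=["team_a", "team_b", "goals_a", "goals_b"])
)
print(result)
```

With the sample data this fills the venue for all three matches, including the swapped Armenia-Moldova row. Matches that have no counterpart in row_list2 would keep NaN in venue because of how="left".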
