Python / Pandas: compare multiple columns in two dataframes and drop rows if no match is found

Question

I am learning Python with Pandas and trying to work out the most efficient way to compare several selected columns across two dataframes to find a match. For example, if I have the following two dataframes:

Frame 1
      A     B    C    D    E    F    
001   10    0    0    10   0    10


Frame 2
      A     B    C    D    E    F
200   10    0    10   0    10   0
201   0     10   10   0    0    10
202   0     10   0    0    0    0
203   0     0    0    10   0    10

I'm looking for a way to compare columns A, B, C and D in the two dataframes and drop the rows of Frame 2 that do not share a 10 with Frame 1 in any of those columns.

In this case, I would expect rows 201 and 202 to be dropped because they have no matches, whereas rows 200 and 203 each have one match (even though row 200 also has one column that does not match).

I've tried looping through all the rows in Frame 2 and comparing them with Frame 1:

letters = ['A', 'B', 'C', 'D']

for ix, row in frame_2.iterrows():
    for letter in letters:
        if frame_1[letter].values[0] != frame_2.loc[ix, letter]:
            frame_2.drop(ix, inplace=True)
            break

This removed some rows but not all.
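
Judging only from the code above, one likely cause is that the loop drops a row as soon as a single column differs from Frame 1, which is the opposite of keeping rows that have at least one match. Dropping rows from frame_2 while iterating over it also makes the outcome harder to reason about. Below is a minimal sketch of a loop that keeps rows sharing a 10 with Frame 1 in at least one of the selected columns. It assumes Frame 1 has a single row and that the index labels are strings; `reference`, `keep` and `frame_2_filtered` are hypothetical names introduced for illustration.

```python
import pandas as pd

cols = list("ABCDEF")
# Sample data from the question; string index labels are an assumption.
frame_1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=["001"], columns=cols)
frame_2 = pd.DataFrame(
    [
        [10, 0, 10, 0, 10, 0],
        [0, 10, 10, 0, 0, 10],
        [0, 10, 0, 0, 0, 0],
        [0, 0, 0, 10, 0, 10],
    ],
    index=["200", "201", "202", "203"],
    columns=cols,
)

letters = ["A", "B", "C", "D"]
reference = frame_1.iloc[0]  # assumes Frame 1 has exactly one row

keep = []
for ix, row in frame_2.iterrows():
    # Keep the row if any selected column is 10 in both frames.
    if any(row[letter] == 10 and reference[letter] == 10 for letter in letters):
        keep.append(ix)

# Build the result once instead of dropping rows during iteration.
frame_2_filtered = frame_2.loc[keep]
print(frame_2_filtered)
```

With the sample data this keeps rows 200 and 203. A row-by-row loop is easy to follow but slow on large frames, so it is best treated as a way to check the logic.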

Is there an efficient way to go through all the rows and check whether at least one of the selected columns matches the other dataframe?

Thanks in advance for the help!

Answer 1

I think the simplest solution is to replace the non-10 values with one placeholder in df1 and a different placeholder in df2, so that only 10s can match. Then compare each column with isin (which also works if df1 has more than one row), collect the resulting boolean masks, concat them into a boolean DataFrame, and filter with any to keep the rows that have at least one True:

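import pandas as pd

letters = ['A', 'B', 'C', 'D']

out = []
for letter in letters:
    # non-10 values become 0 in df2 and 1 in df1, so only 10s can match
    m = df2[letter].mask(lambda x: x != 10, 0).isin(df1[letter].mask(lambda x: x != 10, 1))
    out.append(m)

# keep the rows of df2 with at least one True across the tested columns
df = df2[pd.concat(out, axis=1).any(axis=1)]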
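
To make the steps above easier to try out, here is a self-contained sketch that applies the same placeholder-and-isin idea to the sample frames from the question. The string index labels and the names `left`, `right`, `masks` and `result` are assumptions made for illustration, and the masks are built from plain boolean conditions rather than lambdas.

```python
import pandas as pd

cols = list("ABCDEF")
# Sample data from the question; string index labels are an assumption.
df1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=["001"], columns=cols)
df2 = pd.DataFrame(
    [
        [10, 0, 10, 0, 10, 0],
        [0, 10, 10, 0, 0, 10],
        [0, 10, 0, 0, 0, 0],
        [0, 0, 0, 10, 0, 10],
    ],
    index=["200", "201", "202", "203"],
    columns=cols,
)

letters = ["A", "B", "C", "D"]

masks = []
for letter in letters:
    # Different placeholders (0 vs 1) guarantee that non-10 values never match.
    left = df2[letter].mask(df2[letter] != 10, other=0)
    right = df1[letter].mask(df1[letter] != 10, other=1)
    masks.append(left.isin(right))

# One boolean column per letter; keep rows with at least one True.
result = df2[pd.concat(masks, axis=1).any(axis=1)]
print(result)
```

On this data only columns A and D can match, because those are the columns where df1 holds a 10, so rows 200 and 203 remain.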

Alternative solution:

import numpy as np
df = df2[np.logical_or.reduce(out)]
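
If df1 is known to contain exactly one row, as in the question, the comparison can also be done without a Python loop. This is a different technique from the one in this answer, sketched under that single-row assumption: first find the columns where df1 holds a 10, then keep the df2 rows that hold a 10 in any of them. The names `ref_is_10`, `cols_with_10` and `result` are hypothetical.

```python
import pandas as pd

cols = list("ABCDEF")
# Sample data from the question; string index labels are an assumption.
df1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=["001"], columns=cols)
df2 = pd.DataFrame(
    [
        [10, 0, 10, 0, 10, 0],
        [0, 10, 10, 0, 0, 10],
        [0, 10, 0, 0, 0, 0],
        [0, 0, 0, 10, 0, 10],
    ],
    index=["200", "201", "202", "203"],
    columns=cols,
)

letters = ["A", "B", "C", "D"]

# Boolean Series over A..D: True where the single df1 row holds a 10.
ref_is_10 = df1[letters].iloc[0].eq(10)
cols_with_10 = list(ref_is_10[ref_is_10].index)

# Keep df2 rows that also hold a 10 in at least one of those columns.
result = df2[df2[cols_with_10].eq(10).any(axis=1)]
print(result)
```

This trades generality for brevity: it does not handle a df1 with several rows, which the isin-based version does.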

Answer 1
2  Accepted  2019-09-15 12:04:19