Python/Pandas: Compare multiple columns in two dataframes and remove row if no matches found

Question

I am learning Python with Pandas and trying to work out the most efficient way to compare multiple selected columns on 2 dataframes to find a match. For example, if I have the following two dataframes:

Frame 1
      A     B    C    D    E    F    
001   10    0    0    10   0    10


Frame 2
      A     B    C    D    E    F
200   10    0    10   0    10   0
201   0     10   10   0    0    10
202   0     10   0    0    0    0
203   0     0    0    10   0    10

I'm looking for a way to compare columns A, B, C and D in the two dataframes and drop the rows of Frame 2 that do not share a 10 with Frame 1 in any of those columns.

In this case, I would expect it to drop rows 201 and 202 because they have no matches, whereas rows 200 and 203 each have one match (even though row 200 also has one column that does not match).

I've tried looping through all the rows in Frame 2 and comparing each column against Frame 1:

letters = ['A', 'B', 'C', 'D']

for ix, row in frame_2.iterrows():
    for letter in letters:
        if frame_1[letter].values[0] != frame_2.loc[ix, letter]:
            frame_2.drop(ix, inplace=True)
            break

This removed some rows but not all.

Is there an efficient way to loop through all the rows and check if there's a single match in any of the columns of another dataframe?

Thanks in advance for the help!

Answer 1

I think the simplest solution is to replace the non-10 values with one value in df1 and a different value in df2, so that only 10s can ever match. Then compare each column with isin (which also covers the case where df1 has more rows), collect the boolean masks, concat them into a boolean DataFrame, and filter with any to keep the rows that have at least one True:

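import pandas as pd

letters = ['A', 'B', 'C', 'D']

out = []
for letter in letters:
    # Non-10 values become 0 in df2 and 1 in df1, so only 10s can match
    m = df2[letter].mask(lambda x: x!=10, 0).isin(df1[letter].mask(lambda x: x!=10, 1))
    out.append(m)

# Keep the rows of df2 with at least one True across the compared columns
df = df2[pd.concat(out, axis=1).any(axis=1)]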
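
To check this against the data in the question, here is a self-contained sketch. The frames are rebuilt by hand from the tables above; the index labels (a string for Frame 1, integers for Frame 2) and the name result are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question; the index dtypes are an assumption.
df1 = pd.DataFrame(
    {"A": [10], "B": [0], "C": [0], "D": [10], "E": [0], "F": [10]},
    index=["001"],
)
df2 = pd.DataFrame(
    {
        "A": [10, 0, 0, 0],
        "B": [0, 10, 10, 0],
        "C": [10, 10, 0, 0],
        "D": [0, 0, 0, 10],
        "E": [10, 0, 0, 0],
        "F": [0, 10, 0, 10],
    },
    index=[200, 201, 202, 203],
)

letters = ["A", "B", "C", "D"]

out = []
for letter in letters:
    # Non-10 values become 0 in df2 and 1 in df1, so they can never be equal.
    left = df2[letter].mask(lambda x: x != 10, 0)
    right = df1[letter].mask(lambda x: x != 10, 1)
    out.append(left.isin(right))

# A row survives if at least one compared column holds 10 in both frames.
result = df2[pd.concat(out, axis=1).any(axis=1)]
print(result.index.tolist())  # [200, 203]
```

Rows 201 and 202 are dropped, which is the result the question asks for.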

Alternative solution, combining the masks with NumPy instead of concat (requires import numpy as np):

df = df2[np.logical_or.reduce(out)]
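
If df1 is known to have exactly one row, as in the question, the loop can probably be avoided altogether. The following is only a sketch under that assumption, not part of the answer above; the function name keep_rows_sharing_target and its target parameter are hypothetical.

```python
import pandas as pd


def keep_rows_sharing_target(df1, df2, columns, target=10):
    """Keep rows of df2 that hold `target` in a column where df1's first row does too.

    Hypothetical helper: assumes df1 has a single reference row.
    """
    # Columns in which the reference row holds the target value.
    target_cols = [c for c in columns if df1[c].iloc[0] == target]
    # True for each df2 row with the target in at least one of those columns.
    keep = df2[target_cols].eq(target).any(axis=1)
    return df2[keep]


df1 = pd.DataFrame(
    {"A": [10], "B": [0], "C": [0], "D": [10], "E": [0], "F": [10]},
    index=["001"],
)
df2 = pd.DataFrame(
    {
        "A": [10, 0, 0, 0],
        "B": [0, 10, 10, 0],
        "C": [10, 10, 0, 0],
        "D": [0, 0, 0, 10],
        "E": [10, 0, 0, 0],
        "F": [0, 10, 0, 10],
    },
    index=[200, 201, 202, 203],
)

filtered = keep_rows_sharing_target(df1, df2, ["A", "B", "C", "D"])
print(filtered.index.tolist())  # [200, 203]
```

Unlike the isin version, this does not generalise to a df1 with several rows, so it is a trade of generality for brevity.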

solution1
2 ACCEPTED 2019-09-15 12:04:19
