I am learning Python with Pandas and trying to work out the most efficient way to compare multiple selected columns on 2 dataframes to find a match. For example, if I have the following two dataframes:
Frame 1
      A   B   C   D   E   F
001  10   0   0  10   0  10
Frame 2
      A   B   C   D   E   F
200  10   0  10   0  10   0
201   0  10  10   0   0  10
202   0  10   0   0   0   0
203   0   0   0  10   0  10
I'm looking for a way to compare columns A, B, C, D in the 2 dataframes in order to drop rows which do not match 10 in any column.
In this case, I would expect it to drop rows 201 and 202 because there are no matches, whereas rows 200 and 203 each have 1 match (even though row 200 has 1 column that does not match).
I've tried looping through all the rows in Frame 2 and comparing each column against Frame 1:
letters = ['A', 'B', 'C', 'D']
for ix, row in frame_2.iterrows():
    for letter in letters:
        if frame_1[letter].values[0] != frame_2.loc[ix, letter]:
            frame_2.drop(ix, inplace=True)
            break
This removed some rows but not all.
Is there an efficient way to loop through all the rows and check if there's a single match in any of the columns of another dataframe?
Thanks in advance for the help!
I think the simplest solution is to replace the non-10 values with one value in df1 and a different value in df2, so that only 10 can match. Then compare each column with isin (which also lets you compare against more values if df1 has more rows), collect the resulting boolean Series, concat them into a boolean DataFrame, and filter with any to test for at least one True per row:
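import pandas as pd

letters = ['A', 'B', 'C', 'D']
out = []
for letter in letters:
    # non-10 values become 0 in df2 and 1 in df1, so only 10 can match
    m = df2[letter].mask(lambda x: x != 10, 0).isin(df1[letter].mask(lambda x: x != 10, 1))
    out.append(m)
# keep the rows of df2 with at least one True across the compared columns
df = df2[pd.concat(out, axis=1).any(axis=1)]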
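As a quick check, the loop above can be run against the sample frames from the question. This is only a minimal sketch: the way the frames are built here (integer values, integer index labels 1 and 200 to 203) is an assumption about how the data is stored, and the name result is just for illustration.

```python
import pandas as pd

cols = ['A', 'B', 'C', 'D', 'E', 'F']
# Assumed reconstruction of the question's sample data
df1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=[1], columns=cols)
df2 = pd.DataFrame(
    [[10, 0, 10, 0, 10, 0],
     [0, 10, 10, 0, 0, 10],
     [0, 10, 0, 0, 0, 0],
     [0, 0, 0, 10, 0, 10]],
    index=[200, 201, 202, 203],
    columns=cols,
)

letters = ['A', 'B', 'C', 'D']
out = []
for letter in letters:
    # only the value 10 survives the two masks unchanged in both frames
    m = df2[letter].mask(lambda x: x != 10, 0).isin(df1[letter].mask(lambda x: x != 10, 1))
    out.append(m)

result = df2[pd.concat(out, axis=1).any(axis=1)]
print(result.index.tolist())  # [200, 203]
```

Rows 200 and 203 are kept because they hold a 10 in column A and column D respectively, where Frame 1 also holds a 10; rows 201 and 202 are dropped.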
Alternative solution with NumPy (needs import numpy as np), reusing the out list from above:
df = df2[np.logical_or.reduce(out)]
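If a loop-free version is preferred, the same filter can probably be sketched with whole-frame comparisons. Note that this is a different technique from the isin approach above, not a restatement of it: it first marks the columns in which df1 holds a 10 in any row, then keeps the df2 rows that hold a 10 in at least one of those columns. It assumes both frames share the column names in letters; the sample frames are reconstructed from the question, and has_ten, hits and filtered are illustrative names.

```python
import pandas as pd

cols = ['A', 'B', 'C', 'D', 'E', 'F']
# Assumed reconstruction of the question's sample data
df1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=[1], columns=cols)
df2 = pd.DataFrame(
    [[10, 0, 10, 0, 10, 0],
     [0, 10, 10, 0, 0, 10],
     [0, 10, 0, 0, 0, 0],
     [0, 0, 0, 10, 0, 10]],
    index=[200, 201, 202, 203],
    columns=cols,
)

letters = ['A', 'B', 'C', 'D']

# One boolean per column: does df1 contain a 10 anywhere in that column?
has_ten = df1[letters].eq(10).any(axis=0).to_numpy()

# True where df2 has a 10 in a column in which df1 also has a 10
# (the 1-D has_ten array is broadcast across the rows of the 2-D array)
hits = df2[letters].eq(10).to_numpy() & has_ten

# Keep rows with at least one hit
filtered = df2[hits.any(axis=1)]
print(filtered.index.tolist())  # [200, 203]
```

Converting to NumPy arrays sidesteps index alignment, so the result depends only on column order, which is fixed here by selecting letters from both frames.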