
Pandas: compare two dataframes and remove rows that match in one column

I have two separate pandas DataFrames (df1 and df2) that have multiple columns but only one column in common ('text').

I would like to find every row in df2 whose value in that shared column does not match any row of df1.

df1

A    B    text
45   2    score
33   5    miss
20   1    score

df2

C    D    text
.5   2    shot
.3   2    shot
.3   1    miss

Desired result (the row containing 'miss' is removed because 'miss' occurs in df1):

C    D    text
.5   2    shot
.3   2    shot

Is it possible to use the isin method in this scenario?

As you asked, you can do this efficiently with isin, which only builds a boolean mask over df2's rows instead of merging the two frames:

>>> df2[~df2.text.isin(df1.text.values)]
     C  D  text
0  0.5  2  shot
1  0.3  2  shot
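The one-liner works in two steps, which a minimal sketch on the question's example data can make explicit: isin returns one boolean per row of df2 (True where that row's text occurs anywhere in df1's text column), and ~ inverts the mask so boolean indexing keeps only the unmatched rows. The DataFrame construction simply re-types the tables from the question.

```python
import pandas as pd

# The example frames from the question
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1], "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1], "text": ["shot", "shot", "miss"]})

# One boolean per row of df2: True where df2.text occurs somewhere in df1.text
# (isin accepts the Series directly, so .values is optional)
in_df1 = df2["text"].isin(df1["text"])
print(in_df1.tolist())  # [False, False, True]

# ~ flips the mask, so boolean indexing keeps only the rows with no match
result = df2[~in_df1]
print(result)
```

Because only a mask is applied, result keeps df2's original index labels (0 and 1 here).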

EDIT:

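import pandas as pd

# Left-join df2 with df1 on their shared 'text' column; df2 rows with no
# match in df1 get NaN in df1's columns (A and B)
mergeddf = pd.merge(df2, df1, on="text", how="left")

# Keep only those unmatched rows, restricted to df2's original columns
result = mergeddf[mergeddf['A'].isna()][['C', 'D', 'text']]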
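To see why testing A for NaN picks out the unmatched rows, here is a sketch that prints the intermediate merged frame for the example data (the frames are re-typed from the question; on="text" just makes the shared key explicit).

```python
import pandas as pd

df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1], "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1], "text": ["shot", "shot", "miss"]})

# how="left" keeps every row of df2 in its original order; "shot" has no partner
# in df1, so A and B are NaN there, while "miss" picks up A=33, B=5
merged = pd.merge(df2, df1, on="text", how="left")
print(merged)

# The NaN-filled rows are exactly the ones whose text never appears in df1
result = merged[merged["A"].isna()][["C", "D", "text"]]
print(result)
```

One caveat of this design: the NaN test assumes df1's own A column has no missing values, since a NaN already present in A would make a matched row look unmatched. The merge also builds a fresh 0..n-1 index rather than keeping df2's labels.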

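You can merge them and keep only the rows of df2 that found no partner in df1. A left merge with indicator=True labels those rows 'left_only' in an extra _merge column:

merged = pd.merge(df2, df1, on='text', how='left', indicator=True)
merged.loc[merged['_merge'] == 'left_only', df2.columns]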

Or you can use isin:

df2[~df2.text.isin(df1.text)]
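A quick sketch checking that the isin filter and a left merge with indicator=True return the same rows on the example data. The merge produces a new index, so both results are compared after resetting their indexes.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1], "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1], "text": ["shot", "shot", "miss"]})

# Approach 1: boolean mask from isin; keeps df2's original index labels
via_isin = df2[~df2["text"].isin(df1["text"])]

# Approach 2: left merge with indicator=True; rows present only in df2 are 'left_only'
merged = pd.merge(df2, df1, on="text", how="left", indicator=True)
via_merge = merged.loc[merged["_merge"] == "left_only", df2.columns]

# Raises AssertionError if the two results differ in rows, columns or dtypes
pd.testing.assert_frame_equal(
    via_isin.reset_index(drop=True), via_merge.reset_index(drop=True)
)
print(via_isin)
```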
