I have two separate pandas DataFrames (df1 and df2) which have multiple columns, but only one in common ('text').
I would like to find every row in df2 whose 'text' value does not match the 'text' value of any row in df1.
df1
A B text
45 2 score
33 5 miss
20 1 score
df2
C D text
.5 2 shot
.3 2 shot
.3 1 miss
Result df (remove row containing miss since it occurs in df1)
C D text
.5 2 shot
.3 2 shot
Is it possible to use the isin method in this scenario?
As you asked, you can do this efficiently using isin (without resorting to expensive merges).
>>> df2[~df2.text.isin(df1.text.values)]
C D text
0 0.5 2 shot
1 0.3 2 shot
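For a reproducible version of the above, here is a minimal sketch that rebuilds the two example frames from the question and applies the same isin filter. The names in_df1 and result are only illustrative.

```python
import pandas as pd

# Example frames copied from the question.
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1], "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1], "text": ["shot", "shot", "miss"]})

# True where df2's text also occurs somewhere in df1's text column.
in_df1 = df2["text"].isin(df1["text"])

# Negate the mask to keep only the rows of df2 with no match in df1.
result = df2[~in_df1]
print(result)
```

Because the mask is built from df2 itself, it shares df2's index, so the boolean selection lines up row for row.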
EDIT:
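import numpy as np
import pandas as pd
# A left merge keeps every row of df2; df1's columns are NaN where 'text' has no match.
mergeddf = pd.merge(df2, df1, how="left")
result = mergeddf[np.isnan(mergeddf['A'])][['C','D','text']]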
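The merge-based variant can be checked the same way. The sketch below assumes the example frames from the question, and also assumes that column A in df1 has no missing values of its own, since a NaN in A after the left merge is what marks an unmatched row. The name unmatched is illustrative.

```python
import pandas as pd

# Example frames copied from the question.
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1], "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1], "text": ["shot", "shot", "miss"]})

# Left merge on the shared 'text' column: every df2 row is kept,
# and df1's columns are NaN where no df1 row has the same text.
mergeddf = pd.merge(df2, df1, how="left")

# Rows whose 'A' is NaN found no partner in df1.
unmatched = mergeddf[mergeddf["A"].isna()][["C", "D", "text"]]
print(unmatched)
```

If df1 contained the same text more than once, matched df2 rows would be duplicated by the merge, but unmatched rows still appear once each, so the filtered result is unaffected.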
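You can left-merge df2 against df1's text column and keep only the rows that found no match.
merged = pd.merge(df2, df1[['text']].drop_duplicates(), how='left', indicator=True)
merged[merged['_merge'] == 'left_only'].drop(columns='_merge')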
or you can use isin:
df2[~df2.text.isin(df1.text)]