Find the difference (set difference) between two dataframes in Python

I have two dataframes: df1 and df2. I want to eliminate all occurrences of df2 rows in df1. Basically, this is the set difference operator but for dataframes.

My question is very similar to this question, with one major variation: it's possible that df1 and df2 have no common rows at all. In that case, if we concat the two dataframes and then drop the duplicates, it doesn't eliminate the df2 rows from df1. In fact, it adds them to the result.

The question is also similar to this one, except that I want the operation to work on rows.

Example:

Case 1:
df1:
A,B,C,D
E,F,G,H

df2:
E,F,G,H

Then, df1 - df2:
A,B,C,D

Case 2:
df1:
A,B,C,D

df2:
E,F,G,H

Then, df1 - df2:
A,B,C,D

Simply put, I am looking for a way to do df1 - df2 (remove the rows of df2 from df1 if they are present). How should this be done?

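Try:

df1[~df1.isin(df2)]

A,B,C,D

Note that isin with a DataFrame argument matches on index and column labels as well as values, and the boolean mask turns non-selected cells into NaN rather than dropping rows. So this only removes a row when it sits at the same index in both dataframes. To compare whole rows regardless of index, build the mask from row tuples:

df1[~df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]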
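
A row-wise set difference can also be sketched as a left merge with indicator=True, keeping only the rows marked left_only. This is a minimal sketch, not the answerer's method: row_difference is a hypothetical helper, the column names c1..c4 are made up, and it assumes both dataframes share the same column names.

```python
import pandas as pd

def row_difference(left, right):
    # Hypothetical helper: keep rows of `left` with no exact match in `right`.
    # Assumes both frames have the same column names, so merge() joins on all of them.
    # drop_duplicates() on `right` keeps a matched left row from being repeated.
    merged = left.merge(right.drop_duplicates(), how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")

cols = ["c1", "c2", "c3", "c4"]  # made-up column names for the example
df1 = pd.DataFrame([list("ABCD"), list("EFGH")], columns=cols)
df2 = pd.DataFrame([list("EFGH")], columns=cols)

case1 = row_difference(df1, df2)            # Case 1: df2's row is present in df1
case2 = row_difference(df1.iloc[[0]], df2)  # Case 2: no common rows
```

Both case1 and case2 end up holding the single row A,B,C,D. The result gets a fresh integer index from the merge, so the original index of df1 is not preserved.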

NumPy's set difference will work here: it returns the sorted, unique values in ar1 that are not in ar2. Note that it compares individual values, not whole rows.

np.setdiff1d(df1, df2)

Or, to get the result as a DataFrame:

pd.DataFrame([np.setdiff1d(df1, df2)])
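
To see why np.setdiff1d is a value-level rather than a row-level difference, here is a small sketch with data that is not from the question: the second row of df1 shares E, F, G with df2 but is not identical to it, and the column names w..z are made up.

```python
import numpy as np
import pandas as pd

cols = list("wxyz")  # made-up column names for the example
df1 = pd.DataFrame([["A", "B", "C", "D"],
                    ["E", "F", "G", "A"]], columns=cols)
df2 = pd.DataFrame([["E", "F", "G", "H"]], columns=cols)

# setdiff1d flattens both inputs and returns the sorted unique values
# found in the first but not in the second.
flat = np.setdiff1d(df1.to_numpy(), df2.to_numpy())

# Wrapping the flat result in a list yields a single-row DataFrame.
as_frame = pd.DataFrame([flat])
```

Here flat contains A, B, C, D, even though neither row of df1 equals the row of df2, so a row-wise difference would have kept both rows of df1. The approach matches the question's examples only because every value there is distinct.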


 