
Pandas remove all elements from one dataframe from another

I want to find the difference between two dataframes (elements in df1, not in df2) based on a subset of columns. The two data frames have the same schema.

Say df1 contains

col1 col2 col3 col4
A    B    C    D
A    C    D    D

and df2 contains

col1 col2 col3 col4
A    D    D    D
A    B    D    D

I want the rows of df1 for which no row of df2 matches on both col1 and col2. So in this case the expected output would be just the 2nd row of df1.

A    C    D    D

I've tried different variations of isin, but I'm struggling to find anything that works. I tried https://stackoverflow.com/a/16704977/1639228, but that only works for single columns.

The problem with using isin is that, when you pass it a DataFrame, the index also has to match. I don't know what your index is, but if it differs on the rows where col1 and col2 are equal, those rows will still be reported as non-matching.

Converting your second DataFrame to a dict of lists will make it work (since that removes the index). isin then matches each column separately, and all(axis=1) narrows this down to the rows where both columns match.

sub = ['col1', 'col2']
mask = df1[sub].isin(df2[sub].to_dict(orient='list')).all(axis=1)

df1[~mask]

  col1 col2 col3 col4
1    A    C    D    D
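
One caveat with the dict form of isin: each column is tested against its own list of values, so a row can be flagged when its col1 matches one row of df2 and its col2 matches a different row. If strict pairwise matching matters, a left merge with indicator=True is one possible alternative. The following is only a sketch under that assumption; the helper name anti_join is made up for illustration and is not a pandas function.

```python
import pandas as pd


def anti_join(left, right, on):
    """Return the rows of `left` that have no row in `right` matching on all `on` columns."""
    # Deduplicate the key columns so the merge cannot multiply rows of `left`.
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # A left merge keeps the order of the left rows, so the mask lines up by position.
    mask = (merged["_merge"] == "left_only").to_numpy()
    return left[mask]


df1 = pd.DataFrame({"col1": ["A", "A"], "col2": ["B", "C"],
                    "col3": ["C", "D"], "col4": ["D", "D"]})
df2 = pd.DataFrame({"col1": ["A", "A"], "col2": ["D", "B"],
                    "col3": ["D", "D"], "col4": ["D", "D"]})

result = anti_join(df1, df2, ["col1", "col2"])
print(result)
```

With the sample frames from the question this keeps only the row A C D D, and it keeps df1's original index.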

I know this is a very old question, but it comes up at the top of Google when I search for this problem. If there is a column in both dataframes whose values are unique, it can be done like this:

  uniq_value_list = df2['col1'].tolist()
  df3 = df1[~df1['col1'].isin(uniq_value_list)]

Now the third dataframe will have the rows that are in df1 but not in df2.

I don't know if this is efficient, but I found a way to do it after hours of experimenting. It involves first re-indexing the dataframes to use the columns you care about as the index.

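# Note: inplace=True modifies df1 and df2 themselves
df1.set_index(['col1', 'col2'], inplace=True)
df2.set_index(['col1', 'col2'], inplace=True)

# Keep the rows of df1 whose (col1, col2) key is not in df2's index
df1[df1.index.map(lambda x: x not in df2.index)]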
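
A variation on the same idea that leaves both frames untouched: build a MultiIndex from the key columns and use its vectorized isin instead of a Python lambda. This is a sketch, assuming a pandas version that provides MultiIndex.from_frame; the names sub, left_keys and right_keys are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["A", "A"], "col2": ["B", "C"],
                    "col3": ["C", "D"], "col4": ["D", "D"]})
df2 = pd.DataFrame({"col1": ["A", "A"], "col2": ["D", "B"],
                    "col3": ["D", "D"], "col4": ["D", "D"]})

sub = ["col1", "col2"]

# Build (col1, col2) keys without changing the index of df1 or df2.
left_keys = pd.MultiIndex.from_frame(df1[sub])
right_keys = pd.MultiIndex.from_frame(df2[sub])

# isin compares whole (col1, col2) tuples, so matching is pairwise.
result = df1[~left_keys.isin(right_keys)]
print(result)
```

Because nothing is re-indexed, col1 and col2 stay as ordinary columns in the result.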
