
Finding the difference between two dataframes in Python

Suppose I have two dataframes:

A:

column1 column2
  abc      2
  def      2

B:

column1 column2
  abc      2
  def      1
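For a reproducible setup, the two frames above can be built directly in pandas. This is a small sketch that assumes column2 holds integers, as the sample values suggest.

```python
import pandas as pd

# Frame A from the question
A = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 2]})

# Frame B: identical to A except that the 'def' row has column2 == 1
B = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 1]})

print(A)
print(B)
```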

I want to compare these two dataframes, find the rows where they differ, and get the corresponding values of column1.

So in this case the output should be 'def'.

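Based on this answer, you can try the pd.concat method. Concatenating A and B stacks their rows, and drop_duplicates(keep=False) then removes every row that occurs more than once, so only the rows that differ between the two frames remain: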

pd.concat([A,B]).drop_duplicates(keep=False)['column1'].unique().tolist()

Output:

# if you just want to see the rows that differ between the dataframes
>>> pd.concat([A,B]).drop_duplicates(keep=False)
  column1  column2
1     def        2
1     def        1
# if you only want the 'column1' values of the differing rows
>>> pd.concat([A,B]).drop_duplicates(keep=False)['column1']
1    def
1    def
Name: column1, dtype: object
# if you want the unique column1 values of the differences as a NumPy array
>>> pd.concat([A,B]).drop_duplicates(keep=False)['column1'].unique()
array(['def'], dtype=object) 
# if you want the unique column1 values of the differences as a list
>>> pd.concat([A,B]).drop_duplicates(keep=False)['column1'].unique().tolist() 
['def']
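The same chain can be wrapped in a small self-contained function. The name diff_column1 is hypothetical, chosen only for illustration. Note that drop_duplicates(keep=False) compares whole rows and also removes rows that are duplicated within a single frame, so a row repeated inside A but absent from B would not be reported.

```python
import pandas as pd


def diff_column1(a: pd.DataFrame, b: pd.DataFrame) -> list:
    """Hypothetical helper: unique 'column1' values of rows not shared by a and b.

    Wraps the concat + drop_duplicates(keep=False) idea. Rows are compared on
    all columns; a row duplicated within one frame is dropped as well, so it
    is NOT reported as a difference.
    """
    # Stack both frames; rows present in both now appear at least twice.
    stacked = pd.concat([a, b])
    # keep=False removes every copy of any duplicated row.
    unmatched = stacked.drop_duplicates(keep=False)
    return unmatched["column1"].unique().tolist()


A = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 2]})
B = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 1]})
print(diff_column1(A, B))  # ['def']
```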
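If rows that are duplicated within one frame should still count as differences, an outer merge with indicator=True is a common alternative. This is a different technique from the concat answer, shown here only as a sketch: it assumes both frames have the same columns, merges on all of them, and treats each distinct row as a key without counting how often it occurs. The helper name diff_column1_merge is hypothetical.

```python
import pandas as pd


def diff_column1_merge(a: pd.DataFrame, b: pd.DataFrame) -> list:
    """Hypothetical helper: unique 'column1' values of rows found in only one frame."""
    # Outer merge on all shared columns; '_merge' records where each row came from.
    merged = a.merge(b, how="outer", indicator=True)
    # Keep rows marked 'left_only' or 'right_only'.
    only_one = merged[merged["_merge"] != "both"]
    return only_one["column1"].unique().tolist()


A = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 2]})
B = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 1]})
print(diff_column1_merge(A, B))  # ['def']
```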

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM