
Compare 2 Pandas Dataframes and return all rows that are different

I have two DataFrames with the same schema but different data. I want to compare them and get every row in which the value of any column differs.


id   Store         is_open
1   'Walmart'      true
2   'Best Buy'     false
3   'Target'       true
4   'Home Depot'   true


id   Store         is_open
1   'Walmart'      false
2   'Best Buy'     true
3   'Target'       true
4   'Home Depot'   false

I was able to get the differences, but the result contains only the columns that changed rather than all the columns. I get the following output:


id   is_open  is_open
1   true       false
2   false      true
4   true       false

Here is the code to achieve the above output:

import numpy as np
import pandas as pd

# Boolean mask of changed cells, stacked into a (row, column) index
ne_stacked = (from_aoi_df != to_aoi_df).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col_changed']

# Pull the old and new values at every changed position
difference_locations = np.where(from_aoi_df != to_aoi_df)
changed_from = from_aoi_df.values[difference_locations]
changed_to = to_aoi_df.values[difference_locations]
df5 = pd.DataFrame({'from': changed_from, 'to': changed_to})

In addition to the result above, I also want the unchanged columns, such as Store, included. My expected output is:

        id Store         is_open_df1  is_open_df2    
        1   Walmart       true        false 
        2   Best Buy      false       true        
        4   Home Depot    true        false 

How can I achieve that?

Use the pandas merge function:

df = pd.merge(df1,df2[['id','is_open']],on='id')


Keep only the rows whose two is_open columns are not equal:

df = df[df["is_open_x"]!=df["is_open_y"]]


Finally, rename the is_open_x and is_open_y columns to match your expected output.
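The rename step might look like the sketch below, which repeats the merge and the filter so that it runs on its own. The sample frames mirror the tables in the question; the names merged, changed, and result are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data mirroring the question's two tables (assumed for illustration)
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# merge adds the default suffixes _x and _y to the overlapping is_open column
merged = pd.merge(df1, df2[["id", "is_open"]], on="id")

# keep only the rows where the two is_open values differ
changed = merged[merged["is_open_x"] != merged["is_open_y"]]

# rename the suffixed columns to the names used in the expected output
result = changed.rename(
    columns={"is_open_x": "is_open_df1", "is_open_y": "is_open_df2"}
)
print(result)
```

Passing suffixes=('_df1', '_df2') to pd.merge would give the same column names without a separate rename.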



How about this?

df1['is_open_df2'] = df2['is_open']

expected_result_df = df1[df1['is_open'] != df1['is_open_df2']]
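Assigning a column from one frame to another aligns on the index, not on id, so this approach only lines up the right rows if both frames share the same index. A minimal sketch, assuming id is unique in each frame, that indexes both frames by id first and leaves df1 unmodified (the names left, right, combined, and result are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
# Same data as the question's second table, deliberately in a different row order
df2 = pd.DataFrame({
    "id": [4, 2, 3, 1],
    "Store": ["Home Depot", "Best Buy", "Target", "Walmart"],
    "is_open": [False, True, True, False],
})

# Index both frames by id so the assignment aligns on id, not on row position
left = df1.set_index("id")
right = df2.set_index("id")

combined = left.rename(columns={"is_open": "is_open_df1"}).assign(
    is_open_df2=right["is_open"]
)

# keep the rows whose values differ and turn id back into a column
result = combined[combined["is_open_df1"] != combined["is_open_df2"]].reset_index()
print(result)
```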

If the data frames are of different lengths, here's something you can use:

new_df = pd.concat([df1, df2]).reset_index(drop=True)
df = new_df.drop_duplicates(subset=['col1','col2'], keep=False)

This will give you a data frame called df with just the records that were different.

  • df1 and df2 are the two data frames you are comparing.
  • subset is the list of columns used to identify duplicates.
  • keep=False drops every copy of a duplicated row, so only the rows that appear once remain.
  • keep='last' keeps the last copy of each duplicated row, which comes from the second data frame.
  • keep='first' keeps the first copy of each duplicated row, which comes from the first data frame.
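As a rough illustration of the concat and drop_duplicates approach on the question's data, the sketch below tags each row with the frame it came from. The source column and the variable names are assumptions added for the example, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# Stack the frames, recording the origin of each row (hypothetical helper column)
combined = pd.concat(
    [df1.assign(source="df1"), df2.assign(source="df2")],
    ignore_index=True,
)

# Compare on the data columns only; keep=False removes rows present in both frames
data_cols = ["id", "Store", "is_open"]
diff = combined.drop_duplicates(subset=data_cols, keep=False)
print(diff)

# keep='last' instead keeps one copy of the shared row (the one from df2)
kept_last = combined.drop_duplicates(subset=data_cols, keep="last")
```

Each changed record shows up twice in diff, once per frame, so this gives a long layout rather than the side-by-side is_open_df1 / is_open_df2 columns asked for in the question.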

If the dataframes are of the same length


Hope this helps! This works if df1 and df2 each contain only unique rows; if either has duplicates, drop them before using this.


# compare the DataFrames cell by cell
m = (from_aoi_df != to_aoi_df)
# columns with at least one difference
m1 = m.any(axis=0)
# rows with at least one difference
m2 = m.any(axis=1)

# changed columns of the differing rows, taken from each DataFrame
df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')

# unchanged columns of the differing rows
df3 = from_aoi_df.loc[m2, ~m1]

# join together
df = pd.concat([df3, df1, df2], axis=1)
print (df)
   id       Store  is_open_df1  is_open_df2
0   1     Walmart         True        False
1   2    Best Buy        False         True
3   4  Home Depot         True        False

Verify solution with multiple changed columns:

# the first value in the id column is changed to 10
print (from_aoi_df)
   id       Store  is_open
0  10     Walmart     True
1   2    Best Buy    False
2   3      Target     True
3   4  Home Depot     True

m = (from_aoi_df != to_aoi_df)
m1 = m.any(axis=0)
m2 = m.any(axis=1)

df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')
df3 = from_aoi_df.loc[m2, ~m1]

df = pd.concat([df3, df1, df2], axis=1)
print (df)
        Store  id_df1  is_open_df1  id_df2  is_open_df2
0     Walmart      10         True       1        False
1    Best Buy       2        False       2         True
3  Home Depot       4         True       4        False
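The masking steps in this answer can be wrapped in a small helper for reuse. This is only a sketch: the function name changed_rows and the sample frames are assumptions, and the element-wise comparison requires both frames to have identical index and column labels.

```python
import pandas as pd

def changed_rows(left, right):
    """Return differing rows: unchanged columns once, changed columns side by side."""
    mask = left != right          # raises ValueError if the labels differ
    cols = mask.any(axis=0)       # columns with at least one difference
    rows = mask.any(axis=1)       # rows with at least one difference
    unchanged = left.loc[rows, ~cols]
    before = left.loc[rows, cols].add_suffix("_df1")
    after = right.loc[rows, cols].add_suffix("_df2")
    return pd.concat([unchanged, before, after], axis=1)

# Sample data mirroring the question's two tables
from_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
to_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

out = changed_rows(from_aoi_df, to_aoi_df)
print(out)
```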
