Compare 2 Pandas Dataframes and return all rows that are different

Question

I have two DataFrames with the same schema but different data. I want to compare them and get every row in which the value of any column differs.

"df1":

id   Store         is_open
1   'Walmart'      true
2   'Best Buy'     false
3   'Target'       true
4   'Home Depot'   true

"df2":

id   Store         is_open
1   'Walmart'      false
2   'Best Buy'     true
3   'Target'       true
4   'Home Depot'   false

I was able to get the differences, but the result contains only the column that changed rather than all the columns. This is the output I get:

result_df:

id   is_open  is_open
1   true       false
2   false      true
4   true       false

Here is the code to achieve the above output:

import numpy as np
import pandas as pd

# Stack the elementwise inequality mask and keep only the True entries
ne_stacked = (from_aoi_df != to_aoi_df).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col_changed']

# Pull the differing values out of each frame
difference_locations = np.where(from_aoi_df != to_aoi_df)
changed_from = from_aoi_df.values[difference_locations]
changed_to = to_aoi_df.values[difference_locations]
df5 = pd.DataFrame({'from': changed_from, 'to': changed_to})
df5

However, in addition to the changed values, I also want the unchanged columns, such as Store, to appear in the result. My expected output is:

expected_result_df:
id   Store        is_open_df1   is_open_df2
1    Walmart      true          false
2    Best Buy     false         true
4    Home Depot   true          false

How can I achieve that?

Answer 1

Use the pandas merge function to join the two frames on id:

df = pd.merge(df1,df2[['id','is_open']],on='id')

Filter out the rows where the two is_open columns are equal, keeping only the rows that differ:

df = df[df["is_open_x"]!=df["is_open_y"]]
df

Rename the columns to match your expected output (rename returns a new frame, so assign the result):

df = df.rename(columns={"is_open_x":"is_open_df1","is_open_y":"is_open_df2"})
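
As a self-contained sketch of the merge approach, the snippet below rebuilds the question's sample data and passes `suffixes` to `pd.merge` so that no separate rename is needed. It assumes `id` is unique in both frames. The variable names `merged` and `result` are chosen here for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt here for illustration.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# Join on id; suffixes name the two overlapping is_open columns directly.
merged = pd.merge(df1, df2[["id", "is_open"]], on="id", suffixes=("_df1", "_df2"))

# Keep only the rows whose is_open value differs between the two frames.
result = merged[merged["is_open_df1"] != merged["is_open_df2"]]
print(result)
```

Note that this compares only `is_open`. A change in `Store` would go unnoticed unless that column is merged and compared as well.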

Answer 2

How about this?

df1['is_open_df2'] = df2['is_open']

expected_result_df = df1[df1['is_open'] != df1['is_open_df2']]
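
A variant of the same idea that leaves df1 unmodified might look like the sketch below. It uses `assign`, which returns a copy, and it assumes both frames have the same index so the new column lines up row by row. The sample data is rebuilt from the question, and the final rename is added here only to match the expected column names.

```python
import pandas as pd

# Sample data from the question, rebuilt here for illustration.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# assign returns a new frame, so df1 itself is not changed.
# The values are aligned on the index, not on the id column.
combined = df1.assign(is_open_df2=df2["is_open"])

# Keep the rows where the two values differ, then rename for clarity.
expected_result_df = combined[combined["is_open"] != combined["is_open_df2"]]
expected_result_df = expected_result_df.rename(columns={"is_open": "is_open_df1"})
print(expected_result_df)
```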

Answer 3

If the data frames have different lengths, you can use the following:

new_df = pd.concat([df1, df2]).reset_index(drop=True)
df = new_df.drop_duplicates(subset=['col1','col2'], keep=False)

This gives you a data frame called df containing only the records that differ.

Here df1 and df2 are the two data frames you are comparing.
subset is the list of columns to check for duplicates.
keep=False drops every duplicated row, including its first occurrence.
keep='last' keeps the last occurrence, i.e. the record from the second data frame.
keep='first' keeps the first occurrence, i.e. the record from the first data frame.
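
To see what keep=False returns on the question's data, here is a small sketch. Both versions of each changed row survive (one from each frame), so a `source` column is added here, purely as an illustrative assumption, to tell them apart.

```python
import pandas as pd

# Sample data from the question, rebuilt here for illustration.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# Tag each row with its origin; "source" is a helper column, not part of the data.
combined = pd.concat(
    [df1.assign(source="df1"), df2.assign(source="df2")],
    ignore_index=True,
)

# Compare on the data columns only, otherwise the helper column
# would make every row unique and nothing would be dropped.
data_cols = ["id", "Store", "is_open"]
diff = combined.drop_duplicates(subset=data_cols, keep=False)
print(diff.sort_values(["id", "source"]))
```

The row for Target is identical in both frames, so both of its copies are dropped. Each of the other ids appears twice, once per frame.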

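If the data frames have the same length:

df = np.where(df1 == df2, 'true', 'false')

Note that this returns a NumPy array of 'true'/'false' strings marking which cells are equal, not a data frame.

Hope this helps! The drop_duplicates approach works if df1 and df2 each contain unique rows; if either frame has duplicates of its own, drop them before using it.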
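
When both frames have the same shape and the same index and column labels, the elementwise comparison can be reduced to a row mask. A minimal sketch on the question's data follows. The names `cell_diff`, `row_diff`, and `changed_rows` are chosen here for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt here for illustration.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# True where a cell differs; this requires identical index and column labels.
cell_diff = df1 != df2

# True for each row that has at least one differing cell.
row_diff = cell_diff.any(axis=1)

# Rows of df1 that differ from the corresponding rows of df2.
changed_rows = df1[row_diff]
print(changed_rows)
```

Because NaN != NaN evaluates to True, a row with a missing value in the same position in both frames is also reported as changed.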

Answer 4

Use:

# compare the DataFrames elementwise (True where values differ)
m = (from_aoi_df != to_aoi_df)
# columns with at least one difference
m1 = m.any(axis=0)
# rows with at least one difference
m2 = m.any(axis=1)

# changed rows, changed columns only, suffixed by source frame
df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')

# changed rows, columns that are identical in both frames
df3 = from_aoi_df.loc[m2, ~m1]

# join together
df = pd.concat([df3, df1, df2], axis=1)
print (df)
   id       Store  is_open_df1  is_open_df2
0   1     Walmart         True        False
1   2    Best Buy        False         True
3   4  Home Depot         True        False

Verify the solution with multiple changed columns:

# the first value of the id column is changed
print (from_aoi_df)
   id       Store  is_open
0  10     Walmart     True
1   2    Best Buy    False
2   3      Target     True
3   4  Home Depot     True

m = (from_aoi_df != to_aoi_df)
m1 = m.any(axis=0)
m2 = m.any(axis=1)

df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')
df3 = from_aoi_df.loc[m2, ~m1]

df = pd.concat([df3, df1, df2], axis=1)
print (df)
        Store  id_df1  is_open_df1  id_df2  is_open_df2
0     Walmart      10         True       1        False
1    Best Buy       2        False       2         True
3  Home Depot       4         True       4        False
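
The steps above can be wrapped in a reusable function. The sketch below is one possible packaging, not part of the original answer. The name `diff_rows` and its `suffixes` parameter are hypothetical, and it assumes both frames share the same index and column labels.

```python
import pandas as pd


def diff_rows(left, right, suffixes=("_df1", "_df2")):
    """Return the rows where left and right differ.

    Columns that never differ are kept once; columns with at least one
    difference appear twice, suffixed by source frame.
    Assumes identical index and column labels in both frames.
    """
    mask = left != right
    changed_cols = mask.any(axis=0)  # columns with at least one difference
    changed_rows = mask.any(axis=1)  # rows with at least one difference
    unchanged = left.loc[changed_rows, ~changed_cols]
    before = left.loc[changed_rows, changed_cols].add_suffix(suffixes[0])
    after = right.loc[changed_rows, changed_cols].add_suffix(suffixes[1])
    return pd.concat([unchanged, before, after], axis=1)


# Sample data from the question, rebuilt here for illustration.
from_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
to_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

result = diff_rows(from_aoi_df, to_aoi_df)
print(result)
```

Taking the frames as arguments also avoids reusing the names df1 and df2 for intermediate results.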

solution1
2 2019-02-08 10:30:55

solution2
1 ACCEPTED 2019-02-08 07:39:33

solution3
1 2020-06-26 01:47:56

solution4
0 2019-02-08 07:44:03
