Compare 2 Pandas Dataframes and return all rows that are different

I have 2 Dataframes with same schema and different data. I want to compare both of them and get all rows that have different values of any column.

"df1":

id   Store         is_open
1   'Walmart'      true
2   'Best Buy'     false
3   'Target'       true
4   'Home Depot'   true

"df2":

id   Store         is_open
1   'Walmart'      false
2   'Best Buy'     true
3   'Target'       true
4   'Home Depot'   false

I was able to get the differences, but the output contains only the columns that changed rather than all columns. This is the output I get:

result_df:

id   is_open  is_open
1   true       false
2   false      true
4   true       false

Here is the code that produces the output above:

import numpy as np
import pandas as pd

# Boolean mask of changed cells, stacked into a (row, column) MultiIndex
ne_stacked = (from_aoi_df != to_aoi_df).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col_changed']

# Positions of the changed cells and their values before and after
difference_locations = np.where(from_aoi_df != to_aoi_df)
changed_from = from_aoi_df.values[difference_locations]
changed_to = to_aoi_df.values[difference_locations]
df5 = pd.DataFrame({'from': changed_from, 'to': changed_to})
df5

However, in addition to the changed column, I also want the unchanged columns such as Store, so my expected output is:

expected_result_df:
        id Store         is_open_df1  is_open_df2    
        1   Walmart       true        false 
        2   Best Buy      false       true        
        4   Home Depot    true        false 

How can I achieve that?

Use the pandas merge function:

df = pd.merge(df1,df2[['id','is_open']],on='id')

Keep only the rows whose two is_open columns differ:

df = df[df["is_open_x"]!=df["is_open_y"]]
df

Rename the columns to match your expected output (rename returns a new DataFrame, so assign the result):

df = df.rename(columns={"is_open_x":"is_open_df1","is_open_y":"is_open_df2"})
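
The merge, filter, and rename steps can also be combined. The following is a minimal sketch, assuming the sample data from the question; it passes suffixes to pd.merge so the columns get the desired names directly. The variable names merged and result are only illustrative.

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# Join on id; the overlapping is_open column gets the given suffixes
merged = pd.merge(df1, df2[["id", "is_open"]], on="id", suffixes=("_df1", "_df2"))

# Keep only the rows where the two is_open values differ
result = merged[merged["is_open_df1"] != merged["is_open_df2"]]
print(result)
```

Because the join is on id, this does not depend on the two frames having their rows in the same order.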

How about this?

df1['is_open_df2'] = df2['is_open']

expected_result_df = df1[df1['is_open'] != df1['is_open_df2']]
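
Note that assigning a column from another frame aligns on the index, not on row position, so the approach above assumes both frames share the same index. A hedged sketch of one way to make that explicit, assuming id uniquely identifies a row (df2 is deliberately given in a different row order here, and the names left, right, and combined are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
# Same data as df2 in the question, but with the rows reversed
df2 = pd.DataFrame({
    "id": [4, 3, 2, 1],
    "Store": ["Home Depot", "Target", "Best Buy", "Walmart"],
    "is_open": [False, True, True, False],
})

# Index both frames by id so the assignment aligns on id
left = df1.set_index("id")
right = df2.set_index("id")

combined = left.rename(columns={"is_open": "is_open_df1"})
combined["is_open_df2"] = right["is_open"]

# Rows whose is_open value changed, with id restored as a column
result = combined[combined["is_open_df1"] != combined["is_open_df2"]].reset_index()
print(result)
```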

If the data frames have different lengths, here is something you can use:

new_df = pd.concat([df1, df2]).reset_index(drop=True)
df = new_df.drop_duplicates(subset=['col1','col2'], keep=False)

This gives you a data frame, df, containing only the records that differ.

  • df1 and df2 are the two data frames you are comparing.
  • subset is the list of columns used to identify duplicates.
  • keep=False drops every duplicated row, including its first occurrence, so only rows without a match remain.
  • keep='last' retains the record from the second data frame.
  • keep='first' retains the record from the first data frame.
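
As a rough illustration of the keep=False case, here is a sketch using the sample data from the question. The source column is an addition for this example only (it is not part of the answer above); it records which frame each surviving row came from and is left out of subset so it does not affect duplicate detection.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# Tag each row with its origin, then stack the two frames
tagged = pd.concat(
    [df1.assign(source="df1"), df2.assign(source="df2")]
).reset_index(drop=True)

# Rows identical in both frames appear twice and are dropped entirely
diff = tagged.drop_duplicates(subset=["id", "Store", "is_open"], keep=False)
print(diff)
```

Each changed record shows up twice in diff, once per source frame, so this gives a long-format result rather than the side-by-side layout asked for in the question.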

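If the data frames have the same length (and identical row and column labels), compare them element-wise; the result is a NumPy array of 'true'/'false' strings: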

df=np.where(df1==df2,'true','false')

Hope this helps! This works if df1 and df2 each contain unique values; if either has duplicates, drop them before using this.
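
For the same-length case, a short sketch of what np.where returns and how the same comparison can be used to select the differing rows. It assumes the sample data from the question; flags_df and changed_rows are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# np.where returns a plain NumPy array, one string per cell
flags = np.where(df1 == df2, "true", "false")

# Wrap it in a DataFrame to get the row and column labels back
flags_df = pd.DataFrame(flags, index=df1.index, columns=df1.columns)

# Select the rows of df1 where at least one cell differs from df2
changed_rows = df1[(df1 != df2).any(axis=1)]
print(flags_df)
print(changed_rows)
```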

Use:

#compare DataFrames element-wise
m = (from_aoi_df != to_aoi_df)
#columns with at least one changed value
m1 = m.any(axis=0)
#rows with at least one changed value
m2 = m.any(axis=1)

#changed rows, changed columns only, from each DataFrame
df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')

#changed rows, unchanged columns only
df3 = from_aoi_df.loc[m2, ~m1]

#join together
df = pd.concat([df3, df1, df2], axis=1)
print (df)
   id       Store  is_open_df1  is_open_df2
0   1     Walmart         True        False
1   2    Best Buy        False         True
3   4  Home Depot         True        False

Verify solution with multiple changed columns:

#first value of the id column changed to 10
print (from_aoi_df)
   id       Store  is_open
0  10     Walmart     True
1   2    Best Buy    False
2   3      Target     True
3   4  Home Depot     True

m = (from_aoi_df != to_aoi_df)
m1 = m.any(axis=0)
m2 = m.any(axis=1)

df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')
df3 = from_aoi_df.loc[m2, ~m1]

df = pd.concat([df3, df1, df2], axis=1)
print (df)
        Store  id_df1  is_open_df1  id_df2  is_open_df2
0     Walmart      10         True       1        False
1    Best Buy       2        False       2         True
3  Home Depot       4         True       4        False
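
The mask-based steps above can be wrapped in a small helper so the same logic can be reused. This is only a sketch: the function name diff_rows and its suffixes parameter are hypothetical, and it assumes both frames have identical row and column labels.

```python
import pandas as pd


def diff_rows(left, right, suffixes=("_df1", "_df2")):
    """Return the rows that differ, with changed columns shown side by side."""
    changed = left != right
    changed_cols = changed.any(axis=0)   # columns with at least one change
    changed_rows = changed.any(axis=1)   # rows with at least one change

    # Unchanged columns are taken once; changed columns from both frames
    same = left.loc[changed_rows, ~changed_cols]
    before = left.loc[changed_rows, changed_cols].add_suffix(suffixes[0])
    after = right.loc[changed_rows, changed_cols].add_suffix(suffixes[1])
    return pd.concat([same, before, after], axis=1)


from_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
to_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

out = diff_rows(from_aoi_df, to_aoi_df)
print(out)
```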
