How to compare two dataframes in Python pandas and output the difference?

Question

I have two DataFrames with the same number of columns but different numbers of rows.

df1

   col1  col2
0     a    1,2,3,4
1     b    1,2,3
2     c    1

df2

   col1  col2
0     b    1,3
1     c    1,2
2     d    1,2,3
3     e    1,2

df1 is the existing list and df2 is the updated list. The expected result is everything in df2 that was not already in df1.

Expected result:

   col1  col2
0     c    2
1     d    1,2,3
2     e    1,2

I've tried with

mask = df1['col2'] != df2['col2']

but it doesn't work when the DataFrames have different numbers of rows.

Answer 1

Split the values in column col2 and use DataFrame.explode to give each value its own row, then use DataFrame.merge with a right join and the indicator parameter, filter with boolean indexing to keep only the right_only rows, and finally aggregate with join:

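# Split the comma-separated strings into lists, then put each value on its own row
df11 = df1.assign(col2=df1['col2'].str.split(',')).explode('col2')
df22 = df2.assign(col2=df2['col2'].str.split(',')).explode('col2')

# A right join keeps every (col1, col2) pair from df2; _merge records where each pair was found
df = df11.merge(df22, indicator=True, how='right', on=['col1', 'col2'])

# Keep the pairs found only in df2, then join the values back into one string per col1
df = (df[df['_merge'].eq('right_only')]
      .groupby('col1')['col2']
      .agg(','.join)
      .reset_index(name='col2'))
print(df)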
  col1   col2
0    c      2
1    d  1,2,3
2    e    1,2
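
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps, with the two sample frames from the question built inline. The helper name new_values and its parameters key, values and sep are my own for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"col1": ["a", "b", "c"],
                    "col2": ["1,2,3,4", "1,2,3", "1"]})
df2 = pd.DataFrame({"col1": ["b", "c", "d", "e"],
                    "col2": ["1,3", "1,2", "1,2,3", "1,2"]})


def new_values(old, new, key="col1", values="col2", sep=","):
    """Return the (key, value) pairs present in `new` but not in `old`.

    Hypothetical helper: it only wraps the explode / merge / groupby steps.
    """
    # One row per individual value instead of one delimited string per key
    old_long = old.assign(**{values: old[values].str.split(sep)}).explode(values)
    new_long = new.assign(**{values: new[values].str.split(sep)}).explode(values)

    # Right join keeps every pair from `new`; _merge says where each pair was found
    merged = old_long.merge(new_long, how="right", on=[key, values], indicator=True)
    added = merged[merged["_merge"].eq("right_only")]

    # Collapse back to one delimited string per key
    return added.groupby(key)[values].agg(sep.join).reset_index(name=values)


result = new_values(df1, df2)
print(result)
```

Note that this reports additions only: the value 2 that was dropped from row b in df2 does not appear, and b is left out of the result entirely because it gained nothing new.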

solution1
1 ACCEPTED 2021-04-12 10:49:35
