How to compare two dataframes in Python pandas and output the difference?

I have two DataFrames with the same number of columns but different numbers of rows.

df1:

   col1  col2
0     a    1,2,3,4
1     b    1,2,3
2     c    1

df2:

   col1  col2
0     b    1,3
1     c    1,2
2     d    1,2,3
3     e    1,2

df1 is the existing list, df2 is the updated list. The expected result is whatever is in df2 that was not previously in df1.

Expected result:

   col1  col2
0     c    2
1     d    1,2,3
2     e    1,2

I've tried:

mask = df1['col2'] != df2['col2'] 

but it doesn't work when the two DataFrames have different numbers of rows.

Use DataFrame.explode on the split values in column col2, then use DataFrame.merge with a right join and the indicator parameter, filter with boolean indexing to keep only the rows marked right_only, and finally aggregate with join:

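# Split col2 on commas and give each value its own row
df11 = df1.assign(col2=df1['col2'].str.split(',')).explode('col2')
df22 = df2.assign(col2=df2['col2'].str.split(',')).explode('col2')

# Right join keeps every (col1, col2) pair from df2; _merge records where each pair was found
df = df11.merge(df22, indicator=True, how='right', on=['col1', 'col2'])

# Keep pairs found only in df2, then join the values back into one string per col1
df = (df[df['_merge'].eq('right_only')]
      .groupby('col1')['col2']
      .agg(','.join)
      .reset_index(name='col2'))
print(df)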
  col1   col2
0    c      2
1    d  1,2,3
2    e    1,2
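
To run this end to end, here is a minimal self-contained sketch of the same steps using the sample data from the question. The helper names explode_col2 and new_in_updated are made up for illustration. It assumes pandas 0.25 or later (needed for DataFrame.explode) and that the values in col2 are comma-separated with no surrounding spaces.

```python
import pandas as pd


def explode_col2(df):
    # Hypothetical helper: split "1,2,3" into ["1", "2", "3"]
    # and give each value its own row.
    return df.assign(col2=df["col2"].str.split(",")).explode("col2")


def new_in_updated(existing, updated):
    # Hypothetical helper: return the (col1, col2) values that appear
    # in `updated` but not in `existing`, re-joined per col1.
    merged = explode_col2(existing).merge(
        explode_col2(updated),
        on=["col1", "col2"],
        how="right",
        indicator=True,
    )
    only_new = merged[merged["_merge"].eq("right_only")]
    return (
        only_new.groupby("col1")["col2"]
        .agg(",".join)
        .reset_index(name="col2")
    )


# Sample data copied from the question.
df1 = pd.DataFrame({"col1": ["a", "b", "c"],
                    "col2": ["1,2,3,4", "1,2,3", "1"]})
df2 = pd.DataFrame({"col1": ["b", "c", "d", "e"],
                    "col2": ["1,3", "1,2", "1,2,3", "1,2"]})

result = new_in_updated(df1, df2)
print(result)
```

Row b drops out entirely because both of its values in df2 (1 and 3) already exist in df1. If the real data contains spaces after the commas, the values would likely need stripping before the merge, otherwise "1" and " 1" would not match.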


 