How to compare two DataFrames in Python pandas and output the differences?

Question

I have two DataFrames with the same number of columns but different numbers of rows.

df1

   col1  col2
0     a    1,2,3,4
1     b    1,2,3
2     c    1

df2

   col1  col2
0     b    1,3
1     c    1,2
2     d    1,2,3
3     e    1,2

df1 is the existing list and df2 is the updated list. The expected result is everything in df2 that was not previously in df1, compared value by value within each col1 key (so for c only 2 is new, and b disappears because 1 and 3 were already present).

Expected result:

   col1  col2
0     c    2
1     d    1,2,3
2     e    1,2
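The expected result can be read as a per-key set difference: for each col1 key in df2, keep the col2 values that df1 did not already list for that key, and drop keys where nothing is new. A minimal pure-Python sketch of that rule, assuming col2 always holds comma-separated strings; the helper name `new_values` is hypothetical:

```python
def new_values(old, new):
    """Return {key: 'v1,v2'} with the values in `new` that are absent from `old`.

    Both arguments map a key to a comma-separated string (hypothetical helper).
    Value order follows `new`; keys with nothing new are omitted.
    """
    result = {}
    for key, values in new.items():
        # Values df1 already had for this key (empty set for brand-new keys)
        seen = set(old[key].split(',')) if key in old else set()
        added = [v for v in values.split(',') if v not in seen]
        if added:
            result[key] = ','.join(added)
    return result


old = {'a': '1,2,3,4', 'b': '1,2,3', 'c': '1'}
new = {'b': '1,3', 'c': '1,2', 'd': '1,2,3', 'e': '1,2'}
print(new_values(old, new))  # {'c': '2', 'd': '1,2,3', 'e': '1,2'}
```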

I've tried:

mask = df1['col2'] != df2['col2']

but it doesn't work when the two DataFrames have different numbers of rows.
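The attempt fails because comparison operators between two Series require identically labeled indexes; df1 has index 0 to 2 and df2 has 0 to 3, so pandas raises a ValueError instead of aligning them. Even with equal lengths, `!=` would pair rows by index label rather than by the col1 key, and it would compare whole strings such as '1,2' rather than individual values. A small reproduction, rebuilding the two frames from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['1,2,3,4', '1,2,3', '1']})
df2 = pd.DataFrame({'col1': ['b', 'c', 'd', 'e'], 'col2': ['1,3', '1,2', '1,2,3', '1,2']})

error = None
try:
    mask = df1['col2'] != df2['col2']
except ValueError as exc:
    # Raised because the indexes (0..2 vs 0..3) are not identically labeled
    error = exc
    print('ValueError:', exc)
```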

Answer 1

Use DataFrame.explode on the values of col2 split at the commas, then DataFrame.merge with a right join and the indicator parameter; filter with boolean indexing to keep only the rows marked right_only, and finally aggregate with join:

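import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['1,2,3,4', '1,2,3', '1']})
df2 = pd.DataFrame({'col1': ['b', 'c', 'd', 'e'], 'col2': ['1,3', '1,2', '1,2,3', '1,2']})

# One row per (col1, single value) pair
df11 = df1.assign(col2 = df1['col2'].str.split(',')).explode('col2')
df22 = df2.assign(col2 = df2['col2'].str.split(',')).explode('col2')

# Right join keeps every df2 pair; _merge is 'right_only' for pairs absent from df1
df = df11.merge(df22, indicator=True, how='right', on=['col1','col2'])

# Keep only the new pairs and join the values back into comma-separated strings
df = (df[df['_merge'].eq('right_only')]
              .groupby('col1')['col2']
              .agg(','.join)
              .reset_index(name='col2'))
print(df)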
  col1   col2
0    c      2
1    d  1,2,3
2    e    1,2
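An alternative that avoids explode and merge, under the same assumption that col2 holds comma-separated strings and additionally that col1 is unique in df1: build the set of existing values per key from df1, then strip those values from each df2 row. This is a different technique from the answer (a row-wise `apply`), likely slower on large frames but easy to read:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['1,2,3,4', '1,2,3', '1']})
df2 = pd.DataFrame({'col1': ['b', 'c', 'd', 'e'], 'col2': ['1,3', '1,2', '1,2,3', '1,2']})

# Set of existing values per key in df1 (assumes col1 is unique in df1)
existing = df1.set_index('col1')['col2'].str.split(',').apply(set)

def only_new(row):
    # Values of this df2 row that df1 did not already have for the same key
    seen = existing.get(row['col1'], set())
    return ','.join(v for v in row['col2'].split(',') if v not in seen)

out = df2.assign(col2=df2.apply(only_new, axis=1))
# Drop keys with nothing new (empty string) and renumber the rows
out = out[out['col2'] != ''].reset_index(drop=True)
print(out)
```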

Answer 1 accepted (2021-04-12 10:49:35)