How to compare two dataframes in Python pandas and output the difference?
I have two DataFrames with the same number of columns but different numbers of rows.
df1
col1 col2
0 a 1,2,3,4
1 b 1,2,3
2 c 1
df2
col1 col2
0 b 1,3
1 c 1,2
2 d 1,2,3
3 e 1,2
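To reproduce the question, the two frames can be built as below. This is a minimal sketch that assumes col2 is stored as comma-separated strings; the question does not say how the data was loaded.

```python
import pandas as pd

# Sample data from the question; col2 holds comma-separated values as strings (assumption)
df1 = pd.DataFrame({"col1": ["a", "b", "c"],
                    "col2": ["1,2,3,4", "1,2,3", "1"]})
df2 = pd.DataFrame({"col1": ["b", "c", "d", "e"],
                    "col2": ["1,3", "1,2", "1,2,3", "1,2"]})
print(df1)
print(df2)
```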
df1 is the existing list and df2 is the updated list. The expected result is whatever in df2 was not previously in df1, comparing the individual comma-separated values in col2 for each col1 (for example, c had 1 and now has 1,2, so only 2 is new).
Expected result:
col1 col2
0 c 2
1 d 1,2,3
2 e 1,2
I've tried
mask = df1['col2'] != df2['col2']
but it doesn't work when the DataFrames have different numbers of rows.
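A minimal sketch of why the mask attempt fails, using only the two col2 columns rebuilt from the sample data (assumed to be strings; the names old_col2 and new_col2 are illustrative):

```python
import pandas as pd

# col2 from df1 (index 0..2) and from df2 (index 0..3)
old_col2 = pd.Series(["1,2,3,4", "1,2,3", "1"])
new_col2 = pd.Series(["1,3", "1,2", "1,2,3", "1,2"])

raised = False
try:
    mask = old_col2 != new_col2
except ValueError as exc:
    # pandas refuses element-wise comparison of Series whose index labels differ
    raised = True
    print(exc)
```

Even with equal lengths, != would compare whole strings row by row on the index, so c (row 2 in df1, row 1 in df2) would be matched against the wrong row, and "1" vs "1,2" could only be flagged as different as a whole rather than yielding just 2.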
Use DataFrame.explode on the values obtained by splitting column col2, then use DataFrame.merge with a right join and the indicator parameter, filter with boolean indexing to keep only the rows marked right_only, and finally aggregate them back with join:
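import pandas as pd

# Split col2 into lists, then explode to one value per row
df11 = df1.assign(col2=df1['col2'].str.split(',')).explode('col2')
df22 = df2.assign(col2=df2['col2'].str.split(',')).explode('col2')
# Right join on both columns; indicator=True adds a _merge column
df = df11.merge(df22, indicator=True, how='right', on=['col1', 'col2'])
# Keep only values found in df2 alone, then re-join them per col1
df = (df[df['_merge'].eq('right_only')]
        .groupby('col1')['col2']
        .agg(','.join)
        .reset_index(name='col2'))
print(df)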
col1 col2
0 c 2
1 d 1,2,3
2 e 1,2
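As an alternative that avoids the merge, a plain per-key set difference also works. This is a swapped-in technique, not part of the answer above: a minimal sketch assuming col1 is the key and col2 holds comma-separated values. The function name new_values is hypothetical. Unlike the groupby version, it keeps the row order of df2 instead of sorting by col1.

```python
from collections import defaultdict

import pandas as pd


def new_values(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per col1 key, the col2 values in `new` but not in `old`."""
    # Collect every value already seen for each key (unions duplicate keys in `old`)
    seen = defaultdict(set)
    for key, vals in zip(old["col1"], old["col2"]):
        seen[key].update(str(vals).split(","))

    rows = []
    for key, vals in zip(new["col1"], new["col2"]):
        known = seen.get(key, set())
        # Keep values in their original order, skipping ones already in `old`
        added = [v for v in str(vals).split(",") if v not in known]
        if added:
            rows.append({"col1": key, "col2": ",".join(added)})
    return pd.DataFrame(rows, columns=["col1", "col2"])


df1 = pd.DataFrame({"col1": ["a", "b", "c"], "col2": ["1,2,3,4", "1,2,3", "1"]})
df2 = pd.DataFrame({"col1": ["b", "c", "d", "e"], "col2": ["1,3", "1,2", "1,2,3", "1,2"]})
result = new_values(df1, df2)
print(result)
```

The str() call guards against single values that were read as numbers rather than strings.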