Find the difference between rows of 2 pandas DataFrames
I have 2 pandas DataFrames which have exactly the same columns. They look something like this:
Dataframe1:
C1 C2 C3
1 A X
2 B Y
Dataframe2:
C1 C2 C3
1 A X
3 C Z
I want to find the difference between these 2 DataFrames. Basically I need the following 3 outputs:
Rows present in dataframe1, but missing in dataframe2:
2 B Y
Rows present in dataframe2, but missing in dataframe1:
3 C Z
I found the number of identical rows with:
same_line_count = (pd.merge(df1, df2, on=['C1', 'C2', 'C3'], how='inner')).shape[0]
But I am unable to find the other 2 numbers.
I think you need merge with an outer join and the indicator parameter. For filtering, use loc with boolean indexing, and to count the identical rows, sum the boolean mask:
print (Dataframe1)
C1 C2 C3
0 1 A X
1 2 B Y
2 2 C Y
print (Dataframe2)
C1 C2 C3
0 1 A X
1 3 C Z
df = pd.merge(Dataframe1, Dataframe2, indicator=True, how='outer')
print (df)
C1 C2 C3 _merge
0 1 A X both
1 2 B Y left_only
2 2 C Y left_only
3 3 C Z right_only
both = (df['_merge'] == 'both').sum()
print (both)
1
left_only = df.loc[df['_merge'] == 'left_only', Dataframe1.columns]
print (left_only)
C1 C2 C3
1 2 B Y
2 2 C Y
right_only = df.loc[df['_merge'] == 'right_only', Dataframe1.columns]
print (right_only)
C1 C2 C3
3 3 C Z
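The steps above can be collected into one self-contained sketch. The helper name diff_rows and the sample frames are assumptions made for illustration (the frames mirror the question's data); only pd.merge with how='outer' and indicator=True comes from the answer itself.

```python
import pandas as pd


def diff_rows(left, right):
    """Return (rows only in left, rows only in right, count of shared rows).

    Hypothetical helper: assumes both frames have exactly the same columns,
    so merge joins on all of them by default.
    """
    # indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
    merged = pd.merge(left, right, how="outer", indicator=True)
    cols = list(left.columns)
    left_only = merged.loc[merged["_merge"] == "left_only", cols]
    right_only = merged.loc[merged["_merge"] == "right_only", cols]
    # Summing a boolean mask counts the True values
    both_count = int((merged["_merge"] == "both").sum())
    return left_only, right_only, both_count


# Sample data taken from the question
df1 = pd.DataFrame({"C1": [1, 2], "C2": ["A", "B"], "C3": ["X", "Y"]})
df2 = pd.DataFrame({"C1": [1, 3], "C2": ["A", "C"], "C3": ["X", "Z"]})

left_only, right_only, both_count = diff_rows(df1, df2)
print(left_only)
print(right_only)
print(both_count)
```

Note that if either frame contains duplicate rows, merge pairs every matching copy with every other, so the 'both' count is the number of merged rows rather than the number of distinct shared rows. Calling drop_duplicates() on each frame first is one way around that, if it suits the data.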