Compare pandas DataFrames by multiple columns

Question

What is the best way to find out how two DataFrames differ based on a combination of multiple columns? Say I have the following:

df1:

  A B C
0 1 2 3
1 3 4 2

df2:

  A B C
0 1 2 3
1 3 5 2

I want to show all rows where there is a difference, such as (3,4,2) vs. (3,5,2) in the example above. I tried pd.merge(), thinking that an outer join with all columns as the key would give me a DataFrame that helps me get what I want, but it didn't turn out that way.
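For reference, the outer-join idea can be made to surface the differing rows by asking merge to record where each row came from. The following is only a sketch under the assumption that both frames share the columns A, B and C from the example; the variable names merged and diff are illustrative and not from the original post.

```python
import pandas as pd

# Example frames from the question.
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df2 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# With no `on` argument, merge joins on all columns the frames share.
# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
merged = df1.merge(df2, how="outer", indicator=True)

# Rows present in only one of the two frames are the differences.
diff = merged[merged["_merge"] != "both"]
print(diff)
```

Rows tagged left_only exist only in df1 and rows tagged right_only exist only in df2, so a changed row shows up once on each side.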

Thanks to EdChum, I was able to use a mask from a boolean diff as shown below, but first I had to make sure the indexes were comparable.

df1 = df1.set_index('A')
df2 = df2.set_index('A')  # using one of the keys gives a nice, comparable index
# if the frames contain different rows, reindexing fills the missing ones with nulls
df1 = df1.reindex_like(df2)
df1[~(df1 == df2).all(axis=1)]  # all rows that differ
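As a self-contained illustration of why the index alignment matters, here is a sketch in which df2 lists the same keys in a different row order. The reordered data and the names left, right and changed are assumptions made for this example only.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
# Assumption for illustration: same keys as df1, but rows in a different order.
df2 = pd.DataFrame({"A": [3, 1], "B": [5, 2], "C": [2, 3]})

# Use the key column as the index so rows can be matched by key, not position.
left = df1.set_index("A")
right = df2.set_index("A")

# Give `left` the same index and columns (and order) as `right`.
left = left.reindex_like(right)

# Keep rows where at least one column differs.
changed = left[~(left == right).all(axis=1)]
print(changed)
```

Note that NaN never compares equal to NaN, so any key missing from df1 would be filled with nulls by reindex_like and then reported as a differing row.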

Answer 1

We can use .all with axis=1 to perform row-wise comparisons, then negate the resulting boolean index with ~ to show the rows that differ (here df and df1 stand for the question's df1 and df2):

In [43]:

df[~(df==df1).all(axis=1)]
Out[43]:
   A  B  C
1  3  4  2

Breaking this down:

In [44]:

df==df1
Out[44]:
      A      B     C
0  True   True  True
1  True  False  True
In [45]:

(df==df1).all(axis=1)
Out[45]:
0     True
1    False
dtype: bool

We can then invert the above using ~ and pass it as a boolean index to df.
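Put together as a runnable script, the steps above might look like the following sketch. The frames are rebuilt from the question's data, and the side-by-side view at the end is an optional extra rather than part of the original answer; the names same, differing and side_by_side are illustrative.

```python
import pandas as pd

# df and df1 mirror the two frames from the question.
df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Element-wise comparison, then collapse each row to a single True/False.
same = (df == df1).all(axis=1)

# Invert the mask to select rows with at least one differing value.
differing = df[~same]
print(differing)

# Optional: show both versions of each differing row next to each other.
side_by_side = pd.concat([df[~same], df1[~same]], axis=1, keys=["df", "df1"])
print(side_by_side)
```

The == comparison requires both frames to have identical row and column labels; otherwise pandas raises an error, which is why the question aligns the indexes first.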
