
Compare two pandas dataframes for differences

I have two dataframes and I want to compare them, then display the differences side by side. I had been using the accepted solution from this question, but am now getting an error with ne_stacked = (current_df != new_df).stack().

This used to work fine, but the error I'm getting now is: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). After looking at the documentation for all of these options, I'm not sure how to implement any of them and keep the same functionality in my code.

How would I go about replacing ne_stacked = (current_df != new_df).stack() so I don't get the ambiguity error?

EDIT

Basic code example as requested:

import pandas as pd

d = {'a':[1,2,3],'b':[1,2,3],'c':[1,2,3]}
d2 = {'a':[4,2,3],'b':[1,4,3],'c':[1,2,4]}
df1 = pd.DataFrame(d)
df2 = pd.DataFrame(d2)
print(df1 != df2)  # True where the value in df1 is not equal to the value in df2

       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True

So the != expression works just fine for this simple dataframe, but not for the more complex ones I'm using (below).

df1 = {'CORE': [{'satellite': '2B',
   'windowEnd': '2015-218 04:00:00',
   'windowStart': '2015-217 20:00:00'}],
 'DURATION': [500.0],
 'PRIORITY': [5],
 'RATE': [u'HIGH_RATE'],
 'STATUS': [u'ACTIVE'],
 'TASK_ID': [1],
 'TYPE': [u'NOMINAL'],
 'WINDOW_END': ['2015-218 04:00:00'],
 'WINDOW_START': ['2015-217 20:00:00']}

df2 = {'CORE': [{'satellite': '2B',
   'windowEnd': '2015-220 04:00:00',
   'windowStart': '2015-219 20:00:00'}],
 'DURATION': [500.0],
 'PRIORITY': [5],
 'RATE': [u'HIGH_RATE'],
 'STATUS': [u'ACTIVE'],
 'TASK_ID': [2],
 'TYPE': [u'NOMINAL'],
 'WINDOW_END': ['2015-220 04:00:00'],
 'WINDOW_START': ['2015-219 20:00:00']}

I'm using pandas version '0.16.2' and I couldn't see any error when I tried to evaluate df1 != df2.
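The post does not show which statement raises the error, so the following is only a sketch of how this message comes about in general: pandas raises it when a whole DataFrame is evaluated as a single True/False value, for example in an if statement. The frames left and right are hypothetical and are not the asker's data.

```python
import pandas as pd

# Hypothetical small frames, not the asker's data.
left = pd.DataFrame({'a': [1, 2, 3]})
right = pd.DataFrame({'a': [4, 2, 3]})

mask = left != right  # element-wise comparison, returns a boolean DataFrame

try:
    if mask:  # asks for one truth value of a whole DataFrame
        pass
    raised = False
    message = ''
except ValueError as exc:
    raised = True
    message = str(exc)

# Collapsing the mask explicitly removes the ambiguity:
# the first .any() reduces each column, the second reduces the result.
any_difference = bool(mask.any().any())

print(raised, any_difference)
```

If the comparison itself succeeds, as it does here, the place to look is wherever its result is later used as a condition.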

Take a look at my code below:

import pandas as pd

d1 = {'CORE': [{'satellite': '2B',
  'windowEnd': '2015-218 04:00:00',
  'windowStart': '2015-217 20:00:00'}],
  'DURATION': [500.0],
  'PRIORITY': [5],
  'RATE': [u'HIGH_RATE'],
  'STATUS': [u'ACTIVE'],
  'TASK_ID': [1],
  'TYPE': [u'NOMINAL'],
  'WINDOW_END': ['2015-218 04:00:00'],
  'WINDOW_START': ['2015-217 20:00:00']}

d2 = {'CORE': [{'satellite': '2B',
  'windowEnd': '2015-220 04:00:00',
  'windowStart': '2015-219 20:00:00'}],
  'DURATION': [500.0],
  'PRIORITY': [5],
  'RATE': [u'HIGH_RATE'],
  'STATUS': [u'ACTIVE'],
  'TASK_ID': [2],
  'TYPE': [u'NOMINAL'],
  'WINDOW_END': ['2015-220 04:00:00'],
  'WINDOW_START': ['2015-219 20:00:00']}

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
print (df1 != df2)

# It was printed:
#    CORE   DURATION  PRIORITY   RATE   STATUS  TASK_ID  TYPE    WINDOW_END WINDOW_START
# 0  True   False     False      False  False   True     False   True       True
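Since the goal is to show the differences side by side, the boolean mask can be turned into a small report. This is a minimal sketch, not the accepted solution the question refers to: it uses numpy.where on the mask instead of .stack(), and it works on a reduced, scalar-only version of the posted data. The name report and its column names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Reduced version of the posted data (scalar columns only), for brevity.
df1 = pd.DataFrame({'PRIORITY': [5],
                    'TASK_ID': [1],
                    'WINDOW_END': ['2015-218 04:00:00']})
df2 = pd.DataFrame({'PRIORITY': [5],
                    'TASK_ID': [2],
                    'WINDOW_END': ['2015-220 04:00:00']})

# Boolean array: True where the two frames disagree.
mask = (df1 != df2).values

# Positions (row number, column number) of every differing cell.
rows, cols = np.where(mask)

# One line per difference: where it is, the old value, the new value.
report = pd.DataFrame({
    'row': df1.index.values[rows],
    'column': df1.columns.values[cols],
    'from': df1.values[rows, cols],
    'to': df2.values[rows, cols],
})

print(report)
```

The sketch assumes both frames have the same shape, index and columns; frames that differ in those respects would need to be aligned first.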

You could also try to use .any():

print((df1 != df2).any(axis=0))
# It was printed:
# CORE             True
# DURATION        False
# PRIORITY        False
# RATE            False
# STATUS          False
# TASK_ID          True
# TYPE            False
# WINDOW_END       True
# WINDOW_START     True
# dtype: bool

Take care with .any(), because it looks for any True value along an entire row or column. I don't know if you need that.
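To make the row/column distinction concrete, here is a small sketch using frames adapted from the simple example in the question, with only one cell changed. The variable names are illustrative.

```python
import pandas as pd

# Adapted from the question's simple example; only cell (0, 'a') differs.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'a': [4, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})

mask = df1 != df2

changed_columns = mask.any(axis=0)  # one flag per column
changed_rows = mask.any(axis=1)     # one flag per row
anything_changed = bool(mask.any().any())  # one flag for the whole frame

# Using the flags as selectors keeps only rows and columns with a difference.
changed_part = df2.loc[changed_rows, changed_columns]

print(changed_columns)
print(changed_rows)
print(anything_changed)
print(changed_part)
```

Note that a flag only says that at least one cell in that row or column differs, not which one; the element-wise mask is still needed for that.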
