Compare two pandas dataframes for differences
I have two dataframes and I want to compare them, then display the differences side by side. I had been using the accepted solution from this question, but am now getting an error with
ne_stacked = (current_df != new_df).stack()
This used to work fine, but the error I'm getting now is
The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
After looking at the documentation for all of these options I'm not sure how to implement any of them and keep the same functionality in my code.
How would I go about replacing ne_stacked = (current_df != new_df).stack() so I don't get the ambiguity error?
EDIT
Basic code example as requested:
import pandas as pd
d = {'a':[1,2,3],'b':[1,2,3],'c':[1,2,3]}
d2 = {'a':[4,2,3],'b':[1,4,3],'c':[1,2,4]}
df1 = pd.DataFrame(d)
df2 = pd.DataFrame(d2)
print(df1 != df2)  # True where the value in df1 is not equal to the value in df2
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
So the != expression works just fine for this simple dataframe, but not the more complex ones I'm using (below).
df1 = {'CORE': [{'satellite': '2B',
'windowEnd': '2015-218 04:00:00',
'windowStart': '2015-217 20:00:00'}],
'DURATION': [500.0],
'PRIORITY': [5],
'RATE': [u'HIGH_RATE'],
'STATUS': [u'ACTIVE'],
'TASK_ID': [1],
'TYPE': [u'NOMINAL'],
'WINDOW_END': ['2015-218 04:00:00'],
'WINDOW_START': ['2015-217 20:00:00']}
df2 = {'CORE': [{'satellite': '2B',
'windowEnd': '2015-220 04:00:00',
'windowStart': '2015-219 20:00:00'}],
'DURATION': [500.0],
'PRIORITY': [5],
'RATE': [u'HIGH_RATE'],
'STATUS': [u'ACTIVE'],
'TASK_ID': [2],
'TYPE': [u'NOMINAL'],
'WINDOW_END': ['2015-220 04:00:00'],
'WINDOW_START': ['2015-219 20:00:00']}
I'm using pandas version '0.16.2' and I couldn't see any error when I tried to evaluate df1 != df2.
Take a look at my code below:
import pandas as pd
d1 = {'CORE': [{'satellite': '2B',
'windowEnd': '2015-218 04:00:00',
'windowStart': '2015-217 20:00:00'}],
'DURATION': [500.0],
'PRIORITY': [5],
'RATE': [u'HIGH_RATE'],
'STATUS': [u'ACTIVE'],
'TASK_ID': [1],
'TYPE': [u'NOMINAL'],
'WINDOW_END': ['2015-218 04:00:00'],
'WINDOW_START': ['2015-217 20:00:00']}
d2 = {'CORE': [{'satellite': '2B',
'windowEnd': '2015-220 04:00:00',
'windowStart': '2015-219 20:00:00'}],
'DURATION': [500.0],
'PRIORITY': [5],
'RATE': [u'HIGH_RATE'],
'STATUS': [u'ACTIVE'],
'TASK_ID': [2],
'TYPE': [u'NOMINAL'],
'WINDOW_END': ['2015-220 04:00:00'],
'WINDOW_START': ['2015-219 20:00:00']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
print (df1 != df2)
# It was printed:
# CORE DURATION PRIORITY RATE STATUS TASK_ID TYPE WINDOW_END WINDOW_START
# 0 True False False False False True False True True
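Since the comparison itself works, the remaining step from the question is listing the differing cells side by side. Here is a minimal sketch of one way to do that, using the simple frames from the question. The helper name diff_side_by_side and the output column names (row, column, from, to) are my own assumptions, not something from the accepted solution, and it uses numpy positions instead of .stack(). It assumes both frames have identical index and column labels and raises otherwise.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'a': [4, 2, 3], 'b': [1, 4, 3], 'c': [1, 2, 4]})


def diff_side_by_side(old, new):
    """Hypothetical helper: one row per differing cell, old value next to new."""
    # Element-wise != only makes sense when both frames carry the same labels.
    if not (old.index.equals(new.index) and old.columns.equals(new.columns)):
        raise ValueError("both DataFrames need identical index and columns")
    # 2-D boolean array, True where cells differ.
    # Note: NaN != NaN is True, so NaN cells are reported as differences.
    mask = (old != new).values
    rows, cols = np.where(mask)  # integer positions of the differing cells
    return pd.DataFrame({
        'row': old.index.values[rows],
        'column': old.columns.values[cols],
        'from': old.values[rows, cols],
        'to': new.values[rows, cols],
    }, columns=['row', 'column', 'from', 'to'])


changes = diff_side_by_side(df1, df2)
print(changes)
```

For these two frames the result has three rows, one for each changed cell (row 0 of a, row 1 of b, row 2 of c), with the old value in from and the new value in to.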
You could also try to use .any():
print((df1 != df2).any(axis=0))
# It was printed:
# CORE True
# DURATION False
# PRIORITY False
# RATE False
# STATUS False
# TASK_ID True
# TYPE False
# WINDOW_END True
# WINDOW_START True
# dtype: bool
Take care with .any(), because it will look for any True values in the entire row/column. I don't know if you need that.
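To make the row/column point concrete, here is a small sketch of the different reductions, using made-up frames where column b and row 1 are unchanged. It also shows one situation that produces the "truth value of a DataFrame is ambiguous" message: using the comparison result directly in an if. I'm not claiming this is what triggers the error in your code, since the failing lines aren't shown.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'a': [4, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 4]})

diff = df1 != df2  # boolean DataFrame, True where cells differ

cols_changed = diff.any(axis=0)            # one value per column
rows_changed = diff.any(axis=1)            # one value per row
anything_changed = bool(diff.any().any())  # single True/False for the whole frame

# A DataFrame has no single truth value, so using it in a boolean
# context raises ValueError instead of picking any() or all() for you.
ambiguous_message = None
try:
    if diff:
        pass
except ValueError as exc:
    ambiguous_message = str(exc)

print(cols_changed)
print(rows_changed)
print(anything_changed)
print(ambiguous_message)
```

If all you need is a yes/no answer on whether the frames differ at all, reducing twice with .any().any() as above is one option.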