Comparing two pandas dataframes with different integer types
I just ran into some weird behaviour comparing the values of two pandas dataframes using pd.DataFrame.equals():
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()
df1.equals(df2)
# True (obviously)
However, when I change the column type to a different integer format, they are no longer considered equal:
df1['a'] = df1['a'].astype(np.int32)
df1.equals(df2)
# False
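To see what actually changed, it can help to compare the values and the dtypes separately. This is a minimal sketch; the dtypes are set explicitly to int64 and int32 here (an assumption for illustration) so the result does not depend on platform defaults:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, dtype=np.int64)
df1 = df2.copy()
df1['a'] = df1['a'].astype(np.int32)

# Print the per-column dtypes of both frames.
print(df1.dtypes)
print(df2.dtypes)

# The values are identical; only the dtype of column 'a' differs.
same_values = (df1.to_numpy() == df2.to_numpy()).all()
same_dtypes = list(df1.dtypes) == list(df2.dtypes)
print(same_values, same_dtypes)
```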
In the .equals() documentation, they point out that the variables must have the same type, and present an example comparing floats to integers, which doesn't work. I didn't expect this to extend to different types of integers, too.
When doing the same comparison using ==, it does return True:
(df1 == df2).all().all()
# True
However, == doesn't assess two missing values as equal to each other.
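A small sketch of that difference, using two hypothetical frames (`left` and `right`) that hold a NaN in the same position:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'a': [1.0, np.nan]})
right = left.copy()

# Element-wise ==: NaN never compares equal to NaN, so one cell is False.
elementwise = (left == right).all().all()

# .equals() treats NaNs in the same location as equal.
whole_frame = left.equals(right)

print(elementwise, whole_frame)
```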
Is there an elegant way to treat missing values as equal, whilst not enforcing the same integer type? The best I can come up with is:
(df1.fillna(0) == df2.fillna(0)).all().all()
but there has to be a more concise and less hacky way to deal with this problem.
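One possible alternative, sketched here rather than offered as the canonical way: combine == with a mask of positions where both frames are missing. `frames_match` is a hypothetical helper name, and the sketch assumes both frames carry identical row and column labels. It also shows a weakness of the fillna(0) approach, which cannot tell a real 0 from a missing value.

```python
import numpy as np
import pandas as pd


def frames_match(left, right):
    """Hypothetical helper: compare by value, counting NaN vs NaN as equal.

    Assumes both frames have identical row and column labels.
    """
    both_missing = left.isna() & right.isna()
    return bool(((left == right) | both_missing).all().all())


a = pd.DataFrame({'x': np.array([1, 2], dtype=np.int32), 'y': [0.0, np.nan]})
b = pd.DataFrame({'x': np.array([1, 2], dtype=np.int64), 'y': [0.0, np.nan]})
c = pd.DataFrame({'x': np.array([1, 2], dtype=np.int64), 'y': [np.nan, np.nan]})

match_ab = frames_match(a, b)  # integer width is ignored, NaNs line up
match_ac = frames_match(a, c)  # 0.0 vs NaN counts as a mismatch
fillna_ac = bool((a.fillna(0) == c.fillna(0)).all().all())  # 0.0 and NaN are conflated

print(match_ab, match_ac, fillna_ac)
```

If an exception on mismatch is acceptable instead of a boolean, pandas.testing.assert_frame_equal with check_dtype=False may also be worth a look.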
My follow-up, opinion-based question: Would you consider this a bug?
If you think of this as a decimal problem (i.e. does 2 equal 2) then this perhaps looks like a bug. However, if you look at it from how the interpreter sees it (i.e. does 00000010 equal 0000000000000010) then it becomes plain that there is indeed a difference: at the bit level, the two values are stored differently.
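The point about storage width can be sketched with NumPy scalars. The 8-bit and 16-bit types below simply mirror the bit strings above; the question itself involves int32 and int64, where the same idea applies:

```python
import numpy as np

small = np.int8(2)
wide = np.int16(2)

# Same numeric value when compared...
print(small == wide)

# ...but a different number of bytes in memory.
print(small.itemsize, wide.itemsize)

# The bit patterns at each width.
bits_8 = np.binary_repr(2, width=8)
bits_16 = np.binary_repr(2, width=16)
print(bits_8, bits_16)
```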
From a validation perspective, it is probably a good idea to make sure you are comparing apples to apples, and so I like the answer of @Ben.T:
df1.equals(df2.astype(df1.dtypes))
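A short sketch of that approach on frames that also contain a missing value (the float column 'b' with a NaN is an assumption added for illustration):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, np.nan, 6.0]})
df1 = df2.copy()
df1['a'] = df1['a'].astype(np.int32)

# Direct comparison: the dtypes of column 'a' differ.
plain = df1.equals(df2)

# Cast df2 to df1's dtypes first, then compare; NaNs in the same place still match.
aligned = df1.equals(df2.astype(df1.dtypes))

print(plain, aligned)
```

One caveat to keep in mind: the cast itself can fail if df2 holds missing values in a column that is a plain NumPy integer type in df1, since NaN cannot be converted to such an integer.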
Is this a bug? That is above my pay grade. You can submit it, and the thinkers surrounding the pandas library can make a decision. It does seem odd that the '==' operator gives different results than the '.equals' function, and that may sway the decision.