Comparing two pandas DataFrames with different integer types

Question

I just ran into some weird behaviour comparing the values of two pandas DataFrames using pd.DataFrame.equals():

Comparison 1

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()

df1.equals(df2)
# True (obviously)

However, when I change the column type to a different integer dtype, the frames are no longer considered equal:

df1['a'] = df1['a'].astype(np.int32)
df1.equals(df2)
# False

In the .equals() documentation, they point out that the columns must have the same dtype, and present an example comparing floats to integers, which are not considered equal. I didn't expect this to extend to different integer types, too.
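The dtype difference behind the two results can be made visible directly. This is a minimal sketch of the setup above; the explicit int64 cast is an assumption added here so the starting dtype does not depend on the platform.

```python
import numpy as np
import pandas as pd

# Start from explicit int64 columns (assumption: makes the example platform-independent).
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, dtype=np.int64)
df2 = df1.copy()
df1['a'] = df1['a'].astype(np.int32)

# Column 'a' now has a different dtype in each frame.
dtype_a_left = df1['a'].dtype
dtype_a_right = df2['a'].dtype
print(dtype_a_left, dtype_a_right)

# Element-wise comparison only looks at values; .equals() also looks at dtypes.
same_values = bool((df1 == df2).all().all())
same_frames = df1.equals(df2)
print(same_values, same_frames)
```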

Comparison 2

When doing the same comparison using ==, it does return True:

(df1 == df2).all().all()   
# True

However, == doesn't treat two missing values as equal to each other.
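A small sketch of that difference, using two illustrative frames (the names `left` and `right` are made up for this example):

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'a': [1.0, np.nan]})
right = left.copy()

# Element-wise ==: the NaN cell compares as False, even against itself.
elementwise = left == right
all_equal_elementwise = bool(elementwise.all().all())

# .equals(): NaNs in the same location count as equal.
all_equal_method = left.equals(right)

print(elementwise)
print(all_equal_elementwise, all_equal_method)
```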

My question

Is there an elegant way to treat missing values as equal whilst not enforcing the same integer type? The best I can come up with is:

(df1.fillna(0) == df2.fillna(0)).all().all()

but there has to be a more concise and less hacky way to deal with this problem.
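One reason the fillna(0) approach is fragile: a missing value in one frame then compares equal to a genuine 0 in the other. A possible alternative, sketched below, is to combine the element-wise result with a mask of cells that are missing in both frames. The helper name `equal_with_nan` is hypothetical, and the sketch assumes both frames carry identical row and column labels (otherwise == raises).

```python
import numpy as np
import pandas as pd

def equal_with_nan(left, right):
    """Hypothetical helper: True if every cell matches, counting NaN vs NaN as a match."""
    both_missing = left.isna() & right.isna()
    return bool(((left == right) | both_missing).all().all())

df_nan = pd.DataFrame({'a': [1.0, np.nan]})
df_zero = pd.DataFrame({'a': [1.0, 0.0]})

# fillna(0) hides the difference between a missing value and a real 0.
fillna_result = bool((df_nan.fillna(0) == df_zero.fillna(0)).all().all())
mask_result = equal_with_nan(df_nan, df_zero)

# The mask-based check still ignores integer width and accepts matching NaNs.
ints32 = pd.DataFrame({'a': np.array([1, 2], dtype=np.int32)})
ints64 = pd.DataFrame({'a': np.array([1, 2], dtype=np.int64)})
mixed_int_result = equal_with_nan(ints32, ints64)
nan_nan_result = equal_with_nan(df_nan, df_nan.copy())

print(fillna_result, mask_result, mixed_int_result, nan_nan_result)
```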

My follow-up, opinion-based question: would you consider this a bug?

Answer 1

If you think of this purely in terms of values (i.e. does 2 equal 2), then this perhaps looks like a bug. However, if you look at how the values are stored in memory (i.e. does 00000010 equal 0000000000000010), it becomes plain that there is indeed a difference: the two integer types use different bit widths.
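The same point can be checked on NumPy scalars. This is only an illustration with int32 and int64 (the widths from the question, rather than the 8-bit and 16-bit patterns written above):

```python
import numpy as np

small = np.int32(2)
large = np.int64(2)

# As numbers, the two are equal.
values_match = bool(small == large)

# As stored data, they differ in width (bytes per element) and in dtype.
width_small = small.dtype.itemsize
width_large = large.dtype.itemsize
dtypes_match = small.dtype == large.dtype
raw_bytes_match = small.tobytes() == large.tobytes()

print(values_match, width_small, width_large, dtypes_match, raw_bytes_match)
```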

From a validation perspective, it is probably a good idea to make sure you are comparing apples to apples, so I like the answer of @Ben.T:

df1.equals(df2.astype(df1.dtypes))
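A sketch of that approach on frames that also contain a missing value, next to one possible alternative: pd.testing.assert_frame_equal with check_dtype=False. The wrapper name `frames_match_ignoring_dtype` is hypothetical. Note that assert_frame_equal is a testing helper and, by default, compares floats with a small tolerance, so it is not a drop-in replacement for .equals(). Note also that astype can fail if a column of df2 holds NaN while the matching column of df1 is an integer dtype.

```python
import numpy as np
import pandas as pd

def frames_match_ignoring_dtype(left, right):
    """Hypothetical wrapper: True if assert_frame_equal passes with dtype checks off."""
    try:
        pd.testing.assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True

df1 = pd.DataFrame({'a': np.array([1, 2, 3], dtype=np.int32),
                    'b': [4.0, np.nan, 6.0]})
df2 = pd.DataFrame({'a': np.array([1, 2, 3], dtype=np.int64),
                    'b': [4.0, np.nan, 6.0]})

strict = df1.equals(df2)                       # dtypes differ
aligned = df1.equals(df2.astype(df1.dtypes))   # cast df2 to df1's dtypes first
relaxed = frames_match_ignoring_dtype(df1, df2)

# A frame with a genuinely different value should still be rejected.
df3 = df2.copy()
df3.loc[0, 'a'] = 99
different = frames_match_ignoring_dtype(df1, df3)

print(strict, aligned, relaxed, different)
```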

Is this a bug? That is above my pay grade. You can report it, and the pandas maintainers can make a decision. It does seem odd that the == operator gives a different result than the .equals() method, and that may sway the decision.

Answer 1 timestamp: 2021-01-26 17:42:23
