
Comparing two pandas dataframes with different integer types

I just ran into some weird behaviour when comparing the values of two pandas dataframes using pd.DataFrame.equals():

Comparison 1

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()

df1.equals(df2)
# True (obviously)

However, when I change the type of a column to a different integer type, the dataframes are no longer considered equal:

df1['a'] = df1['a'].astype(np.int32)
df1.equals(df2)
# False

The .equals() documentation points out that corresponding columns must have the same dtype, and gives an example in which a frame of floats is not considered equal to a frame of integers holding the same values. I didn't expect this to extend to different integer types, too.
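A minimal sketch of the dtype strictness described above, assuming a reasonably recent pandas and NumPy. The frame names (`ints64`, `ints32`, `floats`) are illustrative only, and the dtypes are set explicitly because the default integer width can differ between platforms.

```python
import numpy as np
import pandas as pd

# Same values, three different dtypes (names are illustrative).
ints64 = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.int64)})
ints32 = ints64.astype(np.int32)
floats = ints64.astype(np.float64)

# Element-wise comparison only looks at the values.
same_values = bool((ints64 == ints32).all().all())

# .equals() additionally requires matching column dtypes.
int_vs_int = ints64.equals(ints32)
int_vs_float = ints64.equals(floats)

print(same_values, int_vs_int, int_vs_float)  # True False False
```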

Comparison 2

When doing the same comparison using ==, it does return True:

(df1 == df2).all().all()   
# True

However, == doesn't treat two missing values as equal to each other, so any NaN makes the result False even when both frames are identical.
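To make the missing-value behaviour concrete, here is a small sketch (the frames `left` and `right` are made up for illustration). It contrasts the element-wise `==` with `.equals()`, which does treat NaNs in the same position as equal.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
right = left.copy()

# NaN == NaN evaluates to False, so the middle row fails the comparison.
elementwise = left == right
all_equal = bool(elementwise.all().all())

# .equals() counts NaNs at the same location as equal.
strict_equal = left.equals(right)

print(elementwise["a"].tolist())  # [True, False, True]
print(all_equal, strict_equal)    # False True
```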

My question

Is there an elegant way to treat missing values as equal, whilst not enforcing the same integer type? The best I can come up with is:

(df1.fillna(0) == df2.fillna(0)).all().all()

but there has to be a more concise and less hacky way to deal with this problem.
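One possible approach, sketched here rather than taken from the thread, is to combine the element-wise comparison with a mask of positions where both frames are missing. `frames_match` is a hypothetical helper name, not a pandas function, and the sample frames are invented. The sketch also shows why the fillna(0) version can give a false positive when one frame holds a real 0 where the other holds NaN.

```python
import numpy as np
import pandas as pd


def frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: same labels and values, NaNs in the same
    position count as equal, column dtypes are ignored."""
    if left.shape != right.shape:
        return False
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    same_value = left == right
    both_missing = left.isna() & right.isna()
    return bool((same_value | both_missing).all().all())


df1 = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.int32), "b": [4.0, np.nan, 6.0]})
df2 = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.int64), "b": [4.0, np.nan, 6.0]})
df3 = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.int64), "b": [4.0, 0.0, 6.0]})

print(frames_match(df1, df2))  # True: int32 vs int64, NaNs line up
print(frames_match(df1, df3))  # False: NaN vs a real 0.0

# The fillna(0) comparison cannot tell NaN from 0 and reports a match.
fillna_result = bool((df1.fillna(0) == df3.fillna(0)).all().all())
print(fillna_result)  # True
```

If raising an exception on mismatch is acceptable, pd.testing.assert_frame_equal(df1, df2, check_dtype=False) covers similar ground: it ignores dtype differences and treats NaNs in the same position as equal, but it raises AssertionError instead of returning False.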

My follow-up, opinion-based question: would you consider this a bug?

If you think of this as a decimal problem (i.e. does 2 equal 2), then this perhaps looks like a bug. However, if you look at it the way the machine stores the values (i.e. does 00000010 equal 0000000000000010, to use 8 and 16 bits as a shortened stand-in for int32 and int64), then it becomes plain that there is indeed a difference at the bit level.
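The storage-width point can be checked directly in NumPy. This is only an illustration with made-up variable names: the two scalars hold the same value and compare equal, yet they have different dtypes and occupy a different number of bytes.

```python
import numpy as np

small = np.int32(2)
large = np.int64(2)

values_equal = bool(small == large)          # same number
dtypes_equal = small.dtype == large.dtype    # different storage type
widths = (small.itemsize, large.itemsize)    # bytes per value

print(values_equal, dtypes_equal, widths)    # True False (4, 8)
print(np.binary_repr(2, width=8))            # 00000010
print(np.binary_repr(2, width=16))           # 0000000000000010
```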

From a validation perspective, it is probably a good idea to make sure you are comparing apples to apples, so I like the answer of @Ben.T:

df1.equals(df2.astype(df1.dtypes))
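A short sketch of that dtype-alignment idea on invented data, including a float column with a missing value. Passing df1.dtypes (a Series mapping column name to dtype) to astype casts each column of df2 to the dtype of the matching column in df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.int32), "b": [4.0, np.nan, 6.0]})
df2 = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.int64), "b": [4.0, np.nan, 6.0]})

before = df1.equals(df2)                    # dtypes differ in column "a"
after = df1.equals(df2.astype(df1.dtypes))  # dtypes aligned, NaNs still match

print(before, after)  # False True
```

One caveat worth keeping in mind: if df2 holds NaN in a column that is a NumPy integer type in df1, the cast itself is expected to fail, since those integer dtypes cannot represent NaN.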

Is this a bug? That is above my pay grade. You can report it, and the thinkers behind the pandas library can make a decision. It does seem odd that the '==' operator gives different results than the '.equals' function, and that may sway the decision.

