Evaluating equality of sorted pandas DataFrames does not behave as expected

Question

I would like to compare two pandas DataFrames for equality:

import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])

bar = pd.DataFrame([['between', 2], ['between', 1.5], 
                    ['within', 2.0], ['between', 2.0]], 
                   columns=['Group', 'Distance'])

As far as I am concerned, these two DataFrames are identical, but I realize pandas does not agree because the rows are not in the same order. My thought was that I could sort both and then reset the index:

foo = foo.sort_values('Distance').reset_index(drop=True)
bar = bar.sort_values('Distance').reset_index(drop=True)

The pandas sort gives different results because of the initial ordering of the DataFrames (rows that tie on Distance end up in an order that depends on the input). And in fact they don't evaluate as being equal:

foo.equals(bar)
False
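To make the tie problem visible, here is a minimal sketch using the same data. It assumes a stable sort (`kind='mergesort'`) so the outcome is deterministic; the names `foo_sorted`, `bar_sorted`, `foo_groups` and `bar_groups` are illustrative only and not part of the original question.

```python
import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

# A stable sort keeps tied rows in their original relative order,
# so the three rows with Distance == 2.0 come out differently per frame.
foo_sorted = foo.sort_values('Distance', kind='mergesort').reset_index(drop=True)
bar_sorted = bar.sort_values('Distance', kind='mergesort').reset_index(drop=True)

foo_groups = foo_sorted['Group'].tolist()
bar_groups = bar_sorted['Group'].tolist()

print(foo_groups)  # ['between', 'between', 'between', 'within']
print(bar_groups)  # ['between', 'between', 'within', 'between']
print(foo_sorted['Distance'].equals(bar_sorted['Distance']))  # True
print(foo_sorted.equals(bar_sorted))  # False
```

The Distance columns match after sorting, but the Group column does not, because Distance alone does not determine a unique row order.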

I could first sort on Group and then on Distance, and this would return True. However, when dealing with larger DataFrames I'm concerned about having to define the sorting rules explicitly each time. Is there a better way of comparing two differently ordered DataFrames?

Answer 1

Sorting both frames by all of their columns makes them evaluate to True:

cols = foo.columns.tolist()
foo.sort_values(cols).reset_index(drop=True).equals(
    bar.sort_values(cols).reset_index(drop=True))

Or, assigning the sorted frames back:

foo = foo.sort_values(foo.columns.values.tolist()).reset_index(drop=True)
bar = bar.sort_values(foo.columns.values.tolist()).reset_index(drop=True)
foo.equals(bar)
True
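For larger frames, the same idea can be wrapped in a small helper so the sort keys never have to be spelled out. This is only a sketch: the function name `rows_equal_ignoring_order` is hypothetical, and it assumes both frames have the same column labels in the same order and that every column holds sortable values.

```python
import pandas as pd


def rows_equal_ignoring_order(left, right):
    """Return True if both frames hold the same rows, in any order."""
    # Different column labels (or column order) count as not equal here.
    if not left.columns.equals(right.columns):
        return False
    cols = left.columns.tolist()
    # Sorting by every column removes the ambiguity caused by ties.
    left_sorted = left.sort_values(cols).reset_index(drop=True)
    right_sorted = right.sort_values(cols).reset_index(drop=True)
    return left_sorted.equals(right_sorted)


foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

print(rows_equal_ignoring_order(foo, bar))  # True
```

Note that `DataFrame.equals` also compares dtypes, so a column stored as integers in one frame and floats in the other would still compare as unequal.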

Solution 1
2 Accepted 2017-04-04 21:49:51
