Evaluating equality of sorted pandas dataframes does not behave as expected
I would like to compare two pandas dataframes for equality:
import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])
As far as I am concerned these two dataframes are identical; however, I realize pandas does not agree because their rows are not in the same order. My thought was that I could sort and then reset the index:
foo = foo.sort_values('Distance').reset_index(drop=True)
bar = bar.sort_values('Distance').reset_index(drop=True)
The sort gives different results because of the initial ordering of the dataframes: three rows share the Distance value 2.0, and sorting on Distance alone does not decide their order relative to each other. And in fact the two dataframes don't evaluate as being equal:
foo.equals(bar)
False
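To see where the mismatch comes from, it can help to inspect the tied rows directly. The following is a minimal sketch using the dataframes from the question; the variable names `by_distance_foo`, `by_distance_bar`, `same_keys` and `tie_count` are introduced here for illustration only.

```python
import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

by_distance_foo = foo.sort_values('Distance').reset_index(drop=True)
by_distance_bar = bar.sort_values('Distance').reset_index(drop=True)

# The sort key itself ends up identical in both frames...
same_keys = by_distance_foo['Distance'].equals(by_distance_bar['Distance'])

# ...but several rows share one Distance value, and nothing in the
# sort decides how their Group values are ordered among themselves.
tie_count = int((foo['Distance'] == 2.0).sum())

print(same_keys, tie_count)
print(by_distance_foo)
print(by_distance_bar)
```

Printing both sorted frames side by side shows that any difference is confined to the Group values of the rows whose Distance is 2.0.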
I could first sort on Group and then on Distance, and this would return True. However, when dealing with larger dataframes I'm concerned about having to define the sorting rules explicitly each time. Is there a better way of comparing two differently ordered dataframes?
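For reference, the explicit two-column sort described above might look like the sketch below. The names `keys`, `foo_sorted`, `bar_sorted` and `explicit_result` are illustrative and not part of the original question.

```python
import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

# Sort on Group first, then on Distance, so that rows tied on one
# column are ordered by the other.
keys = ['Group', 'Distance']
foo_sorted = foo.sort_values(keys).reset_index(drop=True)
bar_sorted = bar.sort_values(keys).reset_index(drop=True)

explicit_result = foo_sorted.equals(bar_sorted)
print(explicit_result)
```

The drawback raised in the question is visible here: `keys` has to be written out by hand for every pair of dataframes.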
Sort on all columns, so that no ties are left to the initial ordering. This way you can make them evaluate to True:
foo.sort_values(foo.columns.values.tolist()).reset_index(drop=True).equals(bar.sort_values(foo.columns.values.tolist()).reset_index(drop=True))
Or:
foo = foo.sort_values(foo.columns.values.tolist()).reset_index(drop=True)
bar = bar.sort_values(foo.columns.values.tolist()).reset_index(drop=True)
foo.equals(bar)
True
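If this comparison is needed in several places, the same idea could be wrapped in a small function. The sketch below is one possible way to do that, not a pandas API: the name `equal_ignoring_row_order` is hypothetical, and it assumes unique column labels and columns whose values can be sorted (no mixed types within a column). Note that `equals` also requires matching dtypes, so an integer column and a float column holding the same numbers compare as different.

```python
import pandas as pd


def equal_ignoring_row_order(left, right):
    """Hypothetical helper: True if both frames hold the same rows,
    regardless of row order or column order."""
    cols = list(left.columns)
    # Frames with different column sets cannot match.
    if set(cols) != set(right.columns):
        return False
    # Sort both frames on every column, then drop the old index so
    # that only the row contents are compared.
    left_sorted = left.sort_values(cols).reset_index(drop=True)
    right_sorted = right[cols].sort_values(cols).reset_index(drop=True)
    return bool(left_sorted.equals(right_sorted))


foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

# A copy of bar with one value changed, to show a negative case.
baz = bar.copy()
baz.loc[0, 'Distance'] = 3.0

print(equal_ignoring_row_order(foo, bar))
print(equal_ignoring_row_order(foo, baz))
```

Because the sort keys are taken from the columns of `left`, no sorting rules have to be spelled out for each new pair of dataframes.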