
Evaluating equality of sorted pandas dataframes does not behave as expected

I would like to compare two pandas DataFrames for equality:

import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])

bar = pd.DataFrame([['between', 2], ['between', 1.5], 
                    ['within', 2.0], ['between', 2.0]], 
                   columns=['Group', 'Distance'])

As far as I am concerned, these two DataFrames are identical. However, I realize pandas does not agree, because the rows are not in the same order. My thought was that I could sort and then reset the index:

foo = foo.sort_values('Distance').reset_index(drop=True)
bar = bar.sort_values('Distance').reset_index(drop=True)

Sorting on Distance alone does not give the same row order in both frames: three rows share the value 2.0, and the order of those tied rows depends on the initial ordering of each DataFrame. As a result, the two do not evaluate as equal:

foo.equals(bar)
False
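To see where the two sorted frames diverge, the rows can be compared position by position. The following is a minimal sketch, not part of the original question: the names foo_by_distance, bar_by_distance and mismatch are illustrative, and kind="stable" is an added assumption so that the order of tied rows is deterministic.

```python
import pandas as pd

foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

# Sort on Distance only; kind="stable" (an assumption added here) keeps
# tied rows in their original relative order, so the result is repeatable.
foo_by_distance = foo.sort_values('Distance', kind='stable').reset_index(drop=True)
bar_by_distance = bar.sort_values('Distance', kind='stable').reset_index(drop=True)

# Boolean Series: True for every row position where any column differs.
mismatch = (foo_by_distance != bar_by_distance).any(axis=1)

# Show the disagreeing rows from each frame.
print(foo_by_distance[mismatch])
print(bar_by_distance[mismatch])
```

The mismatching rows are the ones tied on Distance, where only Group tells them apart.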

I could first sort on Group and then on Distance, and this would return True. However, when dealing with larger DataFrames, I'm concerned about having to explicitly define sorting rules each time. Is there a better way of comparing two differently ordered DataFrames?

Sort both DataFrames on all of their columns, so that no ties are left to depend on the original row order. This way they evaluate to True:

cols = foo.columns.tolist()
foo.sort_values(cols).reset_index(drop=True).equals(bar.sort_values(cols).reset_index(drop=True))

Or

cols = foo.columns.tolist()
foo = foo.sort_values(cols).reset_index(drop=True)
bar = bar.sort_values(cols).reset_index(drop=True)
foo.equals(bar)
True
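To avoid repeating the sort each time, the same idea could be wrapped in a small helper. This is only a sketch under assumptions: the function name equal_ignoring_row_order is hypothetical, column names are assumed to be unique, and every column is assumed to hold sortable values. Because it relies on DataFrame.equals, dtypes must match as well.

```python
import pandas as pd


def equal_ignoring_row_order(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True if both frames hold the same rows, in any order.

    Hypothetical helper: assumes unique column names and sortable values.
    """
    cols = left.columns.tolist()
    # Different column sets or row counts can never match.
    if set(cols) != set(right.columns) or len(left) != len(right):
        return False
    # Sort on every column so no ties depend on the original row order,
    # then drop the old index so equals() compares rows by position.
    left_sorted = left.sort_values(cols).reset_index(drop=True)
    # right[cols] puts the columns in the same order as in left.
    right_sorted = right[cols].sort_values(cols).reset_index(drop=True)
    return left_sorted.equals(right_sorted)


foo = pd.DataFrame([['between', 1.5], ['between', 2],
                    ['between', 2.0], ['within', 2.0]],
                   columns=['Group', 'Distance'])
bar = pd.DataFrame([['between', 2], ['between', 1.5],
                    ['within', 2.0], ['between', 2.0]],
                   columns=['Group', 'Distance'])

print(equal_ignoring_row_order(foo, bar))
```

Sorting on all columns costs more than sorting on one, but it removes the need to pick sort keys by hand for each new DataFrame.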

