
Python: sorting a pandas DataFrame gives random results

I am seeing random sort results for a DataFrame that I intend to sort by date in ascending order. Across multiple runs, most return the correct result, but a small number return an incorrect one.

    # List the distinct dates for each group, in order of first appearance
    records_df = records_df.groupby(['YEAR', 'QUARTER', 'SUPPLIER_ID']).TRANSACTION_DATES.agg(lambda x: list(x.unique())).reset_index()
    # This now sorts in date order
    records_df.sort_values(by=['TRANSACTION_DATES'])

For most runs the result is correct:
TRANSACTION_DATES: [05-Sep-17, 06-Sep-17, 07-Sep-17]

For some runs the result is incorrect:
TRANSACTION_DATES: [06-Sep-17, 07-Sep-17, 05-Sep-17]

Why does this happen when I am already enforcing a sort with sort_values?

I think the problem is that you call sort_values without assigning the result or passing inplace=True. By default, sort_values returns a new, sorted DataFrame and leaves the original unchanged, so your sorted result is simply discarded and never stored anywhere.
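A minimal illustration of that behaviour, using a small hypothetical frame (the column name TRANSACTION_DATE and the dates are made up for the example, not taken from the question):

```python
import pandas as pd

# Hypothetical frame; the dates are deliberately out of order
df = pd.DataFrame(
    {"TRANSACTION_DATE": pd.to_datetime(["2017-09-06", "2017-09-07", "2017-09-05"])}
)

# sort_values returns a new DataFrame; without assignment that result is discarded
df.sort_values(by=["TRANSACTION_DATE"])
print(df["TRANSACTION_DATE"].is_monotonic_increasing)  # False: df is unchanged

# Keeping the returned object gives the sorted frame
sorted_df = df.sort_values(by=["TRANSACTION_DATE"])
print(sorted_df["TRANSACTION_DATE"].is_monotonic_increasing)  # True
```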

So try:

records_df = records_df.sort_values(by=['TRANSACTION_DATES'])

or

records_df.sort_values(by=['TRANSACTION_DATES'], inplace=True)
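The two forms end up with the same sorted data; the difference is what the call returns. A small sketch comparing them on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame used only to compare the two fixes
df = pd.DataFrame({"SUPPLIER_ID": [3, 1, 2]})

# Fix 1: assign the DataFrame that sort_values returns
assigned = df.sort_values(by=["SUPPLIER_ID"])

# Fix 2: sort a copy in place; with inplace=True the method returns None
in_place = df.copy()
returned = in_place.sort_values(by=["SUPPLIER_ID"], inplace=True)

print(returned)                   # None
print(assigned.equals(in_place))  # True
```

Because the inplace=True form returns None, combining the two (records_df = records_df.sort_values(..., inplace=True)) would rebind records_df to None, so use one form or the other.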

For reference, the sort_values docs:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
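One further point may matter here: sort_values reorders the rows of the DataFrame, not the elements inside each TRANSACTION_DATES list. Those lists come from unique(), which keeps the order in which dates first appear in each group, so if the input row order varies between runs, the list order may vary too. If the goal is for each list itself to be in date order (as the expected output suggests), one option is to sort the rows before grouping. A minimal sketch, assuming pandas 1.1 or later (for the key= argument), dates stored as strings in the DD-Mon-YY form shown in the question, and made-up sample rows:

```python
import pandas as pd

# Hypothetical rows; the dates arrive out of chronological order
records_df = pd.DataFrame({
    "YEAR": [2017, 2017, 2017],
    "QUARTER": [3, 3, 3],
    "SUPPLIER_ID": [101, 101, 101],
    "TRANSACTION_DATES": ["06-Sep-17", "07-Sep-17", "05-Sep-17"],
})

# Sort the rows chronologically before grouping. The key= callable parses the
# strings as dates, so they are compared as dates rather than as text.
records_df = records_df.sort_values(
    by="TRANSACTION_DATES",
    key=lambda s: pd.to_datetime(s, format="%d-%b-%y"),
)

# groupby keeps the row order within each group and unique() keeps
# first-appearance order, so each list now comes out in date order.
grouped = (
    records_df.groupby(["YEAR", "QUARTER", "SUPPLIER_ID"])["TRANSACTION_DATES"]
    .agg(lambda x: list(x.unique()))
    .reset_index()
)

print(grouped.loc[0, "TRANSACTION_DATES"])  # ['05-Sep-17', '06-Sep-17', '07-Sep-17']
```

Parsing the strings in the sort key matters once a group spans several months: compared as plain text, "01-Oct-17" would sort before "05-Sep-17".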
