简体   繁体   中英

Sort columns in Pivot table with Pandas

folks. I've been through all the questions related to sorting columns with pivot tables but couldn't find something just as I needed. I have a dataframe of this kind:

        Date  Moisture     Accum  Year  DayOfYear
0 2000-01-01  0.408640  0.408640  2000          1
1 2000-01-02  0.433425  0.842065  2000          2
2 2000-01-03  0.429745  1.271810  2000          3
3 2000-01-04  0.427589  1.699399  2000          4
4 2000-01-05  0.428700  2.128098  2000          5

And I created a Pivot table from it and calculated another column from the existing data:

mean1 = pd.pivot_table(c1, index = 'DayOfYear', columns = 'Year', values = 'Moisture')
mean1['Mean'] = mean1.mean(axis = 1)

I obtained something like this:

Year           2000      2001      2002  ...      2018      2019      Mean
DayOfYear                                ...                              
1          0.408640  0.433016  0.420326  ...  0.423164  0.328385  0.401896
2          0.433425  0.423607  0.414502  ...  0.419587  0.322804  0.398434
3          0.429745  0.418132  0.404171  ...  0.417384  0.318795  0.396913
4          0.427589  0.407190  0.394478  ...  0.420361  0.316989  0.398425
5          0.428700  0.401072  0.386432  ...  0.417026  0.313664  0.396777

I want to sort the values for each year, but I haven't been able to make it happen. I've tried this:

mean1 = mean1.sort_values('2000', ascending = True, axis = 0)

But I get KeyError: '2000' . I also tried sorting by the value I made the pivot table from ('Moisture'), as responses to other questions recommended, but it keeps showing a similar error. If I try to sort the values from the 'Mean' column I do get the sorted column, but it can't be done (apparently) for the year columns (eg '2000'). What am I missing?

Year column in your source DataFrame is most likely of int type, so the respective column in the pivot table has also "integer" (not "string") name (run mean1.info() to check column types).

So the first, mandatory correction is to change the first parameter to integer 2000 .

Two another correction are in my opinion advisable, but not required: Default values of ascending and axis parameters are True and 0 , respectively, so if you want to keep your code shorter, you can omit them.

So replace the offending line with:

mean1 = mean1.sort_values(2000)

If you want to sort every column independently, you may try this solution , which is the same as:

import numpy as np

mean2 = pd.DataFrame(np.sort(mean1.values, axis=0), index=mean1.index, columns=mean1.columns)

Which will give you a DataFrame with each column sorted individually.

Year           2000      2001      2002      2018      2019      Mean
DayOfYear                                                            
1          0.408640  0.401072  0.386432  0.417026  0.313664  0.389379
2          0.427589  0.407190  0.394478  0.417384  0.316989  0.393321
3          0.428700  0.418132  0.404171  0.419587  0.318795  0.397645
4          0.429745  0.423607  0.414502  0.420361  0.322804  0.402706
5          0.433425  0.433016  0.420326  0.423164  0.328385  0.402785

But now the index doesn't make sense at all, since all cells were reordered. So maybe you'll want to reindex it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM