
Sort columns in Pivot table with Pandas

Folks, I've been through all the questions related to sorting columns in pivot tables but couldn't find exactly what I needed. I have a DataFrame of this kind:

        Date  Moisture     Accum  Year  DayOfYear
0 2000-01-01  0.408640  0.408640  2000          1
1 2000-01-02  0.433425  0.842065  2000          2
2 2000-01-03  0.429745  1.271810  2000          3
3 2000-01-04  0.427589  1.699399  2000          4
4 2000-01-05  0.428700  2.128098  2000          5

I created a pivot table from it and calculated another column from the existing data:

mean1 = pd.pivot_table(c1, index = 'DayOfYear', columns = 'Year', values = 'Moisture')
mean1['Mean'] = mean1.mean(axis = 1)

I obtained something like this:

Year           2000      2001      2002  ...      2018      2019      Mean
DayOfYear                                ...                              
1          0.408640  0.433016  0.420326  ...  0.423164  0.328385  0.401896
2          0.433425  0.423607  0.414502  ...  0.419587  0.322804  0.398434
3          0.429745  0.418132  0.404171  ...  0.417384  0.318795  0.396913
4          0.427589  0.407190  0.394478  ...  0.420361  0.316989  0.398425
5          0.428700  0.401072  0.386432  ...  0.417026  0.313664  0.396777

I want to sort the values for each year, but I haven't been able to make it happen. I've tried this:

mean1 = mean1.sort_values('2000', ascending = True, axis = 0)

But I get KeyError: '2000'. I also tried sorting by the value I built the pivot table from ('Moisture'), as answers to other questions recommended, but it keeps raising a similar error. If I sort by the 'Mean' column I do get the sorted column, but apparently it can't be done for the year columns (e.g. '2000'). What am I missing?

The Year column in your source DataFrame is most likely of int type, so the corresponding columns in the pivot table also have integer (not string) labels. Inspect mean1.columns to check the label types; mean1.info() lists the columns but reports the dtype of each column's values, not the type of its label.
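A quick way to confirm this is to look at the labels themselves. The sketch below rebuilds a tiny stand-in for the frame in the question (the names c1, Year, DayOfYear and Moisture come from the question; the values are made up for illustration) and prints the Python type of each column label.

```python
import pandas as pd

# Tiny stand-in for the asker's frame; the values are illustrative only.
c1 = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "DayOfYear": [1, 2, 1, 2],
    "Moisture": [0.41, 0.43, 0.43, 0.42],
})

mean1 = pd.pivot_table(c1, index="DayOfYear", columns="Year", values="Moisture")
mean1["Mean"] = mean1.mean(axis=1)

# Map each column label to the name of its Python type.
label_types = {label: type(label).__name__ for label in mean1.columns}
print(label_types)

# Membership tests make the difference between 2000 and '2000' visible.
print(2000 in mean1.columns)
print("2000" in mean1.columns)
```

If the year labels turn out to be integers, any lookup or sort key for those columns has to be an integer as well.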

So the first, mandatory correction is to change the first parameter to the integer 2000.

Two further corrections are, in my opinion, advisable but not required: the default values of the ascending and axis parameters are True and 0, respectively, so you can omit them to keep your code shorter.

So replace the offending line with:

mean1 = mean1.sort_values(2000)
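To see the difference end to end, here is a minimal sketch on made-up data (only the names c1, mean1 and the column names are taken from the question). It assumes the Year column holds integers, shows that the string key fails, and checks that spelling out the default ascending and axis arguments changes nothing.

```python
import pandas as pd

# Illustrative data only; the real frame in the question is much larger.
c1 = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "DayOfYear": [1, 2, 3, 1, 2, 3],
    "Moisture": [0.43, 0.41, 0.42, 0.40, 0.44, 0.39],
})
mean1 = pd.pivot_table(c1, index="DayOfYear", columns="Year", values="Moisture")
mean1["Mean"] = mean1.mean(axis=1)

# The string '2000' is not a column label, so sorting by it raises KeyError.
try:
    mean1.sort_values("2000")
    raised = False
except KeyError:
    raised = True
print(raised)

# The integer label works: whole rows are reordered by the 2000 column.
by_2000 = mean1.sort_values(2000)
print(by_2000)

# Passing the defaults explicitly gives the same result.
explicit = mean1.sort_values(2000, ascending=True, axis=0)
print(by_2000.equals(explicit))
```

Note that this reorders entire rows, so every other column follows the order of the 2000 column rather than being sorted itself.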

If you want to sort every column independently, you can sort the underlying array column by column and rebuild the DataFrame:

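import numpy as np
import pandas as pd

# Sort the values of each column independently (axis=0), keeping the original labels
mean2 = pd.DataFrame(np.sort(mean1.values, axis=0), index=mean1.index, columns=mean1.columns)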

This gives you a DataFrame with each column sorted individually:

Year           2000      2001      2002      2018      2019      Mean
DayOfYear                                                            
1          0.408640  0.401072  0.386432  0.417026  0.313664  0.389379
2          0.427589  0.407190  0.394478  0.417384  0.316989  0.393321
3          0.428700  0.418132  0.404171  0.419587  0.318795  0.397645
4          0.429745  0.423607  0.414502  0.420361  0.322804  0.402706
5          0.433425  0.433016  0.420326  0.423164  0.328385  0.402785

But now the index no longer makes sense, since the cells of each column were reordered independently and a row no longer corresponds to a single day of the year. So you may want to replace or reset the index.
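One possible way to do that, sketched on made-up data, is to build the sorted frame with a plain rank index instead of reusing DayOfYear. The index name Rank is an assumption chosen for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

# Illustrative pivot-like frame with integer year labels.
mean1 = pd.DataFrame(
    {2000: [0.43, 0.41, 0.42], 2001: [0.40, 0.44, 0.39]},
    index=pd.Index([1, 2, 3], name="DayOfYear"),
)

# Sort each column on its own; rows no longer map to a day of the year.
sorted_values = np.sort(mean1.to_numpy(), axis=0)

# Use a 1..n rank index (hypothetical name 'Rank') instead of DayOfYear.
rank = pd.RangeIndex(1, len(mean1) + 1, name="Rank")
mean2 = pd.DataFrame(sorted_values, index=rank, columns=mean1.columns)
print(mean2)
```

With this layout, row k simply holds the k-th smallest value of each year, which is all the independently sorted table can meaningfully express.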
