How to sort multiindex column month names?

I have this MultiIndex df:

                       YEARS_TMAX TMAX YEARS_TMAX TMAX  YEARS_TMAX
MONTH                       April April    August August  December .....
CODE   NAME                                                   
000130 RICA PLAYA          21.0  31.5      21.0   21.5      22.0
000132 PUERTO PIZARRO      12.0  33.8      12.0   32.4      11.0
000134 PAPAYAL             23.0  33.2      22.0   22.4      21.0
000135 EL SALTO            22.0  33.6      23.0   22.8      22.0
000136 CAÑAVERAL           16.0  32.7      15.0   33.1      11.0
                        ...   ...       ...    ...       ...
158317 SUSAPAYA            19.0  17.6      19.0   17.3      21.0
158321 PALCA               16.0  19.3      17.0   19.8      16.0
158323 TALABAYA            12.0  17.6      13.0   17.5      13.0
158326 CAPAZO              17.0  13.6      17.0   13.0      19.0
158328 PAUCARANI           14.0  13.3      13.0   11.9      15.0

I want to sort the columns by month name (with the TMAX column first), like this:

                           TMAX YEARS_TMAX TMAX YEARS_TMAX  TMAX
MONTH                      January January February February March .....
CODE   NAME                                                   
000130 RICA PLAYA          22.0  31.5      23.0   27.5      23.0
000132 PUERTO PIZARRO      17.0  32.8      18.0   30.4      18.0
000134 PAPAYAL             25.0  32.2      26.0   28.4      25.0
000135 EL SALTO            26.0  31.6      26.0   26.8      26.0
000136 CAÑAVERAL           16.0  32.7      18.0   31.1      15.0
                        ...   ...       ...    ...       ...
158317 SUSAPAYA            19.0  17.6      19.0   17.3      21.0
158321 PALCA               16.0  19.3      17.0   19.8      16.0
158323 TALABAYA            12.0  17.6      13.0   17.5      13.0
158326 CAPAZO              17.0  13.6      17.0   13.0      19.0
158328 PAUCARANI           14.0  13.3      13.0   11.9      15.0

So I wrote this code (source: Sorting "dates" in a MultiIndex):

dates = pd.to_datetime(df.columns.get_level_values(1), format='%B')
df.columns = [df.columns.get_level_values(0), dates]
df = df.sort_index(axis=1, level=1)

to sort the columns by month, but dates does not contain month names; it contains random dates. How can I solve this?

Thanks in advance.
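A note on the dates in the question: they are most likely not random. When only a month name is parsed with format='%B', the missing year and day fall back to the parser defaults, so every value should land on the first day of that month in 1900. A minimal sketch, assuming three month names as a stand-in for the real column level (the variable name months is illustrative only):

```python
import pandas as pd

# stand-in for df.columns.get_level_values(1) in the question
months = pd.Index(['March', 'January', 'February'])

# '%B' parses only the full month name; the missing year and day
# fall back to the parser defaults (year 1900, day 1)
dates = pd.to_datetime(months, format='%B')
print(dates)

# sorting the parsed dates and formatting them back gives calendar order
print(dates.sort_values().strftime('%B').tolist())
```

The dates themselves sort correctly; what is missing in the question's code is the conversion back to month names after sorting.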

Use a CategoricalDtype, creating an ordered dtype from calendar.month_name. This ensures the columns sort in calendar order rather than alphabetically.

month_dtype = pd.CategoricalDtype(categories=list(month_name), ordered=True)
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).astype(month_dtype)]
df = df.sort_index(axis=1, level=[1, 0])

Sample data and imports:

from calendar import month_name

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)

df before sorting:

  YEARS_TMAX                   TMAX                 
       March January February March January February
0          1       2        3     4       5        6
1          7       8        9    10      11       12

df after sorting:

     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7
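Because the month level stays categorical after the sort, any later reordering should keep calendar order without another conversion. The sketch below reuses the sample data and groups the columns by variable first, with months in calendar order inside each group. It differs from the code above in one small way, which is an assumption of this sketch and not part of the answer: the empty string at position 0 of month_name is dropped from the categories.

```python
from calendar import month_name

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)

# month_name[0] is an empty string, so keep only the 12 real month names
month_dtype = pd.CategoricalDtype(categories=list(month_name)[1:], ordered=True)
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).astype(month_dtype)]

# sort by variable name first, then by month within each variable
by_variable = df.sort_index(axis=1, level=[0, 1])
print(by_variable.columns.get_level_values(0).tolist())
print(by_variable.columns.get_level_values(1).tolist())
```

Swapping the order of the levels passed to sort_index is the only change needed to switch between the two layouts.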

The datetime approach also works, but the level needs to be converted back to strings with DatetimeIndex.strftime:

df.columns = [df.columns.get_level_values(0),
              pd.to_datetime(df.columns.get_level_values(1), format='%B')]
df = df.sort_index(axis=1, level=[1, 0])

# convert back to strings
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).strftime('%B')]

df

     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7

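The downside of this approach is that level 1 is a string type again, so it has to be converted every time the sort order needs to change, since lexicographic sorting is not what we want.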
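To illustrate that downside, here is a small sketch using the same sample data. After the datetime round trip the month level holds plain strings, so a second sort_index call is expected to fall back to alphabetical order. The names calendar_order, resorted, and alphabetical_order are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)

# datetime round trip: parse, sort, convert back to month names
df.columns = [df.columns.get_level_values(0),
              pd.to_datetime(df.columns.get_level_values(1), format='%B')]
df = df.sort_index(axis=1, level=[1, 0])
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).strftime('%B')]

# the columns are in calendar order at this point
calendar_order = df.columns.get_level_values(1).tolist()
print(calendar_order)

# a later sort only sees strings, so it orders the months alphabetically
resorted = df.sort_index(axis=1, level=[1, 0])
alphabetical_order = resorted.columns.get_level_values(1).tolist()
print(alphabetical_order)
```

With the categorical level from the first approach, the same second sort would keep the months in calendar order.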
