
How to sort multiindex column month names?

I have this MultiIndex df:

                       YEARS_TMAX TMAX YEARS_TMAX TMAX  YEARS_TMAX
MONTH                       April April    August August  December .....
CODE   NAME                                                   
000130 RICA PLAYA          21.0  31.5      21.0   21.5      22.0
000132 PUERTO PIZARRO      12.0  33.8      12.0   32.4      11.0
000134 PAPAYAL             23.0  33.2      22.0   22.4      21.0
000135 EL SALTO            22.0  33.6      23.0   22.8      22.0
000136 CAÑAVERAL           16.0  32.7      15.0   33.1      11.0
                        ...   ...       ...    ...       ...
158317 SUSAPAYA            19.0  17.6      19.0   17.3      21.0
158321 PALCA               16.0  19.3      17.0   19.8      16.0
158323 TALABAYA            12.0  17.6      13.0   17.5      13.0
158326 CAPAZO              17.0  13.6      17.0   13.0      19.0
158328 PAUCARANI           14.0  13.3      13.0   11.9      15.0

I want to sort the columns by month name (with the TMAX column first within each month), like this:

                           TMAX YEARS_TMAX TMAX YEARS_TMAX  TMAX
MONTH                      January January February February March .....
CODE   NAME                                                   
000130 RICA PLAYA          22.0  31.5      23.0   27.5      23.0
000132 PUERTO PIZARRO      17.0  32.8      18.0   30.4      18.0
000134 PAPAYAL             25.0  32.2      26.0   28.4      25.0
000135 EL SALTO            26.0  31.6      26.0   26.8      26.0
000136 CAÑAVERAL           16.0  32.7      18.0   31.1      15.0
                        ...   ...       ...    ...       ...
158317 SUSAPAYA            19.0  17.6      19.0   17.3      21.0
158321 PALCA               16.0  19.3      17.0   19.8      16.0
158323 TALABAYA            12.0  17.6      13.0   17.5      13.0
158326 CAPAZO              17.0  13.6      17.0   13.0      19.0
158328 PAUCARANI           14.0  13.3      13.0   11.9      15.0

So I wrote this code (source: Sorting "dates" in a MultiIndex):

dates = pd.to_datetime(df.columns.get_level_values(1), format='%B')
df.columns = [df.columns.get_level_values(0), dates]
df = df.sort_index(axis=1, level=1)

This was meant to sort the columns by month, but dates does not contain month names; it contains what look like random dates. How can I fix this?

Thanks in advance.

Use a CategoricalDtype by creating an ordered dtype from calendar.month_name. This ensures the month level sorts in calendar order rather than alphabetically.

month_dtype = pd.CategoricalDtype(categories=list(month_name), ordered=True)
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).astype(month_dtype)]
df = df.sort_index(axis=1, level=[1, 0])

Sample data and imports:

from calendar import month_name

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)

df before sorting:

  YEARS_TMAX                   TMAX                 
       March January February March January February
0          1       2        3     4       5        6
1          7       8        9    10      11       12

df after sorting:

     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7
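One detail worth knowing about the categorical approach: calendar.month_name has an empty string at index 0, so list(month_name) has 13 entries. That is harmless for sorting, but it can be sliced off. The following is a minimal sketch, assuming an English locale, that builds the dtype from the twelve real months and compares lexicographic order with calendar order. The variable names are illustrative and not part of the original answer.

```python
from calendar import month_name

import pandas as pd

# month_name[0] is '', so slice from 1 to keep only the twelve real months
months = list(month_name)[1:]
month_dtype = pd.CategoricalDtype(categories=months, ordered=True)

# A plain string index sorts lexicographically
labels = pd.Index(['March', 'January', 'February'])
lexicographic = list(labels.sort_values())

# The same labels as an ordered categorical sort in calendar order
calendar_order = list(labels.astype(month_dtype).sort_values())

print(lexicographic)   # ['February', 'January', 'March']
print(calendar_order)  # ['January', 'February', 'March']
```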

The datetime approach also works, but the level needs to be converted back to strings with DatetimeIndex.strftime:

df.columns = [df.columns.get_level_values(0),
              pd.to_datetime(df.columns.get_level_values(1), format='%B')]
df = df.sort_index(axis=1, level=[1, 0])

# convert back to strings
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).strftime('%B')]

df

     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7
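The dates that pd.to_datetime produces from a '%B' format are not random: the year and day that the format does not specify fall back to 1900 and 1. A small sketch of that round trip, assuming an English locale and using an illustrative three-month index:

```python
import pandas as pd

months = pd.Index(['March', 'January', 'February'])

# With only a month in the format, the missing year and day default to 1900 and 1
dates = pd.to_datetime(months, format='%B')
as_text = list(dates.strftime('%Y-%m-%d'))
print(as_text)  # ['1900-03-01', '1900-01-01', '1900-02-01']

# Sorting the timestamps gives calendar order; strftime('%B') restores the names
restored = list(dates.sort_values().strftime('%B'))
print(restored)  # ['January', 'February', 'March']
```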

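The drawback of this approach is that level 1 is a string type again, so it has to be converted every time the sort order needs to change, because lexicographic ordering is not what is wanted.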
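If keeping level 1 as plain strings matters, one possible alternative (not from the original answer) is to compute the column order in Python and reindex, so that no dtype conversion is needed. This is a sketch under the assumption that every level 1 label is an English month name; month_pos and ordered are hypothetical helper names.

```python
from calendar import month_name

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)

# Map each month name to its calendar position (skip the empty entry at index 0)
month_pos = {name: i for i, name in enumerate(month_name) if name}

# Order the (level 0, level 1) tuples by month first, then by the level 0 label
ordered = sorted(df.columns, key=lambda col: (month_pos[col[1]], col[0]))

# Reindex against the reordered tuples; level 1 stays a plain string level
result = df.reindex(
    columns=pd.MultiIndex.from_tuples(ordered, names=df.columns.names)
)
print(result)
```

The trade-off is that the ordering logic lives outside the index, so it has to be reapplied after any operation that reorders the columns.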

