簡體   English   中英

如何使用自定義鍵功能對 pandas MultiIndex 的多個級別進行排序?

[英]How can I sort multiple levels of a pandas MultiIndex with custom key functions?

假設我有一個帶有 MultiIndex 的 dataframe,如下所示:

import pandas as pd
import numpy as np

my_index = pd.MultiIndex.from_product(
    [(3,1,2), ("small", "tall", "medium"), ("B", "A", "C")],
    names=["number", "size", "letter"]
)

df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)
                             x         y
number size   letter
3      small  B       0.950073  0.599918
              A       0.014450  0.472736
              C       0.208064  0.778538
       tall   B       0.979631  0.367234
              A       0.832459  0.449875
              C       0.761929  0.053144
       medium B       0.460764  0.800131
              A       0.355746  0.573813
              C       0.078924  0.058865
1      small  B       0.405209  0.354636
              A       0.536242  0.012904
              C       0.458910  0.723627
       tall   B       0.859898  0.442954
              A       0.109729  0.885598
              C       0.378363  0.220695
       medium B       0.652191  0.685181
              A       0.503525  0.400973
              C       0.454671  0.188798
2      small  B       0.407654  0.168782
              A       0.393451  0.083023
              C       0.073432  0.165209
       tall   B       0.678226  0.108497
              A       0.718348  0.077935
              C       0.595500  0.146271
       medium B       0.719985  0.422167
              A       0.950950  0.532390
              C       0.687721  0.920229

現在我想按不同級別對索引進行排序,首先是數字,然后是大小,最后是字母。

如果我這樣做...

df_1 = df_0.sort_index(level=["number", "size", "letter"], inplace=False)

...當然大小按字母順序排序。

                             x         y
number size   letter
1      medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
       small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
       tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
       small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
       tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
       small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
       tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

但我希望它按自定義鍵排序。 我知道我可以使用自定義排序 function 對大小級別進行排序,如下所示:

custom_key = np.vectorize(lambda x: {"small": 0, "medium": 1, "tall": 2}[x])

df_2 = df_0.sort_index(level=1, key=custom_key, inplace=False)
                             x         y
number size   letter
1      small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
2      small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
3      small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
1      medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
2      medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
3      medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
1      tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

但是我怎樣才能像 df_1 一樣按所有級別排序並在第二級使用自定義鍵? 預計 output:

                             x         y
number size   letter
1      small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
       medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
       tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
       medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
       tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
       medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
       tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

我應該如何定義自定義鍵 function,以便我也可以像這樣通過名稱訪問 sort_index 中的級別?

df_3 = df_0.sort_index(level="size", key=custom_key, inplace=False)

在這里,它給出了一個 KeyError: 'Level size not found'

理想的情況是使用有序的Categorical數據。

否則,您可以使用基於級別名稱的自定義映射器:

# define here custom sorters
# all other levels will be sorted by default order
order = {'size': ['small', 'medium', 'tall']}

def sorter(s):
    if s.name in order:
        return s.map({k:v for v,k in enumerate(order[s.name])})
    return s

out = df_0.sort_index(level=["number", "size", "letter"], key=sorter)

Output:

                             x         y
number size   letter                    
1      small  A       0.530753  0.687982
              B       0.722848  0.974920
              C       0.174058  0.695016
       medium A       0.397016  0.550404
              B       0.426989  0.843007
              C       0.929218  0.497728
       tall   A       0.159078  0.005675
              B       0.917871  0.384265
              C       0.685435  0.585242
2      small  A       0.423254  0.838356
              B       0.342158  0.209632
...

pandas.CategoricalIndex的解決方案:

categories_order = ['small', 'medium', 'tall']
categories = pd.CategoricalIndex(df_0.index.levels[1].values,
                                 categories=categories_order,
                                 ordered=True)

df_0.index = df_0.index.set_levels(categories, level=1)

out = df_0.sort_index()

#Output:

print(out)

                             x         y
number size   letter                    
1      small  A       0.994537  0.984409
              B       0.153568  0.529762
              C       0.025857  0.750508
       medium A       0.977558  0.488506
              B       0.727121  0.189481
              C       0.465571  0.606248
       tall   A       0.638590  0.530700
              B       0.181290  0.963911
              C       0.068123  0.911834
2      small  A       0.183484  0.459770
              B       0.587073  0.935522
              C       0.068233  0.135137
       medium A       0.315199  0.047945
              B       0.568500  0.756994
              C       0.581868  0.725463
       tall   A       0.881304  0.739188
              B       0.566962  0.855521
              C       0.677635  0.851877
3      small  A       0.497159  0.804575
              B       0.290587  0.104613
              C       0.343545  0.468724
       medium A       0.868754  0.526125
              B       0.759151  0.439089
              C       0.515183  0.266913
       tall   A       0.486962  0.178156
              B       0.902421  0.133138
              C       0.613683  0.827321

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM