
Sorting a pandas DataFrame by one level of a MultiIndex with a "key"

My question is basically the same as the one here: Sorting a pandas DataFrame by one level of a MultiIndex

That is, I want to sort a MultiIndex DataFrame along one level. The problem is that the index ["foo2","foo1","foo10"] is sorted as ["foo1","foo10","foo2"] instead of ["foo1","foo2","foo10"], and I cannot pass a "key" argument as I can with list.sort() (see the example below). How should I handle this? Should I reset_index, sort the column, and then set the index again?

import pandas as pd
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # raw string avoids the invalid-escape warning for \d
    return [atoi(c) for c in re.split(r'(\d+)',text)]

# example on a list
L1=["foo2","foo1","foo10"]
print(sorted(L1))
print(sorted(L1,key=natural_keys))
print()

df = pd.DataFrame([{'I1':'foo2','I2':'b','val':2},{'I1':'foo1','I2':'a','val':1},{'I1':'foo10','I2':'c','val':3}])
df = df.set_index(['I1','I2'])
sorted_df = df.sort_index(level=0)
print(sorted_df)
print()

expected_df = pd.DataFrame([{'I1':'foo1','I2':'a','val':1},{'I1':'foo2','I2':'b','val':2},{'I1':'foo10','I2':'c','val':3}])
expected_df = expected_df.set_index(['I1','I2'])
print(expected_df)

SORTED DF:
          val
I1    I2
foo1  a     1
foo10 c     3
foo2  b     2

EXPECTED DF:
          val
I1    I2
foo1  a     1
foo2  b     2
foo10 c     3

Thanks

As explained by Jon Clements, if you are on pandas >= 1.1.0 you can use the key argument of sort_index. The key function receives the whole index level and must return one sortable value per label, so natural_keys cannot be passed directly. If you also want to discriminate between several numbers in a label (for example, foo_1_bar_2 before foo_2_bar_1), you need to combine several functions:

import pandas as pd
import re

def atoi(text):
    return int(text) if text.isdigit() else text

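def natural_keys(text):
    return [atoi(c) for c in re.split(r'(\d+)',text)]

def sort_index(index):
    # sort_index(key=...) passes the whole level and expects one sortable
    # value per label, so return each label's rank in natural order.
    rank = {val: pos for pos, val in enumerate(sorted(set(index), key=natural_keys))}
    return [rank[val] for val in index]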

df = pd.DataFrame([{'I1':'foo2','I2':'b','val':2},{'I1':'foo1','I2':'a','val':1},{'I1':'foo10','I2':'c','val':3}])
df = df.set_index(['I1','I2'])
sorted_df = df.sort_index(level=0, key=sort_index)
print(sorted_df)
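
To illustrate the multi-number case mentioned above, here is a small self-contained sketch. The labels, the frame df2 and the helper name natural_rank are made up for the example, and it assumes a pandas version whose sort_index accepts key.

```python
import re
import pandas as pd

def natural_keys(text):
    # Split into digit and non-digit runs; digit runs compare as integers.
    return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', text)]

def natural_rank(index):
    # Hypothetical helper: map every label to its rank in natural order,
    # so the key returns one plain integer per label.
    rank = {v: i for i, v in enumerate(sorted(set(index), key=natural_keys))}
    return [rank[v] for v in index]

# Illustrative labels containing two numbers each.
labels = ['foo_2_bar_1', 'foo_1_bar_10', 'foo_1_bar_2', 'foo_10_bar_1']
df2 = pd.DataFrame(
    {'val': range(4)},
    index=pd.MultiIndex.from_arrays([labels, list('abcd')], names=['I1', 'I2']),
)

result = df2.sort_index(level=0, key=natural_rank)
print(result)
```

Both numbers take part in the comparison here, so foo_1_bar_2 should come before foo_1_bar_10, and foo_2_bar_1 before foo_10_bar_1.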

I have not found a simple solution for earlier versions of pandas.
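
One possible workaround that does not rely on the key argument is to compute the row order in plain Python and reorder the frame positionally with iloc. This is only a sketch: it has not been checked against any particular old pandas release, and it sorts by the first level only, leaving ties in their original order.

```python
import re
import pandas as pd

def natural_keys(text):
    # Split into digit and non-digit runs; digit runs compare as integers.
    return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', text)]

df = pd.DataFrame([{'I1': 'foo2', 'I2': 'b', 'val': 2},
                   {'I1': 'foo1', 'I2': 'a', 'val': 1},
                   {'I1': 'foo10', 'I2': 'c', 'val': 3}])
df = df.set_index(['I1', 'I2'])

# Labels of the level to sort on, one per row.
labels = df.index.get_level_values('I1')

# Row positions ordered by the natural key of their label (sorted is stable).
order = sorted(range(len(labels)), key=lambda i: natural_keys(labels[i]))

# Positional reordering keeps the MultiIndex and its names intact.
sorted_df = df.iloc[order]
print(sorted_df)
```

This avoids the reset_index / set_index round trip, because the index is only read, never rebuilt.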

