
pandas: How to reindex MultiIndex level?

How do I renumber a MultiIndex level after sorting by one of the levels? Here is the DataFrame after sorting:

          text
letter        
a      0  blah
       3  blah
       6  blah
b      1  blah
       4  blah
       7  blah
c      2  blah
       5  blah
       8  blah

And here is what I want (optionally keeping the original index values in their own column):

          text
letter        
a      0  blah
       1  blah
       2  blah
b      0  blah
       1  blah
       2  blah
c      0  blah
       1  blah
       2  blah

I've tried searching for an answer, tried coding different things, but I'm stumped.

Code to reproduce the first table above:

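import pandas as pd
df = pd.DataFrame({'letter': ['a', 'b', 'c'] * 3, 'text': ['blah'] * 9})
# Append 'letter' to the default integer index, then make it the outer level
df.set_index(keys='letter', append=True, inplace=True)
df = df.reorder_levels(order=[1, 0])
# Sort by 'letter'; the old integer positions remain as the inner level (0, 3, 6, ...)
df.sort_index(level=0, inplace=True)
print(df)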

You can use cumcount, which numbers the rows within each group starting from 0:

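# 'letter' is an index level, not a column, so move it to a column first: set_index only accepts column keys
df = df.assign(yourindex=df.groupby('letter').cumcount()).reset_index(level='letter').set_index(['letter', 'yourindex']).sort_index(level=[0, 1])
df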
Out[861]: 
                  text
letter yourindex      
a      0          blah
       1          blah
       2          blah
b      0          blah
       1          blah
       2          blah
c      0          blah
       1          blah
       2          blah
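The same cumcount idea can also be applied without moving levels in and out of columns: compute the counter, then rebuild the MultiIndex directly with pd.MultiIndex.from_arrays. This is a minimal sketch, assuming the question's layout (outer level 'letter', unnamed inner level); the level name new_index is just an illustrative choice. It discards the old inner level, so copy that level to a column first if you want to keep it.

```python
import pandas as pd

# Rebuild the question's sorted frame: outer level 'letter', unnamed inner level.
df = pd.DataFrame({'letter': ['a', 'b', 'c'] * 3, 'text': ['blah'] * 9})
df = df.set_index('letter', append=True).reorder_levels([1, 0]).sort_index(level=0)

# Build the replacement inner level: a counter that restarts at 0 for each letter.
counter = df.groupby('letter').cumcount()

# from_arrays pairs values by position; cumcount returns values in df's current
# row order, so each row keeps its own letter and gets its own counter value.
df.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values('letter'), counter],
    names=['letter', 'new_index'],
)
print(df)
```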

Here's what I did:

df["new_index"] = df.groupby("letter").cumcount()
df

This gives you:

          text  new_index
letter                   
a      0  blah          0
       3  blah          1
       6  blah          2
b      1  blah          0
       4  blah          1
       7  blah          2
c      2  blah          0
       5  blah          1
       8  blah          2

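Then reset the index and build the new one from letter and new_index; the original inner level is preserved as the column level_1: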

df.reset_index().set_index(["letter","new_index"])

                  level_1  text
letter new_index               
a      0                0  blah
       1                3  blah
       2                6  blah
b      0                1  blah
       1                4  blah
       2                7  blah
c      0                2  blah
       1                5  blah
       2                8  blah
