
How to reindex with MultiIndex?

I've got a DataFrame like this:

import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'var1': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0},
 'var2': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0},
 'var3': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0},
 'var4': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0}})

And I'd like to fill the missing indices, so I used .reindex first:

df.reindex(np.arange(1, 11))

And I got:

    var1    var2    var3    var4
1   0.0     0.0     0.0     0.0
2   0.0     0.0     0.0     0.0
3   0.0     0.0     0.0     0.0
4   0.0     0.0     0.0     0.0
5   NaN     NaN     NaN     NaN
6   0.0     0.0     0.0     0.0
7   0.0     0.0     0.0     0.0
8   0.0     0.0     0.0     0.0
9   NaN     NaN     NaN     NaN
10  0.0     0.0     0.0     0.0

However, I need to keep track of multiple index levels. When I built a MultiIndex and passed it to .reindex, it didn't work as I expected:

    df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]))

        var1    var2    var3    var4
A   1   NaN     NaN     NaN     NaN
    2   NaN     NaN     NaN     NaN
    3   NaN     NaN     NaN     NaN
    4   NaN     NaN     NaN     NaN
    5   NaN     NaN     NaN     NaN
    6   NaN     NaN     NaN     NaN
    7   NaN     NaN     NaN     NaN
    8   NaN     NaN     NaN     NaN
    9   NaN     NaN     NaN     NaN
   10   NaN     NaN     NaN     NaN

I can't really understand what's going on here, and the documentation of .reindex is not quite clear to me. Can someone explain why a MultiIndex can't be passed to .reindex, or what I am doing wrong?

Edit:

@jazrael provided a good solution for moving from a 1-level index to a 2-level MultiIndex. However, what about reindexing from a 2-level MultiIndex to a 3-level MultiIndex?

E.g.:

df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index])

        var1    var2    var3    var4
1   0   0.0     0.0     0.0     0.0
    1   0.0     0.0     0.0     0.0
    2   0.0     0.0     0.0     0.0
    3   0.0     0.0     0.0     0.0
2   4   0.0     0.0     0.0     0.0
    6   0.0     0.0     0.0     0.0
    7   0.0     0.0     0.0     0.0
    8   0.0     0.0     0.0     0.0
   10   0.0     0.0     0.0     0.0

And I'd like to get:

            var1    var2    var3    var4
A   1   0   0.0     0.0     0.0     0.0
        1   0.0     0.0     0.0     0.0
        2   0.0     0.0     0.0     0.0
        3   0.0     0.0     0.0     0.0
    2   4   0.0     0.0     0.0     0.0
        5   NaN     NaN     NaN     NaN
        6   0.0     0.0     0.0     0.0
        7   0.0     0.0     0.0     0.0
        8   0.0     0.0     0.0     0.0
        9   NaN     NaN     NaN     NaN
       10   0.0     0.0     0.0     0.0

Because reindex is being applied to a simple (non-MultiIndex) index, it is necessary to set level=1 so that the existing labels are matched against the second level of the new MultiIndex:

df = df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]), level=1)
print(df)
      var1  var2  var3  var4
A 1    0.0   0.0   0.0   0.0
  2    0.0   0.0   0.0   0.0
  3    0.0   0.0   0.0   0.0
  4    0.0   0.0   0.0   0.0
  5    NaN   NaN   NaN   NaN
  6    0.0   0.0   0.0   0.0
  7    0.0   0.0   0.0   0.0
  8    0.0   0.0   0.0   0.0
  9    NaN   NaN   NaN   NaN
  10   0.0   0.0   0.0   0.0
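
As a rough illustration of why the plain call returned only NaN: without level=, every label in the new MultiIndex is a full tuple such as ("A", 1), and the original index only holds plain integers, so nothing matches. The sketch below rebuilds the question's frame under the hypothetical name flat and also shows an alternative route that avoids level= altogether: fill the gaps on the flat index first, then prepend the outer level with pd.concat. This is just one possible way to get the same shape, not the only one.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (every value is 0.0).
# "flat" is a name made up for this sketch.
rows = [0, 1, 2, 3, 4, 6, 7, 8, 10]
flat = pd.DataFrame(0.0, index=rows, columns=["var1", "var2", "var3", "var4"])

target = pd.MultiIndex.from_product([["A"], np.arange(1, 11)])

# Without level=, each target label is a tuple such as ("A", 1).
# The flat index only contains integers, so no label is found.
all_nan = flat.reindex(target)
print(all_nan.isna().all().all())

# Alternative: reindex on the flat index, then add the outer level "A".
filled = pd.concat({"A": flat.reindex(np.arange(1, 11))})
print(filled)
```

Here pd.concat is given a dict, so its key becomes the new outermost index level.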
  

For the 2-level to 3-level case, you can create a new index with the extra level and perform an explicit DataFrame join to get what you want.

df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index], names=["key1", "key2"])
# If df already has this 2-level index, only name its levels: df.rename_axis(["key1", "key2"], inplace=True)

new_index = pd.MultiIndex.from_arrays([['A']*11, np.repeat([1, 2], [4, 7]), range(11)],
                                       names=["new_key", *df.index.names])
# Join on the overlapping index levels, matched by name
output = pd.DataFrame([], index=new_index).join(df, on=df.index.names)

Output:

                   var1  var2  var3  var4
new_key key1 key2                        
A       1    0      0.0   0.0   0.0   0.0
             1      0.0   0.0   0.0   0.0
             2      0.0   0.0   0.0   0.0
             3      0.0   0.0   0.0   0.0
        2    4      0.0   0.0   0.0   0.0
             5      NaN   NaN   NaN   NaN
             6      0.0   0.0   0.0   0.0
             7      0.0   0.0   0.0   0.0
             8      0.0   0.0   0.0   0.0
             9      NaN   NaN   NaN   NaN
             10     0.0   0.0   0.0   0.0
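
If you would rather stay with reindex, a possible variant is sketched below. It assumes the new outer level is a single constant label ("A"), as in the question. When the target has the same number of levels as the existing index, reindex compares labels as full tuples, so the two missing rows can be filled first and the outer level prepended afterwards with pd.concat. The names df2, target, filled and output2 are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Rebuild the 2-level frame from the question (every value is 0.0).
rows = [0, 1, 2, 3, 4, 6, 7, 8, 10]
df2 = pd.DataFrame(0.0, index=rows, columns=["var1", "var2", "var3", "var4"])
df2.index = pd.MultiIndex.from_arrays(
    [np.repeat([1, 2], [4, 5]), df2.index], names=["key1", "key2"]
)

# Target for the two existing levels: key1 == 1 for 0..3, key1 == 2 for 4..10.
target = pd.MultiIndex.from_arrays(
    [np.repeat([1, 2], [4, 7]), np.arange(11)], names=["key1", "key2"]
)

# Both sides have two levels, so labels are matched as (key1, key2) tuples.
filled = df2.reindex(target)

# Prepend the constant outer level "A".
output2 = pd.concat({"A": filled}, names=["new_key"])
print(output2)
```

The join approach above is more general, since it also works when the new outer level takes several values; this variant only covers the constant-label case.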
