
How to reindex with MultiIndex?

I've got a DataFrame like this:

import numpy as np  # needed for np.arange / np.repeat used below
import pandas as pd
df = pd.DataFrame.from_dict({'var1': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0},
 'var2': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0},
 'var3': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0},
 'var4': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  6: 0.0,
  7: 0.0,
  8: 0.0,
  10: 0.0}})

And I'd like to fill the missing indices, so I used .reindex first:

df.reindex(np.arange(1, 11))

And I got:

    var1    var2    var3    var4
1   0.0     0.0     0.0     0.0
2   0.0     0.0     0.0     0.0
3   0.0     0.0     0.0     0.0
4   0.0     0.0     0.0     0.0
5   NaN     NaN     NaN     NaN
6   0.0     0.0     0.0     0.0
7   0.0     0.0     0.0     0.0
8   0.0     0.0     0.0     0.0
9   NaN     NaN     NaN     NaN
10  0.0     0.0     0.0     0.0

However, I need to keep track of multiple indices, and when I tried to construct a MultiIndex and pass it to .reindex, it didn't work as I was expecting:

    df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]))

        var1    var2    var3    var4
A   1   NaN     NaN     NaN     NaN
    2   NaN     NaN     NaN     NaN
    3   NaN     NaN     NaN     NaN
    4   NaN     NaN     NaN     NaN
    5   NaN     NaN     NaN     NaN
    6   NaN     NaN     NaN     NaN
    7   NaN     NaN     NaN     NaN
    8   NaN     NaN     NaN     NaN
    9   NaN     NaN     NaN     NaN
   10   NaN     NaN     NaN     NaN

I can't really understand what's going on here, and the documentation of .reindex is not quite clear to me. Can someone advise me on this and tell me why a MultiIndex can't be passed to .reindex, or what I am doing wrong?

Edit:

@jazrael provided a good solution for moving from a 1-level index to a 2-level MultiIndex. However, what about the case where we want to reindex from a 2-level MultiIndex to a 3-level MultiIndex?

E.g.:

df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index])

        var1    var2    var3    var4
1   0   0.0     0.0     0.0     0.0
    1   0.0     0.0     0.0     0.0
    2   0.0     0.0     0.0     0.0
    3   0.0     0.0     0.0     0.0
2   4   0.0     0.0     0.0     0.0
    6   0.0     0.0     0.0     0.0
    7   0.0     0.0     0.0     0.0
    8   0.0     0.0     0.0     0.0
   10   0.0     0.0     0.0     0.0

And I'd like to get:

            var1    var2    var3    var4
A   1   0   0.0     0.0     0.0     0.0
        1   0.0     0.0     0.0     0.0
        2   0.0     0.0     0.0     0.0
        3   0.0     0.0     0.0     0.0
    2   4   0.0     0.0     0.0     0.0
        5   NaN     NaN     NaN     NaN
        6   0.0     0.0     0.0     0.0
        7   0.0     0.0     0.0     0.0
        8   0.0     0.0     0.0     0.0
        9   NaN     NaN     NaN     NaN
       10   0.0     0.0     0.0     0.0

Because you are reindexing a DataFrame with a simple (non-MultiIndex) index, you need to set level=1 so that the existing labels are matched against the second level of the new MultiIndex:

df = df.reindex(pd.MultiIndex.from_product([["A"], np.arange(1, 11)]), level=1)
print (df)
      var1  var2  var3  var4
A 1    0.0   0.0   0.0   0.0
  2    0.0   0.0   0.0   0.0
  3    0.0   0.0   0.0   0.0
  4    0.0   0.0   0.0   0.0
  5    NaN   NaN   NaN   NaN
  6    0.0   0.0   0.0   0.0
  7    0.0   0.0   0.0   0.0
  8    0.0   0.0   0.0   0.0
  9    NaN   NaN   NaN   NaN
  10   0.0   0.0   0.0   0.0
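
As for why the call without level returns only NaN: judging by the output in the question, reindex looks up each label of the new index as a whole, and a tuple such as ("A", 1) is not equal to the integer label 1, so nothing matches. If level=1 does not behave as expected in your pandas version, a minimal alternative sketch is to fill the gaps on the flat index first and attach the outer level afterwards. The one-column frame and the name flat are only illustrative stand-ins for the data in the question:

```python
import numpy as np
import pandas as pd

# Reduced stand-in for the question's frame: labels 5 and 9 are missing.
df = pd.DataFrame({"var1": 0.0}, index=[0, 1, 2, 3, 4, 6, 7, 8, 10])

# Step 1: fill the gaps while the index is still flat (label 0 is dropped).
flat = df.reindex(np.arange(1, 11))

# Step 2: wrap the now-complete index in a 2-level MultiIndex.
flat.index = pd.MultiIndex.from_product([["A"], flat.index])

print(flat)
```

Because the row order is unchanged between the two steps, assigning the new index only relabels the rows and does not move any data.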
  

You can create a new index with the extra level and perform an explicit DataFrame join to get what you want.

df.index = pd.MultiIndex.from_arrays([np.repeat([1, 2], [4, 5]), df.index], names=["key1", "key2"])
# If df's index is already created, do df.rename_axis(["key1", "key2"], inplace=True)

new_index = pd.MultiIndex.from_arrays([['A']*11, np.repeat([1, 2], [4, 7]), range(11)],
                                       names=["new_key", *df.index.names])
output = pd.DataFrame([], index=new_index).join(df, on=df.index.names)  # Join on overlapped index levels based on names

Output:

                   var1  var2  var3  var4
new_key key1 key2                        
A       1    0      0.0   0.0   0.0   0.0
             1      0.0   0.0   0.0   0.0
             2      0.0   0.0   0.0   0.0
             3      0.0   0.0   0.0   0.0
        2    4      0.0   0.0   0.0   0.0
             5      NaN   NaN   NaN   NaN
             6      0.0   0.0   0.0   0.0
             7      0.0   0.0   0.0   0.0
             8      0.0   0.0   0.0   0.0
             9      NaN   NaN   NaN   NaN
             10     0.0   0.0   0.0   0.0
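
If you would rather stay with reindex, here is a sketch of another route for the 2-level to 3-level case: reindex against a complete 2-level target first (labels are then compared as full tuples, so they do match), and prepend the constant level with pd.concat afterwards. The one-column frame and the names old, target, filled and out are assumptions made for illustration; the level names follow the ones used above:

```python
import numpy as np
import pandas as pd

# Reduced stand-in for the 2-level frame from the question.
old_index = pd.MultiIndex.from_arrays(
    [np.repeat([1, 2], [4, 5]), [0, 1, 2, 3, 4, 6, 7, 8, 10]],
    names=["key1", "key2"],
)
old = pd.DataFrame({"var1": 0.0}, index=old_index)

# Complete 2-level target: key2 runs over 0..10 without gaps.
target = pd.MultiIndex.from_arrays(
    [np.repeat([1, 2], [4, 7]), np.arange(11)],
    names=["key1", "key2"],
)

# Both indexes have two levels, so rows are matched tuple by tuple;
# (2, 5) and (2, 9) are absent in `old` and become NaN.
filled = old.reindex(target)

# Prepend the constant outer level "A" as a third index level.
out = pd.concat({"A": filled}, names=["new_key"])

print(out)
```

This keeps the gap-filling and the extra level as two separate steps, which may be easier to follow than a join when the new outer level is a single constant.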
