
Pandas: How to add a level to a MultiIndex by splitting one of its existing levels?

How can I create a new level by splitting the second level at '|'?

The initial index:

MultiIndex(levels=[['A', 'B', 'C', 'D'], ['a|a_unit', 'b|b_unit', 'c|c_unit']],
       codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
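
For readers who want to reproduce the problem, the index above can be rebuilt with MultiIndex.from_product. This is only a sketch: the frame name df matches the question, but the single row of zeros is a made-up placeholder, since the question does not show the data.

```python
import numpy as np
import pandas as pd

# Rebuild the two-level column index shown above.
columns = pd.MultiIndex.from_product(
    [["A", "B", "C", "D"], ["a|a_unit", "b|b_unit", "c|c_unit"]]
)

# Placeholder data (assumption): one row of zeros, one value per column.
df = pd.DataFrame(np.zeros((1, len(columns))), columns=columns)

print(df.columns.nlevels)
```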

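Desired output: a three-level index in which each second-level label such as 'a|a_unit' is split into two separate levels, 'a' and 'a_unit'.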

What I tried:

# plan was to create a new column and use set_index
df.columns.to_frame().iloc[:,1].str.split('|')

EDIT: The reason why my approach did not work was the following:

Initially, the values in level 1 of the index were separated by '*|*'. To make this example simpler, I deleted the '*'. Without the star everything worked well, but with the star I got an re error:

re.error: nothing to repeat at position 0

Having proper test cases is really tricky sometimes.

You can try with:

s=df.columns.to_frame().iloc[:,1].str.split('|')
final=(pd.DataFrame(data=df.values,columns=df.columns.get_level_values(0))
                   .T.set_index([s.str[0],s.str[1]],append=True).T)

Or:

final=(pd.DataFrame(columns=
 pd.MultiIndex.from_arrays([df.columns.get_level_values(0),s.str[0],s.str[1]])))
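
As a self-contained sketch of this second variant, the snippet below also passes the original values to the constructor, so the data is kept alongside the new columns. The small sample frame and the level names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the question (assumed data).
cols = pd.MultiIndex.from_product([["A", "B"], ["a|a_unit", "b|b_unit"]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)

# Split the second level into lists such as ['a', 'a_unit'].
s = df.columns.to_frame().iloc[:, 1].str.split("|")

new_cols = pd.MultiIndex.from_arrays(
    [df.columns.get_level_values(0), s.str[0], s.str[1]],
    names=["group", "feature", "unit"],  # hypothetical level names
)

# Reuse the original values and row index under the new three-level columns.
final = pd.DataFrame(df.values, index=df.index, columns=new_cols)
print(final.columns.tolist())
```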

The answer by anky_91 is quite compact. Here is another solution which also works with this index:

MultiIndex(levels=[['A', 'B', 'C', 'D'], ['a*|*a_unit', 'b*|*b_unit', 'c*|*c_unit']],
       codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

    # split each second-level label on the literal separator '*|*'
    _split = [item.split('*|*') for item in df.columns.get_level_values(1)]
    _level_0 = df.columns.get_level_values(0).tolist()

    # build (level 0, name, unit) tuples and assign them as the new column index
    idx_list = [(lvl0, parts[0], parts[1]) for lvl0, parts in zip(_level_0, _split)]
    df.columns = pd.MultiIndex.from_tuples(idx_list)

I deleted the * for the sake of simplicity, but doing so also removed the reason why my initial approach (see anky_91's answer), df.columns.to_frame().iloc[:,1].str.split('|'), did not work.
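
The re.error reported in the question is consistent with how str.split handles its pattern: a pattern longer than one character is interpreted as a regular expression, and in '*|*' the leading * has nothing to repeat. A minimal sketch of two ways around this, assuming the separator is the literal string '*|*': escape it with re.escape, or, assuming pandas 1.4 or later, pass regex=False. The sample labels are taken from the index above.

```python
import re

import pandas as pd

labels = pd.Series(["a*|*a_unit", "b*|*b_unit", "c*|*c_unit"])

# Option 1: escape the separator so the regex engine matches it literally.
escaped = labels.str.split(re.escape("*|*"))

# Option 2 (assumes pandas >= 1.4): turn off regex interpretation entirely.
literal = labels.str.split("*|*", regex=False)

print(escaped.str[0].tolist(), escaped.str[1].tolist())
```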

Another method is to access your levels with index.get_level_values and split them into three indices:

idx1 = [idx.split('|')[0] for idx in df.index.get_level_values(1)]
idx2 = [idx.split('|')[1] for idx in df.index.get_level_values(1)]
df.index = [df.index.get_level_values(0), idx1, idx2]

Output:

Empty DataFrame
Columns: []
Index: [(A, a, a_unit), (A, b, b_unit), (A, c, c_unit), (B, a, a_unit), (B, b, b_unit), (B, c, c_unit), (C, a, a_unit), (C, b, b_unit), (C, c, c_unit), (D, a, a_unit), (D, b, b_unit), (D, c, c_unit)]
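
The approaches above can be folded into one small helper. The function split_level below is hypothetical (it is not part of pandas) and is only a sketch: it uses Python's plain str.split, so the separator is always matched literally, and it assumes every label in the chosen level splits into the same number of parts.

```python
import pandas as pd


def split_level(index, level, sep):
    """Return a new MultiIndex with `level` replaced by its `sep`-separated parts.

    Hypothetical helper for illustration; `level` is an integer position.
    """
    tuples = []
    for key in index:  # iterating a MultiIndex yields one tuple per entry
        parts = key[level].split(sep)
        tuples.append(key[:level] + tuple(parts) + key[level + 1:])
    return pd.MultiIndex.from_tuples(tuples)


index = pd.MultiIndex.from_product([["A", "B"], ["a*|*a_unit", "b*|*b_unit"]])
new_index = split_level(index, level=1, sep="*|*")
print(new_index.tolist())
```

The result can be assigned to df.columns or df.index, depending on which axis carries the MultiIndex.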
