Pandas: How to add a level to a MultiIndex by splitting one of its levels?

Question

How can I create a new level by splitting the second level at | ?

The initial index:

MultiIndex(levels=[['A', 'B', 'C', 'D'], ['a|a_unit', 'b|b_unit', 'c|c_unit']],
       codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

Desired output:

What I tried:

# plan was to create a new column and use set_index
df.columns.to_frame().iloc[:,1].str.split('|')

EDIT: The reason why my approach did not work was the following:

Initially, the values in level 1 of the index were separated by '*|*'. To make this example simpler, I deleted the *. Without the star everything worked well, but with the star I got an re error:

re.error: nothing to repeat at position 0

Having proper test cases is really tricky sometimes.

Answer 1

You can try:

s=df.columns.to_frame().iloc[:,1].str.split('|')
final=(pd.DataFrame(data=df.values,columns=df.columns.get_level_values(0))
                   .T.set_index([s.str[0],s.str[1]],append=True).T)

Or:

final=(pd.DataFrame(columns=
 pd.MultiIndex.from_arrays([df.columns.get_level_values(0),s.str[0],s.str[1]])))
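For reference, here is a self-contained sketch of the same idea that can be run as-is. The sample data and the level names ("group", "feature", "unit") are assumptions made for illustration; they are not part of the original question. It relies on Index.str.split with expand=True, which returns a MultiIndex holding the split parts.

```python
import numpy as np
import pandas as pd

# Sample frame with the column structure from the question (the data is made up).
cols = pd.MultiIndex.from_product(
    [["A", "B", "C", "D"], ["a|a_unit", "b|b_unit", "c|c_unit"]]
)
df = pd.DataFrame(np.arange(24).reshape(2, 12), columns=cols)

# Keep level 0 unchanged and split level 1 on the literal "|" character.
level0 = df.columns.get_level_values(0)
parts = df.columns.get_level_values(1).str.split("|", expand=True)

# Rebuild the columns as a three-level MultiIndex.
# The level names below are hypothetical, chosen only for readability.
df.columns = pd.MultiIndex.from_arrays(
    [level0, parts.get_level_values(0), parts.get_level_values(1)],
    names=["group", "feature", "unit"],
)

print(df.columns[:3].tolist())
```

Assigning to df.columns keeps the data untouched, so no transpose or set_index round trip is needed.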

Answer 2

The answer by anky_91 is quite compact. Here is another solution, which also works with this index:

MultiIndex(levels=[['A', 'B', 'C', 'D'], ['a*|*a_unit', 'b*|*b_unit', 'c*|*c_unit']],
       codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

    # split each level-1 label on the literal separator '*|*'
    # (plain str.split, so the * characters are not treated as regex)
    _split = [item.split('*|*') for item in df.columns.to_frame().values[:, 1]]
    _level_0 = df.columns.to_frame().values[:, 0].tolist()

    # combine level 0 with the two parts of each split label
    idx_list = [(item[0], item[1][0], item[1][1]) for item in zip(_level_0, _split)]
    df.columns = pd.MultiIndex.from_tuples(idx_list)

I deleted the * for the sake of simplicity, but doing so removed the very reason why my initial approach (see anky_91's answer), df.columns.to_frame().iloc[:,1].str.split('|'), did not work.
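The error comes from the separator being interpreted as a regular expression: in '*|*' the leading * is a quantifier with nothing in front of it. A minimal sketch of one way around this, assuming the separator is exactly '*|*', is to escape it with re.escape before handing it to str.split. The variable names below are made up for illustration.

```python
import re
import pandas as pd

sep = "*|*"
labels = pd.Index(["a*|*a_unit", "b*|*b_unit", "c*|*c_unit"])

# Compiling the raw separator as a regex fails, because "*" has nothing to repeat.
try:
    re.compile(sep)
    raw_error = None
except re.error as exc:
    raw_error = str(exc)
print(raw_error)

# Escaping the separator makes every character match literally.
parts = labels.str.split(re.escape(sep), expand=True)
names = parts.get_level_values(0).tolist()
units = parts.get_level_values(1).tolist()
print(names, units)
```

Newer pandas releases also accept regex=False in str.split, which should have the same effect; re.escape is used here because it does not depend on the pandas version.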

Answer 3

Another method is to access your levels with index.get_level_values and split them into three indices:

idx1 = [idx.split('|')[0] for idx in df.index.get_level_values(1)]
idx2 = [idx.split('|')[1] for idx in df.index.get_level_values(1)]
df.index = [df.index.get_level_values(0), idx1, idx2]

Output

Empty DataFrame
Columns: []
Index: [(A, a, a_unit), (A, b, b_unit), (A, c, c_unit), (B, a, a_unit), (B, b, b_unit), (B, c, c_unit), (C, a, a_unit), (C, b, b_unit), (C, c, c_unit), (D, a, a_unit), (D, b, b_unit), (D, c, c_unit)]
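A variant of the same approach, sketched on a small made-up frame, splits each label only once and builds the MultiIndex explicitly. The sample data and the "value" column are assumptions for illustration only.

```python
import pandas as pd

# Small sample frame with a two-level row index (the data is made up).
index = pd.MultiIndex.from_product([["A", "B"], ["a|a_unit", "b|b_unit"]])
df = pd.DataFrame({"value": range(4)}, index=index)

level0 = df.index.get_level_values(0)

# Split every level-1 label once, then unzip the pairs into two sequences.
names, units = zip(
    *(label.split("|", 1) for label in df.index.get_level_values(1))
)

# Build the three-level index explicitly.
df.index = pd.MultiIndex.from_arrays([level0, list(names), list(units)])

print(df.index.tolist())
```

Because str.split on a plain Python string treats its argument literally, this form would also work with a separator such as '*|*'.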

3 answers

Answer 1: 1, accepted, 2019-08-11 11:04:09

Answer 2: 1, 2019-08-11 11:08:24

Answer 3: 1, 2019-08-11 11:11:46
