Pandas: How to add a level to a multiindex from one multiindex level by splitting?
How can I create a new level by splitting the second level at the | character?
The initial index:
MultiIndex(levels=[['A', 'B', 'C', 'D'], ['a|a_unit', 'b|b_unit', 'c|c_unit']],
codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
Desired output:
What I tried:
# plan was to create a new column and use set_index
df.columns.to_frame().iloc[:,1].str.split('|')
EDIT: The reason why my approach did not work was the following: Initially, the values in level 1 of the index were separated by '*|*'. To make this example simpler, I deleted the *. Without the star everything worked well, but with the star I got an re error:
re.error: nothing to repeat at position 0
Having proper test cases is really tricky sometimes.
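The error in the EDIT can be reproduced in isolation. As far as I can tell, pandas treats a multi-character pattern passed to str.split as a regular expression by default, and '*|*' is not a valid regex because * has nothing to repeat. The following is a minimal sketch, assuming a small two-by-two column index made up for illustration; escaping the separator with re.escape is one way around the error, not necessarily the only one.

```python
import re
import pandas as pd

# Hypothetical small index, using the '*|*' separator from the EDIT
cols = pd.MultiIndex.from_product([["A", "B"], ["a*|*a_unit", "b*|*b_unit"]])
level_1 = cols.to_frame().iloc[:, 1]

# '*|*' on its own is an invalid regular expression
try:
    re.compile("*|*")
    pattern_error = None
except re.error as exc:
    pattern_error = str(exc)

# Escaping the separator makes the regex engine match it literally
parts = level_1.str.split(re.escape("*|*"))
split_lists = parts.tolist()
```

Here pattern_error holds the message raised by the regex engine, and split_lists holds one [name, unit] pair per column.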
You can try with:
s=df.columns.to_frame().iloc[:,1].str.split('|')
final=(pd.DataFrame(data=df.values,columns=df.columns.get_level_values(0))
.T.set_index([s.str[0],s.str[1]],append=True).T)
Or:
final=(pd.DataFrame(columns=
pd.MultiIndex.from_arrays([df.columns.get_level_values(0),s.str[0],s.str[1]])))
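For reference, here is a self-contained sketch of the from_arrays idea that keeps the data by assigning the new index to df.columns directly. The sample values (np.arange) and the level names "group", "name" and "unit" are assumptions made for illustration; they are not part of the question.

```python
import numpy as np
import pandas as pd

# Rebuild the index from the question; the data values are made up
columns = pd.MultiIndex.from_product(
    [["A", "B", "C", "D"], ["a|a_unit", "b|b_unit", "c|c_unit"]]
)
df = pd.DataFrame(np.arange(24).reshape(2, 12), columns=columns)

# A single-character separator is split literally
s = df.columns.to_frame().iloc[:, 1].str.split("|")

# Replace the two-level column index with a three-level one
df.columns = pd.MultiIndex.from_arrays(
    [df.columns.get_level_values(0), s.str[0].to_numpy(), s.str[1].to_numpy()],
    names=["group", "name", "unit"],  # hypothetical level names
)
```

Because only df.columns is reassigned, the values and the row index are left as they were.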
The answer by anky_91 is quite compact. Here is another solution which also works with this index:
MultiIndex(levels=[['A', 'B', 'C', 'D'], ['a*|*a_unit', 'b*|*b_unit', 'c*|*c_unit']],
codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
# split the second level of every column label at the literal '*|*'
# (Python's str.split does not interpret the separator as a regex)
_split = [item.split('*|*') for item in df.columns.to_frame().values[:, 1]]
_level_0 = df.columns.to_frame().values[:, 0].tolist()
# build (level_0, name, unit) tuples for the new three-level index
idx_list = [(item[0], item[1][0], item[1][1]) for item in zip(_level_0, _split)]
# pd.Index on a list of tuples yields a MultiIndex
df.columns = pd.Index(idx_list)
I deleted the * for the sake of simplicity, but doing so removed the reason why my initial approach (see anky_91's answer), df.columns.to_frame().iloc[:,1].str.split('|'), did not work.
Another method is to access your levels with index.get_level_values and split them into three indices:
idx1 = [idx.split('|')[0] for idx in df.index.get_level_values(1)]
idx2 = [idx.split('|')[1] for idx in df.index.get_level_values(1)]
df.index = [df.index.get_level_values(0), idx1, idx2]
Output:
Empty DataFrame
Columns: []
Index: [(A, a, a_unit), (A, b, b_unit), (A, c, c_unit), (B, a, a_unit), (B, b, b_unit), (B, c, c_unit), (C, a, a_unit), (C, b, b_unit), (C, c, c_unit), (D, a, a_unit), (D, b, b_unit), (D, c, c_unit)]
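The output above comes from a DataFrame that has the MultiIndex on its rows and no columns. A minimal sketch that sets up such a frame and splits each label only once could look like this; the frame construction is an assumption, since the answer does not show how df was built.

```python
import pandas as pd

# Assumed setup: an empty frame carrying the question's index on its rows
index = pd.MultiIndex.from_product(
    [["A", "B", "C", "D"], ["a|a_unit", "b|b_unit", "c|c_unit"]]
)
df = pd.DataFrame(index=index)

# Split every second-level label once, at the first '|'
pairs = [label.split("|", 1) for label in df.index.get_level_values(1)]

df.index = pd.MultiIndex.from_arrays(
    [
        df.index.get_level_values(0),
        [name for name, _ in pairs],
        [unit for _, unit in pairs],
    ]
)
```

If the MultiIndex sits on the columns, as in the question, the same steps should apply with df.columns in place of df.index.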