New level in MultiIndex DataFrame based on existing column level values
Let's say I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame(data=[[1, 2, 3, 4, 5, 6], [3, 4, 5, 6, 7, 8]],
                  columns=pd.MultiIndex.from_product([('A1', 'B1', 'A2'), (10, 20)], names=['level_0', 'level_1']))
Here's what it looks like: (DataFrame image)
I want to add a new level in the columns which contains 1 where the level_0 value contains "1" and 2 where the level_0 value contains "2". So, basically:
level_0 == "A1" --> new_level = 1
level_0 == "B1" --> new_level = 1
level_0 == "A2" --> new_level = 2
Any suggestions on how to do it?
You could extract the values with a regex ((\d+)$ = the trailing digits of the value) and rebuild the MultiIndex with MultiIndex.from_arrays:
# raw string so that \d is not treated as a string escape
values = df.columns.get_level_values('level_0').str.extract(r'(\d+)$', expand=False)
# ['1', '1', '1', '1', '2', '2']
df.columns = pd.MultiIndex.from_arrays([*zip(*df.columns.to_list()), values],
                                       names=[*df.columns.names, 'level_2'])
NB. This generalizes to any XXX00 value.
Output:
level_0 A1    B1    A2
level_1 10 20 10 20 10 20
level_2  1  1  1  1  2  2
0        1  2  3  4  5  6
1        3  4  5  6  7  8
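Note that the extracted labels are strings ('1', '2'), while the question asks for 1 and 2. If integer labels are wanted, a minimal self-contained sketch of the same regex approach could convert them with astype(int). It assumes every level_0 label ends in digits (otherwise the extraction yields NaN and the conversion fails); the names digits and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2, 3, 4, 5, 6], [3, 4, 5, 6, 7, 8]],
    columns=pd.MultiIndex.from_product(
        [("A1", "B1", "A2"), (10, 20)], names=["level_0", "level_1"]
    ),
)

# Trailing digits of each level_0 label, converted from str to int.
# Assumption: every label ends in at least one digit.
digits = (
    df.columns.get_level_values("level_0")
    .str.extract(r"(\d+)$", expand=False)
    .astype(int)
)

# Keep the existing levels as they are and append the new one at the end.
existing = [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]
result = df.copy()
result.columns = pd.MultiIndex.from_arrays(
    existing + [digits], names=[*df.columns.names, "level_2"]
)
print(result)
```

Working on a copy leaves the original df untouched, which makes it easier to compare the two column indexes while experimenting.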
Use a list comprehension to extract the number from the first-level values and create a new MultiIndex with MultiIndex.from_tuples:
import re
df.columns = pd.MultiIndex.from_tuples([(re.findall(r'(\d+)$', x[0])[0], *x)
                                        for x in df.columns.tolist()],
                                       names=('new_level', *df.columns.names))
print(df)
new_level  1           2
level_0   A1    B1    A2
level_1   10 20 10 20 10 20
0          1  2  3  4  5  6
1          3  4  5  6  7  8
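One caveat with the list comprehension: re.findall(...)[0] raises an IndexError for a label that has no trailing digits. A possible sketch of a more tolerant variant is shown here; the helper trailing_number, its default parameter, and the extra "C" label are hypothetical and not part of the original answer.

```python
import re

import pandas as pd


def trailing_number(label, default=None):
    """Return the trailing digits of label as an int, or default if there are none."""
    match = re.search(r"(\d+)$", str(label))
    return int(match.group(1)) if match else default


# Same columns as in the question, plus a hypothetical "C" label without digits.
cols = pd.MultiIndex.from_product(
    [("A1", "B1", "A2", "C"), (10, 20)], names=["level_0", "level_1"]
)

# Prepend the new level; labels without trailing digits fall back to -1.
new_cols = pd.MultiIndex.from_tuples(
    [(trailing_number(c[0], default=-1), *c) for c in cols],
    names=("new_level", *cols.names),
)
print(new_cols)
```

Using -1 as the fallback is just one choice; any sentinel that cannot collide with a real suffix would do.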