Pandas: add multiple columns to a multiindex column dataframe
This question is an attempt to generalise the solution provided for this question:
Pandas: add a column to a multiindex column dataframe
I need to produce a new column under each first-level column index.
The solution provided by spencerlyon2 works when we want to add a single column:
df['bar', 'three'] = [0, 1, 2]
However, I would like to generalise this operation for every first-level column index.
Source DF:
In [1]: df
Out[2]:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370
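For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the source DF. The values are copied from the printout above; the index labels and the level names first and second are read off that printout, so treat the construction itself as an assumption about how the frame was made.

```python
import pandas as pd

# Two-level column index: "first" holds bar/baz, "second" holds one/two.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)

# Values copied from the source DF printed in the question.
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

print(df)
```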
The target DF below requires that the three column be the sum of the one and two columns under its respective first-level index.
In [1]: df
Out[2]:
first        bar                           baz
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796
You can use join with two data frames that share the same index to create a bunch of columns all at once.
First, calculate the sum using groupby against axis=1:
ndf = df.groupby(df.columns.get_level_values(0), axis=1).sum()
        bar       baz
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796
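One caveat: newer pandas releases emit a deprecation warning for axis=1 in groupby. A possible workaround, sketched here and not part of the original answer, is to transpose, group on the first column level, and transpose back. The frame is rebuilt from the question's sample values so the snippet runs on its own.

```python
import pandas as pd

# Sample frame rebuilt from the question's source DF.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# Transposing turns the column levels into row levels, so an ordinary
# row-wise groupby on the "first" level can do the summing.
ndf = df.T.groupby(level="first").sum().T

print(ndf)
```

This should give the same bar and baz sums as the axis=1 version, just without relying on the deprecated argument.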
(PS: If you have more than two columns under each first-level index, you may do
df.loc[:, (slice(None), ['one', 'two'])].groupby(level=0, axis=1).sum()
to slice only columns 'one' and 'two' first, and only then groupby. Grouping by level=0 on the sliced frame keeps the grouping key the same length as the sliced columns.)
Then, make it match your column indexes, i.e. make it a MultiIndexed data frame just like your original data frame:
ndf.columns = pd.MultiIndex.from_product([ndf.columns, ['three']])
        bar       baz
      three     three
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796
Finally, join the two frames on their shared index and sort the columns:
finaldf = df.join(ndf).sort_index(axis=1)
If you really care about the ordering, use reindex:
finaldf.reindex(['one', 'two', 'three'], axis=1, level=1)
first        bar                           baz
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796
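Putting the steps of this answer together, a self-contained sketch might look like the following. It uses the transposed groupby instead of axis=1 (my substitution, to avoid the deprecation warning in newer pandas) and passes the original level names to from_product, which the answer did not do; everything else follows the join, sort_index and reindex steps described above.

```python
import pandas as pd

# Sample frame rebuilt from the question's source DF.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# Step 1: one sum per first-level label.
ndf = df.T.groupby(level="first").sum().T

# Step 2: relabel the sums as (label, "three") so they fit the column MultiIndex.
ndf.columns = pd.MultiIndex.from_product(
    [ndf.columns, ["three"]], names=df.columns.names
)

# Step 3: join on the shared row index, then order the second level explicitly.
finaldf = (
    df.join(ndf)
    .sort_index(axis=1)
    .reindex(["one", "two", "three"], axis=1, level=1)
)

print(finaldf)
```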
I started from your sample input:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370
To add a new column under each level 0 label of the column MultiIndex, you can run something like:
for c1 in df.columns.get_level_values('first').unique():
    # New column int index
    cInd = int(df.columns.get_loc(c1).stop)
    col = (c1, 'three')  # New column name
    newVal = df[(c1, 'one')] + df[(c1, 'two')]
    df.insert(loc=cInd, column=col, value=newVal)  # Insert the new column
In the above example, each new column is the sum of the one and two columns, as you asked; change newVal to set the values as you wish.
The result of my code (after the column sort) is:
first        bar                           baz
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796
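If you prefer not to modify df in place, the same loop can be wrapped in a small helper that works on a copy. This is only a sketch: add_sum_column and its parts and new_name parameters are hypothetical names of mine, and it assumes the columns of each first-level label are contiguous, so that get_loc returns a slice.

```python
import pandas as pd


def add_sum_column(df, parts=("one", "two"), new_name="three"):
    """Return a copy of df with a summed column inserted after each first-level group."""
    out = df.copy()
    for c1 in out.columns.get_level_values(0).unique():
        # For a first-level label, get_loc returns a slice when that
        # label's columns sit next to each other.
        loc = out.columns.get_loc(c1)
        if not isinstance(loc, slice):
            raise ValueError(f"columns under {c1!r} are not contiguous")
        # Sum the requested second-level columns of this group.
        new_val = out[c1][list(parts)].sum(axis=1)
        # Insert right after the last column of the group.
        out.insert(loc=int(loc.stop), column=(c1, new_name), value=new_val)
    return out


# Sample frame rebuilt from the question's source DF.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

result = add_sum_column(df)
print(result)
```

Because each new column is inserted at the end of its own group, the result comes out in one, two, three order without a separate sort.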