
How to add new column group after using pivot pandas?

I'm trying to create a new column group consisting of 3 sub-columns after using pivot on a dataframe, but the result is only one column.

Let's say I have the following dataframe that I pivot:

import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
                           'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': [1, 2, 3, 4, 5, 6]})
# Keep the pivoted result; the code below operates on it
df = df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])

Now I want an extra column group that is the sum of the two value columns baz and zoo.

My attempt:

df.loc[:, "baz+zoo"] = df.loc[:, 'baz'] + df.loc[:, 'zoo']

My output:

[image not preserved]

The desired output:

[image not preserved]

I know that performing the sum and then concatenating will do the trick, but I was hoping for a neater solution.

If the DataFrame has many rows, or especially many columns, I think it is better and faster to create a new DataFrame, add the first level of the MultiIndex with MultiIndex.from_product, and append it to the original with DataFrame.join:

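# Sum the two value groups; the result has the single-level columns A, B, C
df1 = df.loc[:, 'baz'] + df.loc[:, 'zoo']
# Prepend a top level 'baz+zoo' so the columns line up with the pivoted frame
df1.columns = pd.MultiIndex.from_product([['baz+zoo'], df1.columns])
print(df1)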
   baz+zoo        
          A   B   C
foo                
one       2   4   6
two       8  10  12

df = df.join(df1)
print (df)
    baz       zoo       baz+zoo        
bar   A  B  C   A  B  C       A   B   C
foo                                    
one   1  2  3   1  2  3       2   4   6
two   4  5  6   4  5  6       8  10  12
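
As a variation on the approach above, the new top level can also be added by passing a dict to pd.concat, whose keys become the outer column level. This is only a minimal sketch on the question's sample data; the names wide, total and result are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': [1, 2, 3, 4, 5, 6]})
wide = df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])

# Sum the two groups; the columns are the plain labels A, B, C.
total = wide['baz'] + wide['zoo']

# A dict passed to concat uses its keys as a new outer column level,
# so 'baz+zoo' ends up above A, B, C.
total = pd.concat({'baz+zoo': total}, axis=1)

# Both frames now have two column levels and can be joined on the index.
result = wide.join(total)
print(result)
```

This avoids building the MultiIndex by hand, at the cost of one extra concat call.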

Another solution is to loop over the second level and select the columns of the MultiIndex by tuples. On a large DataFrame this should perform worse, so it is best to test with real data:

for x in df.columns.levels[1]:
    df[('baz+zoo', x)] = df[('baz', x)] + df[('zoo', x)]
print (df)
    baz       zoo       baz+zoo        
bar   A  B  C   A  B  C       A   B   C
foo                                    
one   1  2  3   1  2  3       2   4   6
two   4  5  6   4  5  6       8  10  12
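
To compare the two approaches, something like the following timing sketch could be used. The synthetic frame (1,000 rows, 50 second-level labels) and the helper names with_join and with_loop are assumptions made for illustration; real data may behave differently, so treat any numbers it prints as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data standing in for a real pivoted frame.
rng = np.random.default_rng(0)
labels = [f'c{i:02d}' for i in range(50)]
cols = pd.MultiIndex.from_product([['baz', 'zoo'], labels])
wide = pd.DataFrame(rng.integers(0, 10, size=(1000, len(cols))), columns=cols)


def with_join(frame):
    # Sum once, add the top level, then join.
    total = frame['baz'] + frame['zoo']
    total.columns = pd.MultiIndex.from_product([['baz+zoo'], total.columns])
    return frame.join(total)


def with_loop(frame):
    # Add one ('baz+zoo', x) column per second-level label.
    out = frame.copy()
    for x in frame.columns.levels[1]:
        out[('baz+zoo', x)] = out[('baz', x)] + out[('zoo', x)]
    return out


# Check that both produce the same values before timing them.
same = np.array_equal(with_join(wide).to_numpy(), with_loop(wide).to_numpy())
print('same values:', same)

for fn in (with_join, with_loop):
    seconds = timeit.timeit(lambda: fn(wide), number=5)
    print(f'{fn.__name__}: {seconds:.4f} s for 5 runs')
```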

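I was able to do it this way too: add the sum column to the original (unpivoted) DataFrame first, then pivot all three value columns. I'm not sure I understand the theory, but...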

df['baz+zoo'] = df['baz']+df['zoo']
df.pivot(index='foo', columns='bar', values=['baz','zoo','baz+zoo'])
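
One way to see why this works: in the long format each row is a single (foo, bar) pair, so a row-wise sum is already the value that belongs in the matching cell after pivoting. The sketch below, on the question's sample data, uses assign to avoid modifying df in place and then compares the result with summing after the pivot; the names long_total, wide and after are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': [1, 2, 3, 4, 5, 6]})

# Row-wise sum in long format: one value per (foo, bar) pair.
long_total = df.assign(**{'baz+zoo': df['baz'] + df['zoo']})

# Pivot all three value columns at once.
wide = long_total.pivot(index='foo', columns='bar',
                        values=['baz', 'zoo', 'baz+zoo'])
print(wide)

# Compare with summing the two groups after pivoting.
after = wide['baz'] + wide['zoo']
print((wide['baz+zoo'] == after).all().all())
```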
