
Adding two columns in a MultiIndex DataFrame in pandas

I am using pandas 1.14.

I have a DataFrame that looks like this:

                          col1      col2  ....
    A   B   C  D  E    

   11   1   1  1  1        2          3
                  3        3          4
               30 3        10         2
                ...        ...
   22   3   4  5  6        3          1

df.index outputs

MultiIndex([('11', '1', '1', '1', '1'),
            ('11', '1', '1', '1', '3'),
            ('11', '1', '1', '30', '3'),
            ...
            ('22', '3', '4', '5', '6')],
           names=["A","B","C", "D", "E"], length=10000)

df.columns outputs

Index(["col1", "col2", ...], dtype="object")

What I want to do is add both columns and divide the sum by 2. With a single-index DataFrame I would usually do df["new"] = (df["col1"] + df["col2"])/2

How can I do this with a MultiIndex DataFrame?

My desired DataFrame should look like this:

                          col1      col2  new
    A   B   C  D  E    

   11   1   1  1  1        2          3    2.5
                  3        3          4    3.5
               30 3        10         2    6
                ...        ...
   22   3   4  5  6        3          1    2

Thanks in advance!

Your solution should work with a MultiIndex as well:

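In [13]: # `index` is not defined in the original answer; rebuilt here from the tuples shown in the question
    ...: index = pd.MultiIndex.from_tuples(
    ...:     [('11', '1', '1', '1', '1'), ('11', '1', '1', '1', '3'),
    ...:      ('11', '1', '1', '30', '3'), ('22', '3', '4', '5', '6')],
    ...:     names=['A', 'B', 'C', 'D', 'E'])

In [14]: df = pd.DataFrame([[2,3],[3,4],[10,2],[3,1]], columns=['col1', 'col2'], index=index)

In [15]: df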
Out[15]: 
             col1  col2
A  B C D  E            
11 1 1 1  1     2     3
          3     3     4
       30 3    10     2
22 3 4 5  6     3     1

In [16]: df['new'] = (df['col1'] + df['col2'])/2

In [17]: df
Out[17]: 
             col1  col2  new
A  B C D  E                 
11 1 1 1  1     2     3  2.5
          3     3     4  3.5
       30 3    10     2  6.0
22 3 4 5  6     3     1  2.0
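
The reason no special handling is needed is that arithmetic between two columns aligns rows by index label, and a MultiIndex label is simply a tuple of level values. A minimal sketch of that, assuming only the four rows visible in the question (the variable names `index` and `df` are illustrative, and the real frame has more rows and columns):

```python
import pandas as pd

# Rebuild the four rows visible in the question; the index values are strings there.
index = pd.MultiIndex.from_tuples(
    [("11", "1", "1", "1", "1"),
     ("11", "1", "1", "1", "3"),
     ("11", "1", "1", "30", "3"),
     ("22", "3", "4", "5", "6")],
    names=["A", "B", "C", "D", "E"],
)
df = pd.DataFrame({"col1": [2, 3, 10, 3], "col2": [3, 4, 2, 1]}, index=index)

# Both columns carry the frame's index, so the sum lines up row by row.
df["new"] = (df["col1"] + df["col2"]) / 2

# A single value can be looked up with the full tuple of level values.
print(df.loc[("11", "1", "1", "30", "3"), "new"])
```

The index itself is untouched by the assignment; only a new column is appended.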

I ran an experiment, and your approach works.

import pandas as pd

df = pd.DataFrame({'a':[1,2,3,4], 'b':[2,3,4,5]}, index=[['1', '1', '2', '2'], ['1','2','1','2']])
df
>>>

     a  b
1 1  1  2
  2  2  3
2 1  3  4
  2  4  5

Your approach:

df['new'] = (df['a'] + df['b']) / 2

df
>>>
     a  b  new
1 1  1  2  1.5
  2  2  3  2.5
2 1  3  4  3.5
  2  4  5  4.5
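
An equivalent way to write the same average is to select the two columns and take a row-wise mean. This is only a sketch on the same toy frame as above, not a claim that it is better than the original expression:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]},
    index=[["1", "1", "2", "2"], ["1", "2", "1", "2"]],
)

# Nested lists passed as `index` become a two-level MultiIndex.
assert isinstance(df.index, pd.MultiIndex)

# Row-wise mean over an explicit column list; same values as (a + b) / 2.
df["new"] = df[["a", "b"]].mean(axis=1)
print(df)
```

Naming the columns explicitly keeps the result correct if the frame later gains unrelated columns.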

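No special treatment is needed; the standard techniques apply. My own habit is to always use assign():

import pandas as pd

df = pd.DataFrame({"A":[11],"B":[1],"C":[1],"D":[1],"E":[1],"col1":[2],"col2":[3]})
df = df.set_index(["A","B","C","D","E"])
# sum(axis=1) adds every column in the row; that is fine here because only col1 and col2 exist
df = df.assign(new=lambda dfa: dfa.sum(axis=1)/2)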

print(df.to_string())

Output:

            col1  col2  new
A  B C D E                 
11 1 1 1 1     2     3  2.5
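
One thing to watch with a row-wise sum: `sum(axis=1)` adds every column in the frame, and the frame in the question has more columns than `col1` and `col2`. A sketch with a hypothetical extra column `col3` (not from the question) shows the difference, and also shows that `assign` returns a new frame rather than modifying `df`:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [11], "B": [1], "C": [1], "D": [1], "E": [1],
     "col1": [2], "col2": [3],
     "col3": [100]}  # hypothetical extra column, for illustration only
).set_index(["A", "B", "C", "D", "E"])

# Summing every column would pull col3 into the result.
all_cols = df.sum(axis=1) / 2

# Restricting the sum to the two columns of interest gives the intended average.
result = df.assign(new=lambda dfa: dfa[["col1", "col2"]].sum(axis=1) / 2)

print(result.to_string())
```

Because `assign` leaves `df` unchanged, the result has to be bound to a name (or reassigned to `df`) to keep it.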

