Apply function to multilevel columns

Given a pandas DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'x': pd.Series([1.0, 1.0, 2.0, 1.0, 2.0]),
    'y': pd.Series([6.0, 7.0, 8.0, 9.0, 10.0]),
    'z': pd.Series([3, 2, 1, 0, 0])
})

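# A list of functions per column produces two-level (MultiIndex) columns
# such as ('x', 'sum') and ('x', 'average'). Passing the string 'sum'
# instead of np.sum avoids a FutureWarning in pandas 2.1+; the label is
# still 'sum'.
grpd = df.groupby(['clients']).agg({
    'x': ['sum', np.average],
    'y': ['sum', np.average],
    'z': ['sum', np.average]
})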
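Tuple keys are needed later on because of how the aggregated columns are labelled, so it helps to look at them directly. This is a minimal sketch that assumes a recent pandas. It uses the string aliases 'sum' and 'mean', so the second level reads 'mean' rather than 'average'. Without weights the values are the same. The variable x_sum is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'x': [1.0, 1.0, 2.0, 1.0, 2.0],
    'y': [6.0, 7.0, 8.0, 9.0, 10.0],
    'z': [3, 2, 1, 0, 0],
})

# The same list of statistics for every value column gives a two-level column index
grpd = df.groupby('clients')[['x', 'y', 'z']].agg(['sum', 'mean'])

# Every column label is a (top-level, sub-level) tuple
print(grpd.columns.tolist())

# A single sub-column is selected with its full tuple and comes back as a Series
x_sum = grpd[('x', 'sum')]
print(x_sum)
```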


In[55]: grpd
Out[53]: 
          y           x             z        
        sum average sum   average sum average
clients                                      
A        21     7.0   4  1.333333   6       2
B        19     9.5   3  1.500000   0       0

How can I create a new column by applying a function to selected sub-columns?

The desired result is:

          y           x             z         new_col
        sum average sum   average sum average 
clients                                      
A        21     7.0   4  1.333333   6       2  0.19
B        19     9.5   3  1.500000   0       0  0.16

I had something like this in mind:

grpd['new_col'] = grpd[['x', 'y']].apply(lambda x: x[0]['sum'] / x[1]['sum'], axis=1)
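The row-wise idea in that attempt can work, but each row must be indexed by full (column, statistic) tuples. Positional lookups like x[0] do not work here. With axis=1, each row is a Series whose index is the MultiIndex of column tuples. This is a minimal sketch under the same assumptions as the data above, aggregated with 'sum' and 'mean'. The name ratio is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'x': [1.0, 1.0, 2.0, 1.0, 2.0],
    'y': [6.0, 7.0, 8.0, 9.0, 10.0],
    'z': [3, 2, 1, 0, 0],
})
grpd = df.groupby('clients')[['x', 'y', 'z']].agg(['sum', 'mean'])

# Each row is a Series keyed by (column, statistic) tuples, so look values up by tuple
ratio = grpd.apply(lambda row: row[('x', 'sum')] / row[('y', 'sum')], axis=1)

# Assigning a plain label to MultiIndex columns creates ('new_col', '')
grpd['new_col'] = ratio
print(ratio.round(2))
```

This calls the lambda once per row in Python. On large frames, a vectorized column division is usually much faster.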

You can do a vectorized version of the operation instead:

grpd['new_col'] = grpd[('x', 'sum')]/grpd[('y', 'sum')]
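This minimal sketch shows what that assignment produces, again assuming the data is aggregated with 'sum' and 'mean'. Because grpd has two column levels, assigning to the plain label 'new_col' creates the tuple ('new_col', ''), whose second level is an empty string.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'x': [1.0, 1.0, 2.0, 1.0, 2.0],
    'y': [6.0, 7.0, 8.0, 9.0, 10.0],
    'z': [3, 2, 1, 0, 0],
})
grpd = df.groupby('clients')[['x', 'y', 'z']].agg(['sum', 'mean'])

# Element-wise division of two Series aligned on the 'clients' index
grpd['new_col'] = grpd[('x', 'sum')] / grpd[('y', 'sum')]

# pandas pads the missing second level with an empty string
print(grpd.columns[-1])
print(grpd[('new_col', '')].round(2))
```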

Or, for consistency (so that new_col gets the second-level label sum, just like x and y):

grpd[('new_col','sum')] = grpd[('x', 'sum')]/grpd[('y', 'sum')]
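When the inputs share a sub-column label, DataFrame.xs can pull out a whole level at once, which keeps the formula short. This is a sketch under the same assumptions as above, and the name sums is illustrative. xs with level=1 drops that level by default, so the result has plain columns x, y and z.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'x': [1.0, 1.0, 2.0, 1.0, 2.0],
    'y': [6.0, 7.0, 8.0, 9.0, 10.0],
    'z': [3, 2, 1, 0, 0],
})
grpd = df.groupby('clients')[['x', 'y', 'z']].agg(['sum', 'mean'])

# Take every 'sum' sub-column; the result has single-level columns x, y, z
sums = grpd.xs('sum', axis=1, level=1)

# Store the ratio under ('new_col', 'sum') to mirror the other columns
grpd[('new_col', 'sum')] = sums['x'] / sums['y']

# The new column now shows up alongside the other 'sum' sub-columns
print(grpd.xs('sum', axis=1, level=1).round(2))
```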
