Apply function to multilevel columns
Given a pandas DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'clients': pd.Series(['A', 'A', 'A', 'B', 'B']),
    'x': pd.Series([1.0, 1.0, 2.0, 1.0, 2.0]),
    'y': pd.Series([6.0, 7.0, 8.0, 9.0, 10.0]),
    'z': pd.Series([3, 2, 1, 0, 0])
})
grpd = df.groupby(['clients']).agg({
    'x': [np.sum, np.average],
    'y': [np.sum, np.average],
    'z': [np.sum, np.average]
})
In[55]: grpd
Out[53]:
           y            x             z
         sum average  sum   average sum average
clients
A         21     7.0    4  1.333333   6       2
B         19     9.5    3  1.500000   0       0
How can I create a new column by applying a function to selected sub-columns?
The desired result is:
           y            x             z         new_col
         sum average  sum   average sum average
clients
A         21     7.0    4  1.333333   6       2    0.19
B         19     9.5    3  1.500000   0       0    0.15
I had something like this in mind:
grpd['new_col'] = grpd[['x', 'y']].apply(lambda x: x[0]['sum'] / x[1]['sum'], axis=1)
You can do a vectorized version of the operation, selecting each sub-column with a (top level, second level) tuple:
grpd['new_col'] = grpd[('x', 'sum')]/grpd[('y', 'sum')]
Or, for consistency, give new_col a second-level label of sum, as x and y have:
grpd[('new_col','sum')] = grpd[('x', 'sum')]/grpd[('y', 'sum')]
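For reference, here is a minimal self-contained sketch of the approach above. It is an illustration under a few assumptions rather than the only way to do this: the aggregations are passed as the strings 'sum' and 'mean' instead of np.sum and np.average (so the second-level label reads mean, not average), and the names ratio and row_wise are introduced here purely for the example. The row-wise apply is included only to show what the attempt in the question could look like when the sub-columns are addressed by tuple.

```python
import pandas as pd

df = pd.DataFrame({
    'clients': ['A', 'A', 'A', 'B', 'B'],
    'x': [1.0, 1.0, 2.0, 1.0, 2.0],
    'y': [6.0, 7.0, 8.0, 9.0, 10.0],
    'z': [3, 2, 1, 0, 0],
})

# Assumption: string aggregation names instead of np.sum / np.average,
# so the second column level is labelled 'sum' and 'mean'.
grpd = df.groupby('clients').agg({
    'x': ['sum', 'mean'],
    'y': ['sum', 'mean'],
    'z': ['sum', 'mean'],
})

# Vectorized: each sub-column is selected with a (level 0, level 1) tuple.
ratio = grpd[('x', 'sum')] / grpd[('y', 'sum')]

# Assigning with a tuple key keeps the two-level column layout.
grpd[('new_col', 'sum')] = ratio

# Row-wise equivalent of the idea in the question; usually slower than
# the vectorized form because the lambda runs once per row.
row_wise = grpd.apply(lambda row: row[('x', 'sum')] / row[('y', 'sum')], axis=1)

print(grpd)
```

The vectorized division works on whole columns at once, so it is generally preferable to apply with axis=1 when the function is plain arithmetic.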