Neat inter-row calculations within pandas DataFrame with MultiIndex
I have the classic UCB admissions dataset as a pandas DataFrame with a MultiIndex:
                       value
Dept Gender Admit
A    Male   Admitted     512
            Rejected     313
     Female Admitted      89
            Rejected      19
etc. for the other departments ('A' through 'F'),
and I want to create a table of the ratio of students admitted to students rejected, grouped by Dept and Gender.
My current approaches have been
ucbA.groupby(level=['Dept', 'Gender']).apply(lambda x: x.xs('Admitted', level=2).iloc[0] / x.xs('Rejected', level=2).iloc[0]).unstack().value
which is horrible,
and
admitted = ucbA.unstack('Admit')
DataFrame({'Proportion Accepted': admitted.value.Admitted / admitted.value.Rejected}).unstack(1)
which is OK I guess, but I feel it should be possible as a one-liner without unstacking.
Is there a really neat way of doing something like this? I'm imagining a one-liner that stays within the context of the MultiIndex.
Edit: The full frame:
DataFrame({'Admit': {0: 'Admitted', 1: 'Rejected', 2: 'Admitted', 3: 'Rejected', 4: 'Admitted', 5: 'Rejected', 6: 'Admitted', 7: 'Rejected', 8: 'Admitted', 9: 'Rejected', 10: 'Admitted', 11: 'Rejected', 12: 'Admitted', 13: 'Rejected', 14: 'Admitted', 15: 'Rejected', 16: 'Admitted', 17: 'Rejected', 18: 'Admitted', 19: 'Rejected', 20: 'Admitted', 21: 'Rejected', 22: 'Admitted', 23: 'Rejected'}, 'Dept': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B', 8: 'C', 9: 'C', 10: 'C', 11: 'C', 12: 'D', 13: 'D', 14: 'D', 15: 'D', 16: 'E', 17: 'E', 18: 'E', 19: 'E', 20: 'F', 21: 'F', 22: 'F', 23: 'F'}, 'Gender': {0: 'Male', 1: 'Male', 2: 'Female', 3: 'Female', 4: 'Male', 5: 'Male', 6: 'Female', 7: 'Female', 8: 'Male', 9: 'Male', 10: 'Female', 11: 'Female', 12: 'Male', 13: 'Male', 14: 'Female', 15: 'Female', 16: 'Male', 17: 'Male', 18: 'Female', 19: 'Female', 20: 'Male', 21: 'Male', 22: 'Female', 23: 'Female'}, 'value': {0: 512, 1: 313, 2: 89, 3: 19, 4: 353, 5: 207, 6: 17, 7: 8, 8: 120, 9: 205, 10: 202, 11: 391, 12: 138, 13: 279, 14: 131, 15: 244, 16: 53, 17: 138, 18: 94, 19: 299, 20: 22, 21: 351, 22: 24, 23: 317}}).set_index(['Dept', 'Gender', 'Admit']).astype(float).astype(int)
Alternatively, if you have rpy:
import pandas.rpy.common as com
ucbA = com.load_data('UCBAdmissions').set_index(['Dept', 'Gender', 'Admit']).astype(float).astype(int)
Here you go:
import pandas as pd

df = pd.DataFrame({'Dept': ['A', 'A', 'A', 'A'],
                   'Gender': ['Male', 'Male', 'Female', 'Female'],
                   'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected'],
                   'value': [512, 313, 89, 19]})
df = df.set_index(['Dept', 'Gender', 'Admit'])

# Proportions accepted and rejected
# (row order may differ between pandas versions):
df / df.groupby(level=['Dept', 'Gender']).transform('sum')
#                          value
# Dept Gender Admit
# A    Female Admitted  0.824074
#             Rejected  0.175926
#      Male   Admitted  0.620606
#             Rejected  0.379394

# If you really want admitted as a fraction of rejected, move 'Admit'
# to the outermost level and divide the two slices
# (.loc is used here because .ix was removed in pandas 1.0):
df2 = df.swaplevel(1, 2).swaplevel(0, 1)
df2.loc['Admitted'] / df2.loc['Rejected']
#                 value
# Dept Gender
# A    Male    1.635783
#      Female  4.684211
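As a further option for the "one-liner without unstacking" that the question asks for, here is a minimal sketch using `DataFrame.xs`. It assumes the same four-row sample frame as above and a reasonably recent pandas; the variable name `ratio` is only illustrative. `xs` selects one label from a named index level and drops that level, so the two cross-sections share a (Dept, Gender) index and can be divided directly:

```python
import pandas as pd

# Same sample data as above: department A only.
df = pd.DataFrame({'Dept': ['A', 'A', 'A', 'A'],
                   'Gender': ['Male', 'Male', 'Female', 'Female'],
                   'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected'],
                   'value': [512, 313, 89, 19]})
df = df.set_index(['Dept', 'Gender', 'Admit'])

# Each xs() call drops the 'Admit' level, leaving (Dept, Gender) as the index,
# so the division aligns row by row on department and gender.
ratio = df.xs('Admitted', level='Admit') / df.xs('Rejected', level='Admit')
print(ratio)
```

Appending `.unstack('Gender')` to the result should give the Dept-by-Gender layout if a wide table is preferred.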
Here's a way:
In [55]: grouper = ['Dept', 'Gender']

In [56]: x = df.reset_index()

In [57]: (x[x.Admit == 'Admitted'].groupby(grouper)[['value']].sum() /
   ....:  x[x.Admit == 'Rejected'].groupby(grouper)[['value']].sum()
   ....: ).unstack()  # [['value']] keeps the text column 'Admit' out of the sum
Out[57]:
           value
Gender    Female      Male
Dept
A       4.684211  1.635783
B       2.125000  1.705314
C       0.516624  0.585366
D       0.536885  0.494624
E       0.314381  0.384058
F       0.075710  0.062678

[6 rows x 2 columns]