Percentage of total size after GROUP BY in Python
code_module  final_result
AAA          Distinction      44
             Fail             91
             Pass            487
             Withdrawn       126
This is the output of this Python code:
import numpy as np
studentInfo.groupby(['code_module','final_result']).agg({'code_module':[np.size]})
The calculation I want is AAA.pass / AAA.total, where the total is the sum of all the numbers above.
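Using only the counts printed above, the arithmetic is: the AAA total is 44 + 91 + 487 + 126 = 748, and each share is its count divided by that total. A minimal sketch in plain Python, where the dictionary `counts` is simply the table above typed in by hand:

```python
# Counts for code_module AAA, copied by hand from the groupby output above
counts = {"Distinction": 44, "Fail": 91, "Pass": 487, "Withdrawn": 126}

# The total is the sum of all counts in the group
total = sum(counts.values())

# Share of each final_result within AAA, as a percentage rounded to 2 decimals
percentages = {result: round(100 * n / total, 2) for result, n in counts.items()}

print(total)        # 748
print(percentages)  # {'Distinction': 5.88, 'Fail': 12.17, 'Pass': 65.11, 'Withdrawn': 16.84}
```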
I believe you need SeriesGroupBy.value_counts with the parameter normalize=True:
s1 = studentInfo.groupby('code_module')['final_result'].value_counts(normalize=True)
print(s1)
code_module  final_result
AAA          Pass            0.651070
             Withdrawn       0.168449
             Fail            0.121658
             Distinction     0.058824
Name: final_result, dtype: float64
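Because studentInfo itself isn't shown, here is a minimal self-contained sketch that rebuilds a hypothetical studentInfo from the AAA counts in the question (one row per student) and runs the same value_counts call. Multiplying by 100 turns the fractions into the percentages the title asks for. Note that in pandas 2.x the resulting Series is named "proportion" rather than "final_result".

```python
import pandas as pd

# Hypothetical stand-in for studentInfo: one row per student, rebuilding
# the AAA counts from the question (44 Distinction, 91 Fail, 487 Pass, 126 Withdrawn)
counts = {"Distinction": 44, "Fail": 91, "Pass": 487, "Withdrawn": 126}
studentInfo = pd.DataFrame({
    "code_module": "AAA",
    "final_result": [result for result, n in counts.items() for _ in range(n)],
})

# normalize=True divides each count by the size of its code_module group
s1 = studentInfo.groupby("code_module")["final_result"].value_counts(normalize=True)
print(s1)

# Multiply by 100 to express the shares as percentages
print((s1 * 100).round(2))
```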
Or simplify your solution with DataFrameGroupBy.size, then divide by the sum per first level of the MultiIndex:
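s = studentInfo.groupby(['code_module','final_result']).size()
# Series.sum(level=0) was removed in pandas 2.0, so sum per first level with groupby instead
s2 = s.div(s.groupby(level=0).sum(), level=0)
print(s2)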
code_module  final_result
AAA          Distinction     0.058824
             Fail            0.121658
             Pass            0.651070
             Withdrawn       0.168449
dtype: float64
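The same idea can also be written with transform, which broadcasts each module's total back onto its own rows so a plain division works without the level argument. This is a sketch, not the answer's exact code. The DataFrame is a hypothetical stand-in: the AAA counts come from the question, while the BBB rows are invented only to show that each module is normalized separately. The result is scaled by 100 to give percentages.

```python
import pandas as pd

# Hypothetical stand-in for studentInfo. The AAA counts come from the question;
# the BBB counts are made up purely to show that each module is normalized separately.
counts = {
    ("AAA", "Distinction"): 44,
    ("AAA", "Fail"): 91,
    ("AAA", "Pass"): 487,
    ("AAA", "Withdrawn"): 126,
    ("BBB", "Fail"): 1,
    ("BBB", "Pass"): 3,
}
studentInfo = pd.DataFrame(
    [key for key, n in counts.items() for _ in range(n)],
    columns=["code_module", "final_result"],
)

# Number of rows per (code_module, final_result) pair
s = studentInfo.groupby(["code_module", "final_result"]).size()

# Total per code_module, broadcast back to every row of that module
totals = s.groupby(level=0).transform("sum")

# Percentage of each final_result within its code_module
pct = (s / totals * 100).round(2)
print(pct)
```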
The difference between the solutions is that value_counts returns the output Series in descending order within each code_module, so the first element is the most frequently occurring one, while size does not: it keeps the groups in key order (groupby sorts the keys by default).
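To see the ordering difference concretely, the sketch below (again using a hypothetical studentInfo rebuilt from the AAA counts in the question) prints the index order of both results and checks that, once the value_counts result is sorted by its index, both approaches hold the same fractions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for studentInfo with the AAA counts from the question
counts = {"Distinction": 44, "Fail": 91, "Pass": 487, "Withdrawn": 126}
studentInfo = pd.DataFrame({
    "code_module": "AAA",
    "final_result": [result for result, n in counts.items() for _ in range(n)],
})

# value_counts: sorted by frequency, highest first, within each module
s1 = studentInfo.groupby("code_module")["final_result"].value_counts(normalize=True)

# size divided by the per-module total: kept in group-key order (sorted by default)
s = studentInfo.groupby(["code_module", "final_result"]).size()
s2 = s / s.groupby(level=0).transform("sum")

print(list(s1.index))  # [('AAA', 'Pass'), ('AAA', 'Withdrawn'), ('AAA', 'Fail'), ('AAA', 'Distinction')]
print(list(s2.index))  # [('AAA', 'Distinction'), ('AAA', 'Fail'), ('AAA', 'Pass'), ('AAA', 'Withdrawn')]

# After sorting s1 by its index, the two results hold the same numbers
print(np.allclose(s1.sort_index().to_numpy(), s2.to_numpy()))  # True
```

If you want the value_counts result in key order, calling sort_index() on it is enough; if you want the size-based result ordered by frequency, sort_values(ascending=False) does the reverse.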