How to calculate a percentage using grouped columns in pandas (Python)?
Pandas newbie, hitting a simple problem that I can't figure out.
I have a data set of baby names in the US that looks like this:
I am trying to write a program where I can feed in a list of names and get back the % likelihood that each name belongs to a male or a female (the year is irrelevant for my purposes right now).
I got as far as writing the groupby and then adding the male and female name counts together.
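A minimal sketch of that groupby step. Since the sample data isn't shown, the long-format layout and the column names Name, Year, Gender and Count are assumptions, and the rows are made up. It sums over years and moves Gender into the columns; note that it yields flat F/M columns rather than the ('Count', 'F') MultiIndex seen in the answers.

```python
import pandas as pd

# Hypothetical long-format sample; column names are assumptions.
df = pd.DataFrame({
    "Name":   ["Aaden", "Aaden", "Aaden", "Aadi", "Aadi", "Aabid"],
    "Year":   [2009, 2009, 2010, 2010, 2010, 2010],
    "Gender": ["F", "M", "M", "F", "M", "M"],
    "Count":  [5, 200, 79, 6, 50, 5],
})

# Ignore the year: total the counts per (Name, Gender) pair,
# then unstack Gender into columns so each name is one row.
wide = df.groupby(["Name", "Gender"])["Count"].sum().unstack("Gender")

# Names seen for only one gender get NaN in the other column;
# DataFrame.sum skips NaN, so the row total is still correct.
wide["sum"] = wide.sum(axis=1)
print(wide)
```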
Now all I need is to calculate the percentages based on this data. I think it is some kind of transform (right?), but I can't seem to write anything that works. I know exactly how I would do it in SQL, but I am really trying to figure out pandas. Some pointers would be greatly appreciated!
Thanks!
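Since the question guesses at transform: it can be done that way. A sketch on hypothetical long-format data (the columns Name, Gender and Count are assumptions): groupby(...).transform("sum") returns each name's total aligned to every row, playing the role of SUM(Count) OVER (PARTITION BY Name) in SQL.

```python
import pandas as pd

# Hypothetical long-format data; column names are assumptions.
df = pd.DataFrame({
    "Name":   ["Aaden", "Aaden", "Aadi", "Aadi", "Aabid"],
    "Gender": ["F", "M", "F", "M", "M"],
    "Count":  [5, 279, 6, 50, 5],
})

# Collapse years first so each (Name, Gender) pair appears once.
totals = df.groupby(["Name", "Gender"], as_index=False)["Count"].sum()

# transform("sum") broadcasts each name's total back to its rows,
# so the division lines up row by row.
name_total = totals.groupby("Name")["Count"].transform("sum")
totals["pct"] = totals["Count"] / name_total * 100
print(totals)
```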
If I understood correctly what you're looking for, I would first fill the missing values with zeros, i.e. n = n.fillna(0) (fillna returns a new DataFrame rather than modifying n in place).
Then calculate the percentages and assign the results to a new column. For the female percentage:
n['%F'] = n[('Count', 'F')] / n['sum'] * 100
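A self-contained sketch of this approach on a tiny hypothetical frame that mimics the question's column layout (two-level columns with a ('sum', '') total); the names and numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the question's layout.
cols = pd.MultiIndex.from_tuples(
    [("Count", "F"), ("Count", "M"), ("sum", "")], names=[None, "Gender"]
)
n = pd.DataFrame(
    [[np.nan, 5.0, 5.0], [5.0, 279.0, 284.0]],
    index=pd.Index(["Aabid", "Aaden"], name="Name"),
    columns=cols,
)

# fillna returns a new frame, so assign the result back.
n = n.fillna(0)

# Selecting the total by its full tuple gives a Series.
total = n[("sum", "")]
# Assigning a plain string key adds columns ("%F", "") and ("%M", "").
n["%F"] = n[("Count", "F")] / total * 100
n["%M"] = n[("Count", "M")] / total * 100
print(n)
```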
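You can do this even before computing the sum; it divides each row by its own total, giving fractions between 0 and 1: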
n.apply(lambda x: x / x.sum(), axis=1)
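The lambda has to run before a sum column exists; otherwise the total itself would be counted in the denominator. A sketch on hypothetical per-gender counts, comparing the row-wise apply with DataFrame.div, a vectorized alternative that avoids calling a Python function per row:

```python
import numpy as np
import pandas as pd

# Hypothetical per-gender counts, with no "sum" column yet.
counts = pd.DataFrame(
    {"F": [np.nan, 5.0, 6.0], "M": [5.0, 279.0, 50.0]},
    index=pd.Index(["Aabid", "Aaden", "Aadi"], name="Name"),
)
counts = counts.fillna(0)

# Row-wise fractions, as in the apply(..., axis=1) answer.
shares_apply = counts.apply(lambda row: row / row.sum(), axis=1)

# Vectorized equivalent: divide each row by its own total,
# then scale to percentages.
shares = counts.div(counts.sum(axis=1), axis=0) * 100
print(shares)
```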
It looks like you have a MultiIndex in the columns:
print(n.columns)
MultiIndex(levels=[[u'Count', u'sum'], [u'', u'F', u'M']],
           labels=[[0, 0, 1], [1, 2, 0]],
           names=[None, u'Gender'])
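If the two-level columns get in the way, one option is to flatten them. A sketch on a hypothetical frame with the same column layout; the flattening rule (use the Gender label, or the top-level name when the Gender label is empty) is an illustrative choice, not something this answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same two-level columns as printed above.
cols = pd.MultiIndex.from_tuples(
    [("Count", "F"), ("Count", "M"), ("sum", "")], names=[None, "Gender"]
)
n = pd.DataFrame([[np.nan, 5.0, 5.0], [5.0, 279.0, 284.0]],
                 index=["Aabid", "Aaden"], columns=cols)

print(n.columns.tolist())
# [('Count', 'F'), ('Count', 'M'), ('sum', '')]

# Keep the Gender label where present; fall back to the top level
# ("sum") where the second level is the empty string.
flat = n.copy()
flat.columns = [gender or top for top, gender in n.columns]
print(flat.columns.tolist())
# ['F', 'M', 'sum']
```

With flat columns, expressions such as flat["F"] / flat["sum"] work without tuples or slicers.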
So first select the columns F and M using slicers (pd.IndexSlice).
Then fill the NaN values with 0 using fillna and divide by the sum column:
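import pandas as pd

idx = pd.IndexSlice
# Select the per-gender counts and the precomputed total (each is a Series)
F = n.loc[:, idx['Count', 'F']]
M = n.loc[:, idx['Count', 'M']]
total = n.loc[:, idx['sum', '']]  # renamed to avoid shadowing the built-in sum()
# Names seen for only one gender have NaN in the other column, so fill with 0
n['%F'] = F.fillna(0) / total * 100
n['%M'] = M.fillna(0) / total * 100
print(n)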
                 Count                     sum          %F          %M
Gender               F           M
Name
Aaban              NaN   10.285710   10.285710    0.000000  100.000000
Aabfla        7.000000         NaN    7.000000  100.000000    0.000000
Aabid              NaN    5.000000    5.000000    0.000000  100.000000
Aabrielle     5.000000         NaN    5.000000  100.000000    0.000000
Aadarn             NaN    8.521739    8.521739    0.000000  100.000000
Aadan              NaN   12.000000   12.000000    0.000000  100.000000
Aadar              NaN   11.285710   11.285710    0.000000  100.000000
Aaden         5.000000  279.002857  284.002857    1.760546   98.239454
Aade               NaN    5.000000    5.000000    0.000000  100.000000
Aadhav             NaN   12.750000   12.750000    0.000000  100.000000
Aadhavan           NaN    6.333333    6.333333    0.000000  100.000000
Aadhi              NaN    6.000000    6.000000    0.000000  100.000000
Aadhira       0.888857         NaN    9.000007    9.876181    0.000000
Aadhve       79.875000         NaN   79.875000  100.000000    0.000000
Aadhven            NaN    5.000000    5.000000    0.000000  100.000000
Aadi          5.333333   55.583333   60.910007    8.756087   91.254846
Aadian             NaN    5.000000    5.000000    0.000000  100.000000
Aadil              NaN   12.913003   12.913003    0.000000  100.000000
Aadin              NaN   12.000000   12.000000    0.000000  100.000000
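To get back to the original goal of feeding in a list of names, one option is DataFrame.reindex on the result. A minimal sketch, assuming the percentages have been copied into a flat frame named pct; the helper gender_likelihood, the unknown name "Zzyzx" and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical flat result frame (values echo a few rows of the output above).
pct = pd.DataFrame(
    {"%F": [0.0, 1.760546, 100.0], "%M": [100.0, 98.239454, 0.0]},
    index=pd.Index(["Aabid", "Aaden", "Aabfla"], name="Name"),
)

def gender_likelihood(table, names):
    """Return the %F/%M rows for the requested names, in the given order.

    reindex (rather than .loc) turns unknown names into all-NaN rows
    instead of raising a KeyError.
    """
    return table.reindex(names)

result = gender_likelihood(pct, ["Aaden", "Aabfla", "Zzyzx"])
print(result)
```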