
How to calculate a percentage using grouped columns in Pandas python?

Pandas newbie here, hitting a simple problem that I can't figure out.

I have a dataset of US baby names that looks like this:

[Image: raw data]

I am trying to write a program where I can feed in a list of names and get back the percentage likelihood that each name belongs to a male or a female (the year is irrelevant for my purposes right now).

I got as far as writing the groupby and then adding the male and female name counts together:

[Image: grouped data]
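A minimal sketch of one way to build a grouped table like this, assuming the raw data has Year, Name, Gender and Count columns (the screenshot is not preserved, so these column names and rows are made up). It uses pivot_table with flat F and M columns rather than the MultiIndex the answers work with. Note that pivot_table averages by default (aggfunc='mean'), so totals need aggfunc='sum'; fractional counts such as 10.285710 in the output further down may be a sign that the values were averaged.

```python
import pandas as pd

# Hypothetical long-format data standing in for the lost screenshot:
# one row per (Year, Name, Gender) with a birth Count.
raw = pd.DataFrame({
    "Year":   [2010, 2011, 2010, 2011, 2010],
    "Name":   ["Aaden", "Aaden", "Aaden", "Aabid", "Aabfla"],
    "Gender": ["M", "M", "F", "M", "F"],
    "Count":  [150, 129, 5, 5, 7],
})

# Ignore Year: total the counts per Name and Gender, one column per gender.
# aggfunc="sum" is explicit because pivot_table averages by default;
# fill_value=0 replaces missing Name/Gender combinations with 0 instead of NaN.
counts = raw.pivot_table(index="Name", columns="Gender",
                         values="Count", aggfunc="sum", fill_value=0)
counts["sum"] = counts["F"] + counts["M"]
print(counts)
```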

Now all I need is to calculate the percentages from this data. I think it is some kind of transform (right?), but I can't seem to write anything that works. I know how I would do it in SQL, but I am really trying to figure out pandas. Some pointers would be greatly appreciated!

Thanks!
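On the transform question: groupby(...).transform('sum') is one way to get percentages directly from long-format data, without pivoting first. A sketch, assuming one row per Name/Gender pair with the years already collapsed (the data is hypothetical):

```python
import pandas as pd

# Hypothetical long-format data: one row per Name/Gender pair.
df = pd.DataFrame({
    "Name":   ["Aaden", "Aaden", "Aabid", "Aabfla"],
    "Gender": ["F", "M", "M", "F"],
    "Count":  [5, 279, 5, 7],
})

# transform("sum") returns a Series aligned with df's rows that holds each
# name's total count, so the division happens row by row.
totals = df.groupby("Name")["Count"].transform("sum")
df["Percent"] = df["Count"] / totals * 100
print(df)
```

Each row's Percent is its share of that name's total, which corresponds to the SQL window pattern count / SUM(count) OVER (PARTITION BY name).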

If I understood correctly what you're looking for, I would first fill the missing values with zeros, i.e. n = n.fillna(0) (fillna returns a new DataFrame rather than modifying n in place). Then calculate the percentages and assign the results to a new column. For the female percentage:

n['%F'] = n[('Count', 'F')] / n['sum'] * 100
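A runnable version of this approach on a small hypothetical frame with the same column layout as the asker's n. Selecting n['sum'] returns a Series here because, when only one column matches and its second-level label is the empty string, pandas treats that empty label as a placeholder; that is what lets the division line up row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the grouped frame `n`, with the same columns:
# ('Count', 'F'), ('Count', 'M') and ('sum', '').
columns = pd.MultiIndex.from_tuples(
    [("Count", "F"), ("Count", "M"), ("sum", "")], names=[None, "Gender"])
n = pd.DataFrame([[np.nan, 10.0, 10.0],
                  [5.0, 279.0, 284.0]],
                 index=pd.Index(["Aaban", "Aaden"], name="Name"),
                 columns=columns)

# fillna returns a new frame, so assign the result back.
n = n.fillna(0)
# New columns '%F' and '%M' are stored as ('%F', '') and ('%M', '').
n["%F"] = n[("Count", "F")] / n["sum"] * 100
n["%M"] = n[("Count", "M")] / n["sum"] * 100
print(n)
```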

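You need to do this even before computing the sum. It divides each row by its own total (the result is fractions, so multiply by 100 for percentages):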

n.apply(lambda x: x / x.sum(), axis=1)
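The apply call above gives fractions between 0 and 1 and keeps NaN where one gender is missing. A sketch comparing it with a vectorized alternative, DataFrame.div with axis=0, on a hypothetical flat frame of F/M counts with no sum column yet:

```python
import numpy as np
import pandas as pd

# Hypothetical flat frame of per-gender counts (no 'sum' column yet).
counts = pd.DataFrame({"F": [np.nan, 5.0, 7.0], "M": [10.0, 279.0, np.nan]},
                      index=["Aaban", "Aaden", "Aabfla"])

# Row-wise apply: each row is divided by its own total (sum skips NaN).
shares_apply = counts.apply(lambda x: x / x.sum(), axis=1)

# Vectorized equivalent: divide every column by the row totals,
# aligning the totals Series on the row index (axis=0).
shares_div = counts.div(counts.sum(axis=1), axis=0)

# Both give fractions; fill NaN with 0 and scale by 100 for percentages.
percent = shares_div.fillna(0) * 100
print(percent)
```

The div version avoids calling a Python function once per row, which usually only matters on large frames.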

It looks like you have a MultiIndex in the columns:

print(n.columns)
MultiIndex(levels=[[u'Count', u'sum'], [u'', u'F', u'M']],
           labels=[[0, 0, 1], [1, 2, 0]],
           names=[None, u'Gender'])
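To see how that MultiIndex is organized, a sketch that rebuilds the same column layout on hypothetical data and inspects it. In newer pandas versions the labels attribute shown in the printout is called codes. xs is one way to drop the top level and get plain F and M columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the column layout printed above.
columns = pd.MultiIndex.from_tuples(
    [("Count", "F"), ("Count", "M"), ("sum", "")], names=[None, "Gender"])
n = pd.DataFrame([[np.nan, 10.0, 10.0], [5.0, 279.0, 284.0]],
                 index=["Aaban", "Aaden"], columns=columns)

# Each level can be inspected on its own.
print(n.columns.get_level_values(0).tolist())         # ['Count', 'Count', 'sum']
print(n.columns.get_level_values("Gender").tolist())  # ['F', 'M', '']

# codes index into the sorted levels, matching the 'labels' in the printout.
print([c.tolist() for c in n.columns.codes])          # [[0, 0, 1], [1, 2, 0]]

# xs drops the top level, leaving a plain frame with columns F and M.
counts = n.xs("Count", axis=1, level=0)
print(counts.columns.tolist())                        # ['F', 'M']
```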

So first select the columns F and M using slicers. Then fill the NaN values with 0 and divide by the sum column:

import pandas as pd

idx = pd.IndexSlice
F = n.loc[:, idx['Count', 'F']]
M = n.loc[:, idx['Count', 'M']]
total = n.loc[:, idx['sum', '']]  # avoid shadowing the built-in sum()

n['%F'] = F.fillna(0) / total * 100
n['%M'] = M.fillna(0) / total * 100
print(n)

               Count                     sum          %F          %M
Gender             F           M                                    
Name                                                                
Aaban            NaN   10.285710   10.285710    0.000000  100.000000
Aabfla      7.000000         NaN    7.000000  100.000000    0.000000
Aabid            NaN    5.000000    5.000000    0.000000  100.000000
Aabrielle   5.000000         NaN    5.000000  100.000000    0.000000
Aadarn           NaN    8.521739    8.521739    0.000000  100.000000
Aadan            NaN   12.000000   12.000000    0.000000  100.000000
Aadar            NaN   11.285710   11.285710    0.000000  100.000000
Aaden       5.000000  279.002857  284.002857    1.760546   98.239454
Aade             NaN    5.000000    5.000000    0.000000  100.000000
Aadhav           NaN   12.750000   12.750000    0.000000  100.000000
Aadhavan         NaN    6.333333    6.333333    0.000000  100.000000
Aadhi            NaN    6.000000    6.000000    0.000000  100.000000
Aadhira     0.888857         NaN    9.000007    9.876181    0.000000
Aadhve     79.875000         NaN   79.875000  100.000000    0.000000
Aadhven          NaN    5.000000    5.000000    0.000000  100.000000
Aadi        5.333333   55.583333   60.910007    8.756087   91.254846
Aadian           NaN    5.000000    5.000000    0.000000  100.000000
Aadil            NaN   12.913003   12.913003    0.000000  100.000000
Aadin            NaN   12.000000   12.000000    0.000000  100.000000
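To get back to the original goal of feeding in a list of names, one option is reindex on the percentage table. A sketch, assuming a hypothetical flat frame pct with %F and %M columns indexed by name (values taken from a few rows of the output); gender_likelihood is a made-up helper, not part of pandas:

```python
import pandas as pd

# Hypothetical flat table of percentages per name.
pct = pd.DataFrame({"%F": [0.0, 1.760546, 100.0],
                    "%M": [100.0, 98.239454, 0.0]},
                   index=["Aaban", "Aaden", "Aabfla"])

def gender_likelihood(names, table):
    """Return the %F/%M rows for the requested names (hypothetical helper).

    Names missing from the table come back as rows of NaN.
    """
    return table.reindex(names)

result = gender_likelihood(["Aaden", "Aabfla", "Zzyzx"], pct)
print(result)
```

reindex is used rather than .loc because, from pandas 1.0 on, .loc with a list containing missing labels raises a KeyError, whereas reindex returns a NaN row for an unknown name.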
