
How to calculate a percentage using grouped columns in Pandas python?

Pandas newbie, hitting a simple problem that I can't figure out.

I have a data set of baby names in the US that looks like this:

(Screenshot: raw data)

I am trying to write a program where I can feed in a list of names and get back the percentage likelihood that each name is male or female (the year is irrelevant for my purposes right now).

I got as far as writing the groupby and then adding the male and female name counts together.

(Screenshot: grouped data)

Now all I need is to calc the percentages based on this data. I think it is some kind of transform (right?) but I can't seem to write anything that works. I know just how I would do it in SQL, but I am really trying to figure out Pandas. Some pointers would be greatly appreciated!
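Since the screenshots did not survive, here is a minimal sketch of the transform route the question asks about. It assumes a long-format frame with Name, Gender ('F'/'M') and Count columns (names inferred from the grouped output further down) plus a Year column to ignore; all rows are made up. groupby(...).transform("sum") returns each name's total aligned to the original rows, so the division works row by row.

```python
import pandas as pd

# Hypothetical long-format data: one row per (name, gender, year); Year is ignored
df = pd.DataFrame({
    "Name":   ["Aaden", "Aaden", "Aaden", "Aadi", "Aadi", "Aabid"],
    "Gender": ["M", "M", "F", "F", "M", "M"],
    "Year":   [2009, 2010, 2010, 2010, 2010, 2011],
    "Count":  [150, 129, 5, 6, 54, 5],
})

# Collapse the years: total count per (Name, Gender)
by_gender = df.groupby(["Name", "Gender"], as_index=False)["Count"].sum()

# transform("sum") broadcasts each name's total back onto that name's rows,
# so the division lines up row by row
name_total = by_gender.groupby("Name")["Count"].transform("sum")
by_gender["Pct"] = by_gender["Count"] / name_total * 100

print(by_gender)
```

To get one row per name with F and M side by side, by_gender.pivot(index="Name", columns="Gender", values="Pct").fillna(0) reshapes the result.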

Thanks!

If I understood correctly what you're looking for, I would first fill the missing values with zeros, i.e. n = n.fillna(0) (fillna returns a new DataFrame rather than modifying n in place). Then calculate the percentages and assign the results to a new column. For the female percentage:

n['%F'] = n[('Count', 'F')] / n['sum'] * 100
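A minimal runnable version of this approach, assuming n has the ('Count', 'F') / ('Count', 'M') column layout shown in the next answer; the three rows and their counts are hypothetical. Assigning n["sum"] = ... stores the column under the key ('sum', ''), and n["sum"] still returns it as a Series, which is why the one-liner above works.

```python
import pandas as pd

nan = float("nan")

# Hypothetical grouped frame shaped like the question's: Name index,
# ('Count', 'F') / ('Count', 'M') columns, NaN where a gender never occurs
cols = pd.MultiIndex.from_tuples([("Count", "F"), ("Count", "M")], names=[None, "Gender"])
n = pd.DataFrame(
    [[nan, 10.0], [5.0, 279.0], [5.0, nan]],
    index=pd.Index(["Aaban", "Aaden", "Aabrielle"], name="Name"),
    columns=cols,
)

# fillna returns a new frame, so assign it back
n = n.fillna(0)

# Total count per name, stored under the padded key ('sum', '')
n["sum"] = n[("Count", "F")] + n[("Count", "M")]

# n["sum"] comes back as a plain Series, so these divide element-wise
n["%F"] = n[("Count", "F")] / n["sum"] * 100
n["%M"] = n[("Count", "M")] / n["sum"] * 100
print(n)
```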

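You need to do this even before computing the sum, dividing each row by its own total (this gives fractions; multiply by 100 for percentages):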

n.apply(lambda x: x / x.sum(), axis=1)
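The lambda runs once per row. Here is a sketch comparing it with a vectorized equivalent, DataFrame.div with axis=0, which divides every row by its precomputed total; the counts frame (F and M columns only, before any sum column exists) is hypothetical.

```python
import pandas as pd

nan = float("nan")

# Hypothetical Count-only frame; NaN = name not used for that gender
counts = pd.DataFrame(
    {"F": [nan, 5.0, 5.0], "M": [10.0, 279.0, nan]},
    index=pd.Index(["Aaban", "Aaden", "Aabrielle"], name="Name"),
)
counts.columns.name = "Gender"

# Row-wise apply: each row divided by its own total (sum skips NaN)
pct_apply = counts.apply(lambda row: row / row.sum(), axis=1) * 100

# Vectorized: compute all row totals once, then divide aligned on the row index
pct_div = counts.div(counts.sum(axis=1), axis=0) * 100

# NaN / total stays NaN, so fill afterwards to read it as 0%
pct = pct_div.fillna(0)
print(pct)
```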

It looks like you have a MultiIndex in the columns:

print(n.columns)
MultiIndex(levels=[[u'Count', u'sum'], [u'', u'F', u'M']],
           labels=[[0, 0, 1], [1, 2, 0]],
           names=[None, u'Gender'])
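One common way to end up with this two-level column layout, assuming the grouping was done roughly as the question describes (the exact code was not posted), is to group by name and gender and then unstack the gender level. The rows below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format rows (Name, Gender, Count); the real data was a screenshot
df = pd.DataFrame({
    "Name":   ["Aaban", "Aaden", "Aaden", "Aabrielle"],
    "Gender": ["M", "F", "M", "F"],
    "Count":  [10, 5, 279, 5],
})

# Sum per (Name, Gender), then move Gender from the row index into the columns:
# 'Count' becomes column level 0 and Gender becomes level 1
n = df.groupby(["Name", "Gender"])[["Count"]].sum().unstack("Gender")

# A single-level key is padded with '' to fit two-level columns: ('sum', '')
n["sum"] = n.sum(axis=1)
print(n.columns)
```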

So first select the F and M columns using slicers (pd.IndexSlice), then fill NaN with 0 and divide by the sum column:

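import pandas as pd

idx = pd.IndexSlice
F = n.loc[:, idx['Count', 'F']]
M = n.loc[:, idx['Count', 'M']]
total = n.loc[:, idx['sum', '']]  # avoid shadowing the built-in sum()

# NaN means the name never appeared for that gender, so count it as 0
n['%F'] = F.fillna(0) / total * 100
n['%M'] = M.fillna(0) / total * 100
print(n)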

               Count                     sum          %F          %M
Gender             F           M                                    
Name                                                                
Aaban            NaN   10.285710   10.285710    0.000000  100.000000
Aabfla      7.000000         NaN    7.000000  100.000000    0.000000
Aabid            NaN    5.000000    5.000000    0.000000  100.000000
Aabrielle   5.000000         NaN    5.000000  100.000000    0.000000
Aadarn           NaN    8.521739    8.521739    0.000000  100.000000
Aadan            NaN   12.000000   12.000000    0.000000  100.000000
Aadar            NaN   11.285710   11.285710    0.000000  100.000000
Aaden       5.000000  279.002857  284.002857    1.760546   98.239454
Aade             NaN    5.000000    5.000000    0.000000  100.000000
Aadhav           NaN   12.750000   12.750000    0.000000  100.000000
Aadhavan         NaN    6.333333    6.333333    0.000000  100.000000
Aadhi            NaN    6.000000    6.000000    0.000000  100.000000
Aadhira     0.888857         NaN    9.000007    9.876181    0.000000
Aadhve     79.875000         NaN   79.875000  100.000000    0.000000
Aadhven          NaN    5.000000    5.000000    0.000000  100.000000
Aadi        5.333333   55.583333   60.910007    8.756087   91.254846
Aadian           NaN    5.000000    5.000000    0.000000  100.000000
Aadil            NaN   12.913003   12.913003    0.000000  100.000000
Aadin            NaN   12.000000   12.000000    0.000000  100.000000
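A variant that avoids writing one slice per column: DataFrame.xs pulls every Count sub-column out at once, and div(..., axis=0) divides each row by its total. The frame below is a hypothetical stand-in for n with made-up values.

```python
import pandas as pd

nan = float("nan")

# Hypothetical frame in the same shape as n (rows and values made up)
cols = pd.MultiIndex.from_tuples([("Count", "F"), ("Count", "M")], names=[None, "Gender"])
n = pd.DataFrame(
    [[nan, 10.0], [5.0, 279.0], [5.0, nan]],
    index=pd.Index(["Aaban", "Aaden", "Aabrielle"], name="Name"),
    columns=cols,
)
n["sum"] = n.sum(axis=1)  # NaN skipped; stored as ('sum', '')

# All Count sub-columns at once; the result has plain 'F' and 'M' columns
counts = n.xs("Count", axis=1, level=0).fillna(0)

# Divide each row by that row's total (align on the row index, not the columns)
pct = counts.div(n[("sum", "")], axis=0) * 100

# One new top-level column per gender: '%F', '%M'
for gender in pct.columns:
    n[f"%{gender}"] = pct[gender]
print(n)
```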
