Pandas newbie, hitting a simple problem that I can't figure out.
I have a data set of baby names in the US that looks like this:
I am trying to write a program where I can feed in a list of names and get back the percentage likelihood that each name is male or female (the year is irrelevant for my purposes right now).
I got as far as writing the groupby and then adding the male and female name counts together.
Now all I need is to calculate the percentages based on this data. I think it is some kind of transform (right?), but I can't seem to write anything that works. I know just how I would do it in SQL, but I am really trying to figure out Pandas. Some pointers would be greatly appreciated!
Thanks!
If I understood correctly what you're looking for, I would first fill the missing values with zeros, i.e. n = n.fillna(0) (fillna returns a new frame, so assign the result back). Then calculate the percentages and assign the results to a new column. For the female percentage:
n['%F'] = n[('Count', 'F')] / n['sum'] * 100
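Here is a minimal sketch of those two steps on a small made-up frame. The names and counts are hypothetical; the only thing taken from the question is the column layout (`Count`/`F`, `Count`/`M` and `sum`), and I am assuming `n` has exactly that two-level column index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame `n`: two-level columns,
# with NaN where a name was never recorded for that gender.
columns = pd.MultiIndex.from_tuples(
    [("Count", "F"), ("Count", "M"), ("sum", "")], names=[None, "Gender"]
)
n = pd.DataFrame(
    [[np.nan, 5.0, 5.0], [5.0, 15.0, 20.0], [7.0, np.nan, 7.0]],
    index=pd.Index(["NameA", "NameB", "NameC"], name="Name"),
    columns=columns,
)

# fillna returns a new frame, so the result has to be assigned back.
n = n.fillna(0)

# Full tuple keys are used so each column is addressed unambiguously.
n[("%F", "")] = n[("Count", "F")] / n[("sum", "")] * 100
n[("%M", "")] = n[("Count", "M")] / n[("sum", "")] * 100

print(n)
```

With the zeros filled in first, a name that only appears for one gender gets 0 for the other rather than NaN.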
You can even do this before computing the sum:
n.apply(lambda x: x / x.sum(), axis=1)
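To illustrate the row-wise normalisation, here is a small sketch with hypothetical counts and flat `F`/`M` columns (an assumption; the real frame has a two-level column index). It also shows `DataFrame.div` with `axis=0`, which I believe gives the same result without a Python-level lambda per row.

```python
import pandas as pd

# Hypothetical per-name counts, before any "sum" column has been added.
counts = pd.DataFrame(
    {"F": [0.0, 5.0, 7.0], "M": [5.0, 15.0, 0.0]},
    index=pd.Index(["NameA", "NameB", "NameC"], name="Name"),
)

# Row-wise apply, as in the one-liner: each row divided by its own total.
shares_apply = counts.apply(lambda row: row / row.sum(), axis=1)

# Vectorized alternative: divide every column by the row totals,
# aligned on the index.
shares_div = counts.div(counts.sum(axis=1), axis=0)

# Shares are fractions of 1; multiply by 100 for percentages.
percent = shares_div * 100
print(percent)
```

Note that this only makes sense while the frame holds nothing but the counts. Once a `sum` column is present, it would be included in each row total.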
It looks like you have a MultiIndex in the columns:
print(n.columns)
MultiIndex(levels=[[u'Count', u'sum'], [u'', u'F', u'M']],
labels=[[0, 0, 1], [1, 2, 0]],
names=[None, u'Gender'])
So first select columns F and M using slicers. Then fill the NaN values with 0 and divide by the column sum:
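import pandas as pd

idx = pd.IndexSlice
# Select each column of the two-level column index as a Series.
F = n.loc[:, idx['Count', 'F']]
M = n.loc[:, idx['Count', 'M']]
total = n.loc[:, idx['sum', '']]  # named `total` to avoid shadowing the built-in sum()
# Treat a missing count as 0, then express it as a percentage of the row total.
n['%F'] = F.fillna(0) / total * 100
n['%M'] = M.fillna(0) / total * 100
print(n)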
                Count                     sum          %F          %M
Gender              F           M
Name
Aaban             NaN   10.285710   10.285710    0.000000  100.000000
Aabfla       7.000000         NaN    7.000000  100.000000    0.000000
Aabid             NaN    5.000000    5.000000    0.000000  100.000000
Aabrielle    5.000000         NaN    5.000000  100.000000    0.000000
Aadarn            NaN    8.521739    8.521739    0.000000  100.000000
Aadan             NaN   12.000000   12.000000    0.000000  100.000000
Aadar             NaN   11.285710   11.285710    0.000000  100.000000
Aaden        5.000000  279.002857  284.002857    1.760546   98.239454
Aade              NaN    5.000000    5.000000    0.000000  100.000000
Aadhav            NaN   12.750000   12.750000    0.000000  100.000000
Aadhavan          NaN    6.333333    6.333333    0.000000  100.000000
Aadhi             NaN    6.000000    6.000000    0.000000  100.000000
Aadhira      0.888857         NaN    9.000007    9.876181    0.000000
Aadhve      79.875000         NaN   79.875000  100.000000    0.000000
Aadhven           NaN    5.000000    5.000000    0.000000  100.000000
Aadi         5.333333   55.583333   60.910007    8.756087   91.254846
Aadian            NaN    5.000000    5.000000    0.000000  100.000000
Aadil             NaN   12.913003   12.913003    0.000000  100.000000
Aadin             NaN   12.000000   12.000000    0.000000  100.000000
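For the original goal (feed in a list of names, get back the percentages), the whole thing could be sketched end to end roughly like this. The raw data is not shown in the question, so the long-format layout, the column names `Name`, `Year`, `Gender`, `Count`, the sample rows, and the helper `gender_likelihood` are all assumptions made for illustration. It also uses `pivot_table` rather than the groupby from the question.

```python
import pandas as pd

# Hypothetical raw data in long format; column names and values are assumptions.
raw = pd.DataFrame(
    {
        "Name": ["NameB", "NameB", "NameB", "NameA", "NameC"],
        "Year": [2008, 2009, 2009, 2010, 2012],
        "Gender": ["M", "M", "F", "M", "F"],
        "Count": [10, 5, 5, 5, 9],
    }
)

# One row per name, one column per gender, summed over all years.
# Name/gender combinations that never occur become 0 instead of NaN.
counts = raw.pivot_table(
    index="Name", columns="Gender", values="Count", aggfunc="sum", fill_value=0
)

# Percentage of each gender within a name.
percent = counts.div(counts.sum(axis=1), axis=0) * 100


def gender_likelihood(names):
    """Return the F and M percentages for each requested name.

    Names that are not in the data come back as rows of NaN.
    """
    return percent.reindex(names)


result = gender_likelihood(["NameB", "NameC", "Unknown"])
print(result)
```

Using `reindex` rather than `.loc` means an unknown name yields a NaN row instead of raising a `KeyError`, which seemed friendlier for a lookup over arbitrary input.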