How to find the ratio in a pandas series for a groupby function
Here's one way using pandas.pivot_table and vectorised pandas calculations. Note that this method removes the need to perform a separate groupby.
import pandas as pd

df = pd.DataFrame([['A', 'F'], ['A', 'F'], ['A', 'M'], ['B', 'M'], ['B', 'M'], ['B', 'F'],
                   ['C', 'M'], ['C', 'M'], ['D', 'F']], columns=['Occupation', 'Gender'])
# pivot input dataframe
res = df.pivot_table(index='Occupation', columns='Gender', aggfunc='size', fill_value=0)
# calculate ratios
sums = res[['F', 'M']].sum(axis=1)
res['FemaleRatio'] = res['F'] / sums
res['MaleRatio'] = res['M'] / sums
print(res)
Gender      F  M  FemaleRatio  MaleRatio
Occupation
A           2  1     0.666667   0.333333
B           1  2     0.333333   0.666667
C           0  2     0.000000   1.000000
D           1  0     1.000000   0.000000
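If only the ratios are needed, pandas.crosstab with normalize='index' divides each row's counts by that row's total in one step. This is a swapped-in alternative to the pivot_table approach, not part of the original answer; it reuses the same sample df.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame([['A', 'F'], ['A', 'F'], ['A', 'M'], ['B', 'M'], ['B', 'M'], ['B', 'F'],
                   ['C', 'M'], ['C', 'M'], ['D', 'F']], columns=['Occupation', 'Gender'])

# normalize='index' divides each row's counts by that row's total,
# so every row of the result sums to 1 (missing combinations become 0)
ratios = pd.crosstab(df['Occupation'], df['Gender'], normalize='index')
print(ratios)
```

The trade-off is that the raw F/M counts are no longer in the result; keep the pivot_table version if you need both counts and ratios side by side.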
Maybe quite late to the party, but here's what I believe is the exact answer:
# create pivot
male_ratio = users.pivot_table(index='occupation', columns='gender', aggfunc='size', fill_value=0)
# calculate male ratio
sums = male_ratio[['F', 'M']].sum(axis=1)
male_ratio['MaleRatio'] = (100 * male_ratio['M'] / sums).round(1)
# result
male_ratio['MaleRatio']
occupation
administrator 54.4
artist 53.6
doctor 100.0
educator 72.6
engineer 97.0
entertainment 88.9
executive 90.6
healthcare 31.2
homemaker 14.3
lawyer 83.3
librarian 43.1
marketing 61.5
none 55.6
other 65.7
programmer 90.9
retired 92.9
salesman 75.0
scientist 90.3
student 69.4
technician 96.3
writer 57.8
Name: MaleRatio, dtype: float64
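The same male percentage can also be computed without a pivot by taking the per-group mean of a boolean mask, since the mean of True/False values is the fraction of True. This is an alternative technique, not the answer's own; the small users frame below is hypothetical sample data standing in for the question's dataset, reusing the occupation and gender column names from the answer.

```python
import pandas as pd

# Hypothetical stand-in for the question's users DataFrame
users = pd.DataFrame({
    'occupation': ['doctor', 'doctor', 'artist', 'artist', 'artist', 'homemaker'],
    'gender':     ['M',      'M',      'F',      'M',      'M',      'F'],
})

# eq('M') gives a boolean Series; its mean within each occupation is the male fraction
male_pct = (users['gender'].eq('M')
            .groupby(users['occupation'])
            .mean()
            .mul(100)
            .round(1))
print(male_pct)
```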
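# ratio rule: x is the count of each gender (male/female) within an occupation,
# y is the total count for that occupation
x = users.groupby(['occupation', 'gender'])['gender'].count()
y = users.groupby(['occupation'])['gender'].count()
# divide, matching y's index against the 'occupation' level of x, then convert to percent
r = (x.div(y, level='occupation') * 100).round(2)
print(r)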
occupation gender
administrator F 45.57
M 54.43
artist F 46.43
M 53.57
doctor M 100.00
educator F 27.37
M 72.63
engineer F 2.99
M 97.01
entertainment F 11.11
M 88.89
executive F 9.38
M 90.62
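A groupby followed by value_counts(normalize=True) returns the same per-occupation gender shares as dividing x by y, in a single call. This is an alternative to the answer's explicit division; the users frame is again hypothetical sample data with the answer's column names.

```python
import pandas as pd

# Hypothetical sample data with the same column names as the answer
users = pd.DataFrame({
    'occupation': ['engineer', 'engineer', 'engineer', 'engineer', 'educator', 'educator'],
    'gender':     ['M',        'M',        'M',        'F',        'F',        'M'],
})

# normalize=True divides each (occupation, gender) count by that occupation's total
r = (users.groupby('occupation')['gender']
          .value_counts(normalize=True)
          .mul(100)
          .round(2))
print(r)
```

The result has the same (occupation, gender) MultiIndex as the x/y version, although within each occupation the rows are ordered by frequency rather than alphabetically.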
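Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.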