How to divide each value of a group by the group's total size in Python?
I have a dataframe with Yes/No answers in the columns 'quality', 'price' and 'time'.
I converted the answers to 1 and 0 and grouped by country
grouped = df.groupby(['country'])[['quality','price','time']].sum()
to count only the 'Yes' answers. The result is:
country | quality | price | time |
---|---|---|---|
FRANCE | 5 | 4 | 3 |
GERMANY | 3 | 2 | 6 |
UK | 2 | 1 | 4 |
I would like to know how to divide each value in the groupby result by the number of respondents (total N) in each country; in my case FRANCE = 9, GERMANY = 11, UK = 12.
I know that I can select a single group and operate on it:
france = country_split.loc[['FRANCE']]
(france/9)*100
but is it possible to apply the operation to all groups at once?
Use Series.value_counts to get the number of respondents per country, divide the column values by those counts after aggregating with sum, then multiply by 100:
# if the counts are needed as a dict instead:
# s = {'FRANCE': 9, 'GERMANY': 11, 'UK': 12}
s = df['country'].value_counts()
grouped = df.groupby(['country'])[['quality','price','time']].sum().div(s, axis=0).mul(100)
print(grouped)
           quality      price       time
country
FRANCE   55.555556  44.444444  33.333333
GERMANY  27.272727  18.181818  54.545455
UK       16.666667   8.333333  33.333333
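For anyone who wants to try this without the original data, here is a minimal sketch that rebuilds a dataframe matching the figures in the question. The row layout and the `yes_counts` helper dict are assumptions made for illustration only; the real `df` is not shown in the question.

```python
import pandas as pd

# Hypothetical input: group size (n) and number of 'Yes' answers per column,
# chosen to match the numbers quoted in the question.
yes_counts = {
    'FRANCE':  {'n': 9,  'quality': 5, 'price': 4, 'time': 3},
    'GERMANY': {'n': 11, 'quality': 3, 'price': 2, 'time': 6},
    'UK':      {'n': 12, 'quality': 2, 'price': 1, 'time': 4},
}
cols = ['quality', 'price', 'time']

# Build one row per respondent with Yes/No answers.
rows = []
for country, info in yes_counts.items():
    for i in range(info['n']):
        row = {'country': country}
        for c in cols:
            row[c] = 'Yes' if i < info[c] else 'No'
        rows.append(row)
df = pd.DataFrame(rows)

# Convert Yes/No to 1/0 so that sum() counts the 'Yes' answers.
df[cols] = df[cols].eq('Yes').astype(int)

# Respondents per country, then 'Yes' counts divided row-wise by group size.
sizes = df['country'].value_counts()
percent = df.groupby('country')[cols].sum().div(sizes, axis=0).mul(100)
print(percent.round(2))
```

Passing `axis=0` makes `div` align the counts with the row index (the country) rather than with the column names.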
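Not tested, but a possible solution is to aggregate with mean instead of sum. Because the columns hold only 1 and 0, the mean of each group equals its sum divided by its size (multiply by 100 for percentages):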
grouped = df.groupby(['country'])[['quality','price','time']].mean()
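As a quick check of the idea, the sketch below compares the mean-based result with the sum-divided-by-size result on a small made-up dataframe. The data is hypothetical, and the equivalence assumes the columns contain only 1 and 0 with no missing values.

```python
import pandas as pd

# Small hypothetical sample, already converted to 1/0.
df = pd.DataFrame({
    'country': ['FRANCE', 'FRANCE', 'FRANCE', 'UK', 'UK'],
    'quality': [1, 0, 1, 0, 1],
    'price':   [1, 1, 0, 0, 0],
})
cols = ['quality', 'price']

# Percentage of 'Yes' answers via the group mean.
by_mean = df.groupby('country')[cols].mean().mul(100)

# Same percentage via sum divided by group size.
sizes = df['country'].value_counts()
by_sum = df.groupby('country')[cols].sum().div(sizes, axis=0).mul(100)

# The two results agree up to floating-point rounding.
same = bool((by_mean - by_sum).abs().max().max() < 1e-9)
print(same)
```

If a column contained missing answers, the two approaches could differ, since `mean` skips NaN values while `value_counts` on 'country' still counts every row.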