How to divide each value of a group by the total size in Python?

Question

I have a dataframe with Yes/No answers in the columns 'quality', 'price', and 'time'.

I converted the answers to 1 and 0 and grouped them:

grouped = df.groupby(['country'])[['quality','price','time']].sum()

This counts only the 'Yes' answers, and the result is:

country	quality	price	time
FRANCE	5	4	3
GERMANY	3	2	6
UK	2	1	4
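For readers who want to reproduce this step, here is a minimal sketch of the conversion and grouping. The sample rows are invented for illustration and are not the survey data from the question; it also assumes the raw answers are exactly the strings "Yes" and "No".

```python
import pandas as pd

# Hypothetical sample rows (not the original survey data).
df = pd.DataFrame({
    "country": ["FRANCE", "FRANCE", "GERMANY", "UK"],
    "quality": ["Yes", "No", "Yes", "No"],
    "price":   ["Yes", "Yes", "No", "No"],
    "time":    ["No", "No", "Yes", "Yes"],
})

cols = ["quality", "price", "time"]

# Turn "Yes" into 1 and everything else into 0, so a sum counts the "Yes" answers.
df[cols] = df[cols].eq("Yes").astype(int)

# Number of "Yes" answers per country and column.
grouped = df.groupby("country")[cols].sum()
print(grouped)
```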

I would like to know how to divide each value in the groupby result by the number of respondents (total N) in each country; in my case FRANCE = 9, GERMANY = 11, UK = 12.

I know that I can select a single group and operate on it:

france = country_split.loc[['FRANCE']]

(france/9)*100

But is it possible to perform the operation on all groups at once?

Answer 1

Use Series.value_counts to get the number of respondents per country, divide the column values by those counts after aggregating with sum, then multiply by 100:

#if need dict for counts
#s = {'FRANCE': 9, 'GERMANY': 11, 'UK': 12}

s = df['country'].value_counts()

grouped = df.groupby(['country'])[['quality','price','time']].sum().div(s, axis=0).mul(100)
print (grouped)
           quality      price       time
country                                 
FRANCE   55.555556  44.444444  33.333333
GERMANY  27.272727  18.181818  54.545455
UK       16.666667   8.333333  33.333333
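The snippet above depends on a df that is not shown. A self-contained sketch follows, assuming invented respondent rows built only so that the group sizes (9, 11, 12) and 'Yes' counts match the question. The helper make_rows is hypothetical and exists only to generate that sample data.

```python
import pandas as pd

def make_rows(country, n, yes_counts):
    """Build n hypothetical respondents with the given number of 1s per column."""
    data = {"country": [country] * n}
    for col, k in yes_counts.items():
        data[col] = [1] * k + [0] * (n - k)
    return pd.DataFrame(data)

# Invented data matching the totals in the question.
df = pd.concat([
    make_rows("FRANCE", 9, {"quality": 5, "price": 4, "time": 3}),
    make_rows("GERMANY", 11, {"quality": 3, "price": 2, "time": 6}),
    make_rows("UK", 12, {"quality": 2, "price": 1, "time": 4}),
], ignore_index=True)

cols = ["quality", "price", "time"]
s = df["country"].value_counts()          # respondents per country
sums = df.groupby("country")[cols].sum()  # 'Yes' answers per country
pct = sums.div(s, axis=0).mul(100)        # rows are matched by country label
print(pct.round(2))
```

Because div with axis=0 aligns on the index labels, it does not matter that value_counts returns the countries in a different order than groupby.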

Not tested, but another possible solution is to aggregate with mean instead of sum. The mean of 0/1 values is the share of 'Yes' answers, so multiply by 100 if a percentage is needed:

grouped = df.groupby(['country'])[['quality','price','time']].mean()
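As a quick check of the mean-based idea, the sketch below compares it with the sum-divided-by-size route on a small invented dataset (the rows are hypothetical, not the survey data). It assumes the columns hold only 0 and 1 with no missing answers.

```python
import pandas as pd

# Hypothetical 0/1 answers (1 = Yes), invented for illustration.
df = pd.DataFrame({
    "country": ["FRANCE", "FRANCE", "FRANCE", "UK", "UK"],
    "quality": [1, 0, 1, 0, 0],
    "price":   [1, 1, 0, 1, 0],
    "time":    [0, 0, 1, 1, 1],
})
cols = ["quality", "price", "time"]

sizes = df.groupby("country").size()  # respondents per country

via_mean = df.groupby("country")[cols].mean().mul(100)
via_size = df.groupby("country")[cols].sum().div(sizes, axis=0).mul(100)

# Both routes agree when there are no missing answers.
pd.testing.assert_frame_equal(via_mean, via_size)
print(via_mean)
```

If some answers are missing (NaN), the two routes can differ: mean divides by the number of non-missing answers in each column, while size counts every row in the group.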

Answer 1: accepted, 2021-06-16 10:48:23
