
How to divide each value of a group by the total group size in Python?

I have a dataframe with Yes/No answers in the columns 'quality', 'price' and 'time'.

I converted the answers to 1 and 0 and grouped them by country:

grouped = df.groupby(['country'])[['quality','price','time']].sum() to count only the 'Yes' answers, and the result is:

country  quality  price  time
FRANCE         5      4     3
GERMANY        3      2     6
UK             2      1     4

I would like to know how to divide each value in the groupby result by the size (total N) of each country's respondents, in my case FRANCE = 9, GERMANY = 11, UK = 12.

I know that I can select a single group and operate on it: france = country_split.loc[['FRANCE']]

(france/9)*100

But is it possible to do this for all groups at once?

Use Series.value_counts to get the number of respondents per country, divide the summed columns by those counts, then multiply by 100:

#if need dict for counts
#s = {'FRANCE': 9, 'GERMANY': 11, 'UK': 12}

s = df['country'].value_counts()

grouped = df.groupby(['country'])[['quality','price','time']].sum().div(s, axis=0).mul(100)
print (grouped)
           quality      price       time
country                                 
FRANCE   55.555556  44.444444  33.333333
GERMANY  27.272727  18.181818  54.545455
UK       16.666667   8.333333  33.333333
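
To make the approach above reproducible, here is a minimal sketch that builds a small dataframe matching the counts given in the question and then applies the same sum / value_counts / div chain. The original data was not posted, so the `spec` dictionary and the way the rows are generated are assumptions made purely for illustration; only the per-country totals (9, 11, 12) and the 'Yes' counts come from the question.

```python
import pandas as pd

# Hypothetical data, constructed only to match the totals in the question:
# country -> (number of respondents, number of 'Yes' answers per column)
spec = {
    "FRANCE": (9, {"quality": 5, "price": 4, "time": 3}),
    "GERMANY": (11, {"quality": 3, "price": 2, "time": 6}),
    "UK": (12, {"quality": 2, "price": 1, "time": 4}),
}

rows = []
for country, (n, yes_counts) in spec.items():
    for i in range(n):
        row = {"country": country}
        for col, k in yes_counts.items():
            # The first k respondents answer 'Yes', the rest 'No'
            row[col] = "Yes" if i < k else "No"
        rows.append(row)
df = pd.DataFrame(rows)

cols = ["quality", "price", "time"]

# Encode Yes/No as 1/0 so that a sum counts the 'Yes' answers
df[cols] = df[cols].eq("Yes").astype(int)

# Number of respondents per country, indexed by country name
sizes = df["country"].value_counts()

# Sum per country, divide each row by that country's size, scale to percent
pct = df.groupby("country")[cols].sum().div(sizes, axis=0).mul(100)
print(pct.round(2))
```

The axis=0 argument matters here: it tells div to align the counts with the row index (the countries) rather than with the column names.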

Not tested, but another possible solution is to aggregate with mean instead of sum. The mean of a 0/1 column is the share of 1s, so multiplying it by 100 gives the percentage:

grouped = df.groupby(['country'])[['quality','price','time']].mean().mul(100)
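
Since the mean-based variant is marked as untested, the following sketch checks it against the sum / value_counts version on a toy dataframe. The data is hypothetical and already encoded as 1 (Yes) and 0 (No); the per-country mean is multiplied by 100 so that both results are percentages.

```python
import pandas as pd

# Hypothetical toy data: answers already encoded as 1 (Yes) / 0 (No)
df = pd.DataFrame({
    "country": ["FRANCE", "FRANCE", "FRANCE", "UK", "UK", "UK", "UK"],
    "quality": [1, 0, 1, 0, 0, 1, 0],
    "price":   [1, 1, 0, 0, 0, 0, 1],
})
cols = ["quality", "price"]

# Variant 1: sum of 1s divided by the number of rows per country
sizes = df["country"].value_counts()
by_sum = df.groupby("country")[cols].sum().div(sizes, axis=0).mul(100)

# Variant 2: mean of the 0/1 columns, scaled to percent
by_mean = df.groupby("country")[cols].mean().mul(100)

print(by_mean)

# True when both variants agree up to floating-point error
same = bool(((by_sum - by_mean).abs() < 1e-9).all().all())
print(same)
```

The two variants agree only when the answer columns have no missing values: mean skips NaN entries, while value_counts on 'country' counts every row, so the denominators can differ if some answers are missing.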
