How to use pandas groupby to calculate percentage of total in each column
I have a data frame that contains 4 columns: id, color, flag_1 and flag_2:
import pandas as pd

df = pd.DataFrame({'id': range(0, 5),
                   'color': ['red', 'red', 'blue', 'blue', 'blue'],
                   'flag_1': [1, 0, 0, 0, 0],
                   'flag_2': [1, 1, 1, 1, 0]})
Unlike the question "Pandas percentage of total with groupby", I want to group by the column color and get the percentage of total for both flag_1 and flag_2.
The result should look like this data frame:
color flag_1 flag_2
red 0.5 1
blue 0 0.67
I can't figure out how to adapt the code from the cited question, which aggregates just one column, to my needs.
Try crosstab:
m = df.drop("id", axis=1).melt("color")
pd.crosstab(m.color, m.variable, m.value, aggfunc="mean").rename_axis(None)
variable flag_1 flag_2
blue 0.0 0.666667
red 0.5 1.000000
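To see why this works, it may help to look at what melt produces first. The sketch below rebuilds the example frame from the question and spells out the crosstab arguments by keyword; the names `m` and `result` are only illustrative. Because the flags are 0/1, the mean within each color is the share of rows where the flag is set.

```python
import pandas as pd

df = pd.DataFrame({"id": range(0, 5),
                   "color": ["red", "red", "blue", "blue", "blue"],
                   "flag_1": [1, 0, 0, 0, 0],
                   "flag_2": [1, 1, 1, 1, 0]})

# Long format: one row per (color, flag name, flag value).
# The resulting columns are "color", "variable" and "value".
m = df.drop("id", axis=1).melt("color")

# Average the 0/1 values for every color / flag combination.
result = pd.crosstab(index=m["color"], columns=m["variable"],
                     values=m["value"], aggfunc="mean").rename_axis(None)
print(result)
```

Note that crosstab sorts the row labels, so blue comes before red here.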
Sticking to groupby:
df.groupby("color", sort=False).agg(flag1=("flag_1", "mean"), flag2=("flag_2", "mean"))
flag1 flag2
color
red 0.5 1.000000
blue 0.0 0.666667
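If the original column names should be kept, a shorter variant is to select both flag columns and call mean directly. This is a sketch on the same example data, not part of the answer above; `flags`, `pct` and `explicit` are illustrative names. The second computation shows the equivalent explicit form, the number of ones per group divided by the number of rows per group.

```python
import pandas as pd

df = pd.DataFrame({"id": range(0, 5),
                   "color": ["red", "red", "blue", "blue", "blue"],
                   "flag_1": [1, 0, 0, 0, 0],
                   "flag_2": [1, 1, 1, 1, 0]})

flags = ["flag_1", "flag_2"]

# Mean of a 0/1 column within a group is the fraction of ones.
# sort=False keeps the groups in order of first appearance (red, blue).
pct = df.groupby("color", sort=False)[flags].mean()

# Same result written out as sum / count per group.
grouped = df.groupby("color", sort=False)[flags]
explicit = grouped.sum() / grouped.count()

print(pct.round(2))
```

Calling `pct.reset_index()` afterwards turns color back into a regular column, which matches the layout requested in the question.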