python groupby multiple columns, count and percentage
I have a test dataframe:
import pandas as pd
data = (['a','test1', 'cat'], ['a','test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1','col2','col3'])
How can I group by col1, col2 and col3 and get the count and percentage of each group, sorted with the highest at the top?
The expected output is:
a test1 cat 2 66.6
b test2 dog 1 33.3
Thank you!
Here is another way, using groupby.ngroup and value_counts:
g = df.groupby(['col1','col2','col3'],sort=False)
s = g.ngroup().value_counts(normalize=True,sort=False)
s.index = g.groups.keys()
out = g.size().to_frame('Size').assign(Percentage=s.mul(100).round(2)).reset_index()
col1 col2 col3 Size Percentage
0 a test1 cat 2 66.67
1 b test2 dog 1 33.33
Try this:
df_final = df.groupby(df.columns.tolist()).size().reset_index(name='counts')
df_final['percentage'] = df_final.counts / len(df) * 100
Out[78]:
col1 col2 col3 counts percentage
0 a test1 cat 2 66.6667
1 b test2 dog 1 33.3333
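The snippet above returns the groups in key order rather than by size, and it leaves the percentage unrounded. A minimal sketch of the same approach with the sort the question asks for, assuming one decimal place is wanted (note that rounding gives 66.7, not the truncated 66.6 shown in the question):

```python
import pandas as pd

data = (['a', 'test1', 'cat'], ['a', 'test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# Count rows per unique (col1, col2, col3) combination
df_final = df.groupby(df.columns.tolist()).size().reset_index(name='counts')

# Share of all rows as a percentage, rounded to one decimal (an assumption)
df_final['percentage'] = (df_final['counts'] / len(df) * 100).round(1)

# Largest groups first, then renumber the rows from 0
df_final = df_final.sort_values('counts', ascending=False).reset_index(drop=True)

print(df_final)
```

Sorting on 'counts' and on 'percentage' gives the same order here, since the percentage is just the count divided by a constant.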
# sample data
data= (['a','test1', 'cat'], ['a','test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns= ['col1','col2','col3'])
# create a function that takes a group and the length of the whole frame
# (the default l=len(df) is evaluated once, when the function is defined)
def my_func(df, l=len(df)):
    data = {
        'count': df['col3'].count(),
        'percent': df['col3'].count() / l * 100
    }
    return pd.Series(data)
# groupby and apply the function and sort values
df.groupby(['col1', 'col2', 'col3']).apply(my_func).sort_values('percent', ascending=False)
count percent
col1 col2 col3
a test1 cat 2.0 66.666667
b test2 dog 1.0 33.333333
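For comparison, here is a sketch that is not taken from any of the answers above: DataFrame.value_counts (this assumes pandas 1.1 or newer) counts unique rows and sorts them by frequency, highest first, by default. The names cols, counts, percent and out are illustrative only.

```python
import pandas as pd

data = (['a', 'test1', 'cat'], ['a', 'test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

cols = ['col1', 'col2', 'col3']

# Count of each unique row; value_counts sorts descending by default
counts = df.value_counts(subset=cols).rename('count')

# normalize=True returns fractions of the total; scale them to percentages
percent = (
    df.value_counts(subset=cols, normalize=True)
    .mul(100)
    .round(2)
    .rename('percent')
)

# Both Series share the same MultiIndex, so concat lines them up row by row
out = pd.concat([counts, percent], axis=1).reset_index()

print(out)
```

This keeps the count as an integer column, whereas the apply-based version above returns it as a float.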