Python groupby on multiple columns, with count and percentage

Question

I have a test dataframe:

import pandas as pd

data = (['a', 'test1', 'cat'], ['a', 'test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

How can I group by col1, col2 and col3, get the count and percentage of each group, and sort the result with the highest count at the top?

The expected output is:

a test1 cat 2 66.6
b test2 dog 1 33.3

Thank you!

Answer 1

Here is another way, using groupby.ngroup and value_counts:

g = df.groupby(['col1','col2','col3'],sort=False)
s = g.ngroup().value_counts(normalize=True,sort=False)
s.index = g.groups.keys()

out = g.size().to_frame('Size').assign(Percentage=s.mul(100).round(2)).reset_index()

  col1   col2 col3  Size  Percentage
0    a  test1  cat     2       66.67
1    b  test2  dog     1       33.33
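To make the two building blocks in this answer easier to follow, here is a minimal sketch of what ngroup and value_counts(normalize=True) each return on the question's data. The variable names group_ids and fractions are illustrative and not part of the original answer.

```python
import pandas as pd

data = (['a', 'test1', 'cat'], ['a', 'test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# sort=False numbers the groups in order of first appearance
g = df.groupby(['col1', 'col2', 'col3'], sort=False)

# ngroup() labels every row with the integer id of the group it belongs to
group_ids = g.ngroup()
print(group_ids.tolist())  # [0, 0, 1]

# normalize=True turns the per-id counts into fractions of all rows
fractions = group_ids.value_counts(normalize=True)
print(fractions.mul(100).round(2))
```

Group id 0 covers two of the three rows and group id 1 covers one, which is where the 66.67 and 33.33 in the output above come from.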

Answer 2

Try this:

df_final = df.groupby(df.columns.tolist()).size().reset_index(name='counts')    
df_final['percentage'] = df_final.counts / len(df) * 100

Out[78]:
  col1   col2 col3  counts  percentage
0    a  test1  cat       2    66.6667
1    b  test2  dog       1    33.3333
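The question also asks for the largest group first, so the same idea can be sketched with an explicit sort. The helper name count_and_percentage and the choice to round to one decimal are assumptions made for illustration. Rounding gives 66.7, not the truncated 66.6 shown in the question.

```python
import pandas as pd

data = (['a', 'test1', 'cat'], ['a', 'test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])


def count_and_percentage(frame, columns):
    """Hypothetical helper: rows per group plus each group's share of all rows."""
    out = frame.groupby(columns).size().reset_index(name='counts')
    # share of the whole frame, in percent, rounded to one decimal
    out['percentage'] = (out['counts'] / len(frame) * 100).round(1)
    # largest group first; ignore_index renumbers the rows from 0
    return out.sort_values('counts', ascending=False, ignore_index=True)


result = count_and_percentage(df, ['col1', 'col2', 'col3'])
print(result)
```

With this sample the sort does not change the row order, since the larger group already comes first, but it matters as soon as a later group is the biggest one.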

Answer 3

import pandas as pd

# sample data
data = (['a', 'test1', 'cat'], ['a', 'test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# count the rows in one group and express that count as a percentage of `total`;
# `total` is passed in explicitly so it always reflects the frame being grouped
def my_func(group, total):
    count = len(group)
    return pd.Series({'count': count, 'percent': count / total * 100})

# group, apply the function with the full frame length, and sort highest first
df.groupby(['col1', 'col2', 'col3']).apply(my_func, total=len(df)).sort_values('percent', ascending=False)

                 count    percent
col1 col2  col3                  
a    test1 cat     2.0  66.666667
b    test2 dog     1.0  33.333333
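As a possible alternative to apply, and assuming pandas 1.1 or later, DataFrame.value_counts counts unique rows over the given columns and returns them sorted from most to least frequent. This is a different technique from the one in the answer above, shown only as a sketch. The names counts and out are illustrative.

```python
import pandas as pd

data = (['a', 'test1', 'cat'], ['a', 'test1', 'cat'], ['b', 'test2', 'dog'])
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# one entry per unique (col1, col2, col3) combination, most frequent first
counts = df.value_counts(['col1', 'col2', 'col3'])

# name the column explicitly so the result does not depend on the pandas version
out = counts.to_frame('count')
out['percent'] = counts / len(df) * 100
out = out.reset_index()
print(out)
```

Because the counts stay integers here, the count column is not converted to float as it is in the apply-based output.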


Answer 1: 4, accepted, 2020-04-02 17:16:11
Answer 2: 2, 2020-04-02 17:22:59
Answer 3: 1, 2020-04-02 17:08:04
