Pandas: Group by and sort by total size

Let's say I have this result:

group1 = df.groupby(['first_column', 'second_column'], as_index=False).size()

first_column    second_column   
A               A1              1
                A2              2
B               B1              1
                B2              2
                B3              3

I then want to calculate the total size for each first_column value and display it like this:

first_column    second_column       
A               A1              1           3
                A2              2
B               B1              1           6
                B2              2
                B3              3       

Based on the total size, I want to sort the result and keep the 10 groups with the largest total size. How can I do this? Also, is it possible to name the columns, like this?

first_column    second_column   size    total_size

Update 1

The DataFrame looks something like this:

df.head()

    first_column    second_column
0   A               A1
1   A               A2
2   A               A2
3   B               B1
4   B               B2
5   B               B2
6   B               B3
7   B               B3
8   B               B3

The code comments should be self-explanatory.

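import pandas as pd

# Sample data.
df = pd.DataFrame({'first_column': ['A']*3 + ['B']*6, 'second_column': ['A1'] + ['A2']*2 + ['B1'] + ['B2']*2 + ['B3']*3})

# Count the rows in each (first_column, second_column) pair, then turn the
# resulting Series into a DataFrame whose count column is named 'size'.
# (Passing name= to reset_index replaces setting `gb.name` by hand, which only
# works while the groupby result is still a Series.)
gb = df.groupby(['first_column', 'second_column']).size().reset_index(name='size')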

>>> gb
  first_column second_column  size
0            A            A1     1
1            A            A2     2
2            B            B1     1
3            B            B2     2
4            B            B3     3

# Use transform to sum the `size` by the first column only.
gb['total_size'] = gb.groupby('first_column')['size'].transform('sum')

>>> gb
  first_column second_column  size  total_size
0            A            A1     1           3
1            A            A2     2           3
2            B            B1     1           6
3            B            B2     2           6
4            B            B3     3           6
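
The steps above add `total_size` but do not yet sort or keep the top 10, which the question also asks for. One possible way to finish, as a minimal sketch: the names `top_n`, `top_groups` and `result` are illustrative and not part of the original answer, and the tie-breaking sort order is an assumption.

```python
import pandas as pd

# Same sample data and intermediate frame as in the answer above.
df = pd.DataFrame({'first_column': ['A']*3 + ['B']*6,
                   'second_column': ['A1'] + ['A2']*2 + ['B1'] + ['B2']*2 + ['B3']*3})
gb = df.groupby(['first_column', 'second_column']).size().reset_index(name='size')
gb['total_size'] = gb.groupby('first_column')['size'].transform('sum')

# Hypothetical parameter: how many first_column groups to keep.
top_n = 10

# Total size per first_column value, keeping only the top_n largest totals.
top_groups = gb.groupby('first_column')['size'].sum().nlargest(top_n).index

# Keep rows belonging to those groups, then order by total size (largest
# first); ties are broken alphabetically so each group stays contiguous.
result = (
    gb[gb['first_column'].isin(top_groups)]
    .sort_values(['total_size', 'first_column', 'second_column'],
                 ascending=[False, True, True])
    .reset_index(drop=True)
)
```

With the sample data there are only two groups, so both are kept and the three `B` rows (total 6) come before the two `A` rows (total 3).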
