Pandas: Group by and sort by total size
Let's say I have this result:
group1 = df.groupby(['first_column', 'second_column'], as_index=False).size()
first_column  second_column
A             A1               1
              A2               2
B             B1               1
              B2               2
              B3               3
Then I want to calculate the total size for each value of first_column and display it like this:
first_column  second_column
A             A1               1   3
              A2               2
B             B1               1   6
              B2               2
              B3               3
Based on the total size, I want to sort the groups and keep only the top 10 largest totals. How can I do this? Is it also possible to give the columns names, like this:
first_column second_column size total_size
Update 1
The DataFrame looks something like this:
>>> df
  first_column second_column
0            A            A1
1            A            A2
2            A            A2
3            B            B1
4            B            B2
5            B            B2
6            B            B3
7            B            B3
8            B            B3
The code comments should be self-explanatory.
import pandas as pd

# Sample data.
df = pd.DataFrame({'first_column': ['A']*3 + ['B']*6,
                   'second_column': ['A1'] + ['A2']*2 + ['B1'] + ['B2']*2 + ['B3']*3})
# Count rows per (first_column, second_column) pair, then turn the counts
# Series into a DataFrame whose count column is named 'size'.
gb = df.groupby(['first_column', 'second_column']).size().reset_index(name='size')
>>> gb
  first_column second_column  size
0            A            A1     1
1            A            A2     2
2            B            B1     1
3            B            B2     2
4            B            B3     3
# Use transform to sum the `size` by the first column only.
gb['total_size'] = gb.groupby('first_column')['size'].transform('sum')
>>> gb
  first_column second_column  size  total_size
0            A            A1     1           3
1            A            A2     2           3
2            B            B1     1           6
3            B            B2     2           6
4            B            B3     3           6
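The question also asks for the top 10 groups by total size, which the steps above stop short of. The following is a minimal sketch. It assumes "top 10" means the ten first_column groups with the largest totals, not the ten largest individual rows. `top_n`, `top_groups` and `result` are illustrative names, not part of the original answer. It ranks the per-group totals with `Series.nlargest`, keeps the rows belonging to those groups with `isin`, and then sorts by total (descending), by group name so that groups with equal totals stay contiguous, and by size within each group.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'first_column': ['A']*3 + ['B']*6,
                   'second_column': ['A1'] + ['A2']*2 + ['B1'] + ['B2']*2 + ['B3']*3})

# Per-pair counts, plus each first_column total broadcast back to its rows.
gb = df.groupby(['first_column', 'second_column']).size().reset_index(name='size')
gb['total_size'] = gb.groupby('first_column')['size'].transform('sum')

# Hypothetical cutoff: keep the N first_column groups with the largest totals.
top_n = 10
top_groups = gb.groupby('first_column')['size'].sum().nlargest(top_n).index

# Filter to those groups and order by total, then group name, then size.
result = (gb[gb['first_column'].isin(top_groups)]
          .sort_values(['total_size', 'first_column', 'size'],
                       ascending=[False, True, False])
          .reset_index(drop=True))
print(result)
```

With the sample data there are only two groups, so both are kept and group B (total 6) comes before group A (total 3). To keep the ten largest rows instead of the ten largest groups, `gb.sort_values('total_size', ascending=False).head(10)` would be the simpler choice.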