python pandas sorting by group
Each row in my DataFrame is a user vote entry for a restaurant. The data look like
id cuisine
91 american
3 american
91 american
233 cuban
233 cuban
2 cuban
where id refers to the restaurant.
I want to get something like the following
american 91 100
3 30
12 10
cuban 233 80
2 33
mexican 22 99
8 98
21 82
where the 2nd column is the id, and the 3rd column is the number of rows in the DataFrame for that id. In other words, sort by the number of rows, but group by cuisine. I tried
g = df.groupby(['cuisine', 'id'])
c = g.size().sort_values(ascending=False)
But the order of the cuisines is mixed.
Is that what you want?
In [2]: df
Out[2]:
id cuisine
0 91 american
1 3 american
2 91 american
3 233 cuban
4 233 cuban
5 2 cuban
In [3]: df.groupby(['cuisine', 'id']).size()
Out[3]:
cuisine id
american 3 1
91 2
cuban 2 1
233 2
dtype: int64
Or as a DataFrame:
In [10]: df.groupby(['cuisine', 'id']).size().reset_index(name='count').sort_values(['cuisine', 'count'], ascending=[1,0])
Out[10]:
cuisine id count
1 american 91 2
0 american 3 1
3 cuban 233 2
2 cuban 2 1
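For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same approach. It rebuilds the six sample rows from the question and uses the boolean form ascending=[True, False] in place of [1, 0]; the variable name counts is just an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question: one row per user vote.
df = pd.DataFrame({
    "id": [91, 3, 91, 233, 233, 2],
    "cuisine": ["american", "american", "american", "cuban", "cuban", "cuban"],
})

counts = (
    df.groupby(["cuisine", "id"])
      .size()                       # number of rows per (cuisine, id) pair
      .reset_index(name="count")    # back to a flat DataFrame with a "count" column
      .sort_values(["cuisine", "count"], ascending=[True, False])
)
print(counts)
```

Sorting on the explicit count column keeps the cuisines together and orders the restaurants by number of votes within each cuisine, regardless of the id values.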
Use value_counts after groupby, followed by sort_index:
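# ascending=[1, 0] means ascending for index level 0 (cuisine), descending for level 1 (id).
# Note: sort_index orders by the id values, not by the counts; the two orders
# agree here only because of the sample data. value_counts on its own already
# returns the counts in descending order within each group.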
df.groupby('cuisine').id.value_counts().sort_index(ascending=[1, 0])
cuisine id
american 91 2
3 1
cuban 233 2
2 1
Name: id, dtype: int64
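A small check of how the two orderings can differ, using hypothetical data that is not from the question (here id 3 has more votes than id 91). As far as I can tell, value_counts on a grouped column already sorts by count within each group, while sort_index reorders by the id values themselves:

```python
import pandas as pd

# Hypothetical data (not from the question): id 3 has 2 votes, id 91 has 1.
df = pd.DataFrame({
    "id": [3, 3, 91, 233, 233, 2],
    "cuisine": ["american", "american", "american", "cuban", "cuban", "cuban"],
})

# Counts per id within each cuisine, most frequent id first.
by_count = df.groupby("cuisine")["id"].value_counts()

# Same counts, reordered by index: cuisine ascending, id descending.
by_id = by_count.sort_index(ascending=[True, False])

print(by_count)
print(by_id)
```

With this data by_count lists id 3 before id 91 for american, while by_id puts 91 first, so if the goal is to order by number of votes it seems safer to rely on value_counts alone or to sort on an explicit count column.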