Each row in my DataFrame is a user vote entry for a restaurant. The data look like
id cuisine
91 american
3 american
91 american
233 cuban
233 cuban
2 cuban
where id refers to the restaurant.
I want to get something like the following
american 91 100
3 30
12 10
cuban 233 80
2 33
mexican 22 99
8 98
21 82
where the 2nd column is the id, and the 3rd column is the number of rows in the DataFrame for that id. In other words, sort by the number of rows, but group by cuisine. I tried
g = df.groupby(['cuisine', 'id'])
c = g.size().sort_values(ascending=False)
But the order of the cuisines is mixed.
Is this what you want?
In [2]: df
Out[2]:
id cuisine
0 91 american
1 3 american
2 91 american
3 233 cuban
4 233 cuban
5 2 cuban
In [3]: df.groupby(['cuisine', 'id']).size()
Out[3]:
cuisine id
american 3 1
91 2
cuban 2 1
233 2
dtype: int64
or as a data frame:
In [10]: df.groupby(['cuisine', 'id']).size().reset_index(name='count').sort_values(['cuisine', 'count'], ascending=[True, False])
Out[10]:
cuisine id count
1 american 91 2
0 american 3 1
3 cuban 233 2
2 cuban 2 1
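The same approach as a self-contained script, in case it helps to run it outside IPython. This is only a sketch: the sample rows are made up (two extra votes for id 3 were added so that the count order differs from the id order), and the variable name counts is arbitrary.

```python
import pandas as pd

# Hypothetical sample data: one row per vote, as in the question,
# with extra votes for id 3 so that count order != id order.
df = pd.DataFrame({
    "id": [91, 3, 91, 233, 233, 2, 3, 3],
    "cuisine": ["american", "american", "american",
                "cuban", "cuban", "cuban",
                "american", "american"],
})

counts = (
    df.groupby(["cuisine", "id"])
      .size()                        # number of rows per (cuisine, id)
      .reset_index(name="count")     # back to a flat DataFrame
      # cuisine A-Z, then largest count first within each cuisine
      .sort_values(["cuisine", "count"], ascending=[True, False])
      .reset_index(drop=True)
)

print(counts)
```

If the nested layout from the question is preferred, counts.set_index(["cuisine", "id"])["count"] should give a Series indexed by cuisine and id in the same row order.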
Use value_counts after groupby, followed by sort_index:
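# ascending=[True, False]: ascending for index level 0 (cuisine), descending for level 1 (id)
# note: this orders by id within each cuisine, which matches the count order only for this sample
df.groupby('cuisine').id.value_counts().sort_index(ascending=[True, False])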
cuisine id
american 91 2
3 1
cuban 233 2
2 1
Name: id, dtype: int64
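One caveat worth checking on your own data: sort_index orders by the index levels (cuisine, id), not by the counts. As far as I can tell, value_counts on a grouped column already returns the counts in descending order within each group, so the extra sort may not be needed. A small sketch with made-up data where the id order and the count order differ (the names by_count and by_id are just for illustration):

```python
import pandas as pd

# Hypothetical sample data: id 3 has more votes than id 91,
# so sorting by id and sorting by count give different orders.
df = pd.DataFrame({
    "id": [91, 3, 91, 233, 233, 2, 3, 3],
    "cuisine": ["american", "american", "american",
                "cuban", "cuban", "cuban",
                "american", "american"],
})

# Counts per id within each cuisine; largest count first in each group.
by_count = df.groupby("cuisine")["id"].value_counts()

# Re-sorting on the index puts ids in descending order instead,
# which no longer follows the counts.
by_id = by_count.sort_index(ascending=[True, False])

print(by_count)
print(by_id)
```

Here by_count lists id 3 before id 91 for american, while by_id lists 91 first even though it has fewer votes.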