pandas: Grouping by two columns and then sorting by the values of a third column
I have the following line:
genre_df.groupby(['release_year', 'genres']).vote_average.mean()
This gives me the following:
release_year genres
1960 Action 6.950000
Adventure 7.150000
Comedy 7.900000
Drama 7.600000
Fantasy 7.300000
History 6.900000
Horror 8.000000
Romance 7.600000
Science Fiction 7.300000
Thriller 7.650000
Western 7.000000
1961 Action 7.000000
Adventure 6.800000
Animation 6.600000
Comedy 7.000000
Crime 6.600000
Drama 7.000000
Family 6.600000
History 6.700000
Music 6.600000
Romance 7.400000
War 7.000000
...
What I'd like to see is the result still grouped by release year and genre, but with the genres inside each year sorted by vote average, highest first.
That is:
release_year genres
1960 Horror 8.000000
Comedy 7.900000
Thriller 7.650000
Drama 7.600000
Romance 7.600000
Fantasy 7.300000
Science Fiction 7.300000
Adventure 7.150000
Western 7.000000
Action 6.950000
History 6.900000
How can this be achieved?
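Solution for pandas 0.23.0+: first create a one-column DataFrame with to_frame, then sort it with sort_values. From that version on, sort_values accepts index level names as well as column names, so release_year (an index level) and vote_average (a column) can be mixed: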
df = genre_df.groupby(['release_year', 'genres']).vote_average.mean()
df = df.to_frame().sort_values(['release_year', 'vote_average'], ascending=[True, False])
print(df)
vote_average
release_year genres
1960 Horror 8.00
Comedy 7.90
Thriller 7.65
Drama 7.60
Romance 7.60
Fantasy 7.30
Science Fiction 7.30
Adventure 7.15
Western 7.00
Action 6.95
History 6.90
1961 Romance 7.40
Action 7.00
Comedy 7.00
Drama 7.00
War 7.00
Adventure 6.80
History 6.70
Animation 6.60
Crime 6.60
Family 6.60
Music 6.60
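For anyone who wants to run this end to end, here is a minimal sketch. The small genre_df below is made-up sample data, not the asker's dataset, and the names means and result are only illustrative. It assumes pandas 0.23.0 or later.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's genre_df
genre_df = pd.DataFrame({
    'release_year': [1960, 1960, 1960, 1960, 1961, 1961, 1961],
    'genres': ['Action', 'Horror', 'Comedy', 'Action', 'Romance', 'Action', 'Music'],
    'vote_average': [6.9, 8.0, 7.9, 7.0, 7.4, 7.0, 6.6],
})

# Mean vote per (release_year, genres); the result is a Series with a MultiIndex
means = genre_df.groupby(['release_year', 'genres']).vote_average.mean()

# to_frame() turns the Series into a one-column DataFrame, so the value
# column can be named in sort_values alongside the release_year index level
result = means.to_frame().sort_values(
    ['release_year', 'vote_average'], ascending=[True, False]
)
print(result)
```

Years stay in ascending order, and within each year the genres are ordered from the highest mean vote to the lowest.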
For older versions of pandas, reset_index and set_index are needed: turn the index levels into columns, sort, then restore the index:
df = (df.reset_index()
        .sort_values(['release_year', 'vote_average'], ascending=[True, False])
        .set_index(['release_year', 'genres']))
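The same round trip as a small sketch on made-up data. The means Series here is hypothetical, built by hand to have the same shape as the groupby result in the question. Because it sorts by ordinary columns only, it does not rely on sort_values understanding index level names.

```python
import pandas as pd

# Hypothetical grouped means, shaped like the groupby result in the question
idx = pd.MultiIndex.from_tuples(
    [(1960, 'Action'), (1960, 'Horror'), (1961, 'Music'), (1961, 'Romance')],
    names=['release_year', 'genres'],
)
means = pd.Series([6.95, 8.0, 6.6, 7.4], index=idx, name='vote_average')

result = (
    means.reset_index()                          # index levels become ordinary columns
         .sort_values(['release_year', 'vote_average'],
                      ascending=[True, False])   # year ascending, votes descending
         .set_index(['release_year', 'genres'])  # restore the two-level index
)
print(result)
```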
Try this:
grouped = genre_df.groupby(['release_year', 'genres']).vote_average.mean().reset_index()
grouped = grouped.sort_values(['release_year', 'vote_average'], ascending=[True, False])