How to get the highest value row after grouping two columns and getting value counts in Pandas Dataframe?
I'm grouping by two columns with the following line of code:
df.groupby('topic')['category'].value_counts()
I get the following output:
topic   category
topic1  Entertainment    1303
        Science           462
        Sports            351
        Economy           270
        Business          161
        Technology         92
        Education          40
        Politics           18
        Environment         5
topic2  Politics          134
        Economy           133
        Entertainment     110
        Sports             69
        Business           68
        Science            45
        Technology         22
        Education           7
        Environment         2
topic3  Entertainment    1370
        Sports            533
        Economy           485
        Science           335
        Business          207
        Politics          180
        Education         108
        Technology         97
        Environment        12
I want to get the topmost row for every topic (which is the most frequent category), something like this:
topic   category
topic1  Entertainment    1303
topic2  Politics          134
topic3  Entertainment    1370
In pandas, value_counts sorts the counts in descending order, so all you need to do is take the first row from each group and return it. This can easily be done by applying a function:
def top_value_count(x):
    return x.value_counts().head(1)

df.groupby('topic')['category'].apply(top_value_count)
Change the 1 to another number to return more values per topic.
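For reference, here is a minimal self-contained sketch of the same idea on toy data. The DataFrame contents, the n parameter, and the names top_value_counts, top_apply, and top_direct are illustrative assumptions, not part of the original question. It also shows what should be an equivalent form that skips apply by taking the first row of each group from the already-sorted value_counts result.

```python
import pandas as pd

# Toy data for illustration only (hypothetical, not the asker's dataset).
df = pd.DataFrame({
    "topic": ["topic1", "topic1", "topic1",
              "topic2", "topic2", "topic2", "topic2"],
    "category": ["Sports", "Science", "Sports",
                 "Politics", "Economy", "Politics", "Politics"],
})


def top_value_counts(x, n=1):
    """Return the n most frequent values of a Series with their counts."""
    return x.value_counts().head(n)


# Apply-based version, as in the answer above; extra keyword
# arguments given to apply are passed through to the function.
top_apply = df.groupby("topic")["category"].apply(top_value_counts, n=1)

# Apply-free version: value_counts on the grouped column is sorted in
# descending order within each topic, so the first row per topic is
# the most frequent category.
counts = df.groupby("topic")["category"].value_counts()
top_direct = counts.groupby(level="topic").head(1)

print(top_apply)
print(top_direct)
```

Both results are Series indexed by (topic, category). If two categories are tied for first place within a topic, head(1) keeps only one of them, so check for ties if that matters for your data.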