How do I get the top row for each group after grouping by two columns and taking value counts in a Pandas DataFrame?

Question

I'm grouping by two columns with the following line of code:

df.groupby('topic')['category'].value_counts()

I get the following output:

topic                 category     

topic1            Entertainment    1303
                  Science           462
                  Sports            351
                  Economy           270
                  Business          161
                  Technology         92
                  Education          40
                  Politics           18
                  Environment         5

topic2            Politics          134
                  Economy           133
                  Entertainment     110
                  Sports             69
                  Business           68
                  Science            45
                  Technology         22
                  Education           7
                  Environment         2

topic3            Entertainment    1370
                  Sports            533
                  Economy           485
                  Science           335
                  Business          207
                  Politics          180
                  Education         108
                  Technology         97
                  Environment        12

I want to get the top row for every topic (that is, the most frequent category), something like this:

topic                 category     

topic1            Entertainment    1303
topic2            Politics          134
topic3            Entertainment    1370

Answer 1

In pandas, value_counts sorts the counts in descending order by default, so all you need to do is take the first entry from each group and return it. This is easily done by applying a function:

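def top_value_count(x):
    # value_counts() sorts by count, descending, so head(1) is the most frequent value
    return x.value_counts().head(1)

# Apply the helper to the 'category' values of each topic
df.groupby('topic')['category'].apply(top_value_count)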

Change the 1 to another number to return more values per topic.
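For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The toy DataFrame is made up for illustration (it is not the asker's data), and the `n` parameter on the helper is an assumption added to make the "change the 1" suggestion explicit. The second approach, `groupby(level="topic").head(1)` on the already-counted Series, is an alternative not taken from the answer above; it relies on the grouped value_counts output being sorted by count within each topic.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "topic": ["topic1"] * 4 + ["topic2"] * 3,
    "category": ["Entertainment", "Entertainment", "Entertainment", "Science",
                 "Politics", "Politics", "Economy"],
})

def top_value_counts(x, n=1):
    """Return the n most frequent values in one group, with their counts."""
    return x.value_counts().head(n)

# Approach from the answer: apply the helper to each topic's categories.
top_apply = df.groupby("topic")["category"].apply(top_value_counts)

# Alternative sketch: count once, then keep the first row of each topic.
counts = df.groupby("topic")["category"].value_counts()
top_head = counts.groupby(level="topic").head(1)

print(top_apply)
print(top_head)
```

Both results are Series indexed by (topic, category). If you want more than one row per topic, pass a larger `n` to the helper (for example via `.apply(top_value_counts, n=2)`) or use `head(2)` in the alternative.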


Solution 1
3 · Accepted · 2018-05-21 11:30:53
