Pandas: How to do value counts within groups

Question

I have the following dataframe. I want to group by a and b first. Within each group, I need to do a value count on c and pick only the value with the highest count. If several c values in a group tie for the highest count, any one of them will do.

a    b    c
1    1    x
1    1    y
1    1    y
1    2    y
1    2    y
1    2    z
2    1    z
2    1    z
2    1    a
2    1    a

The expected result would be:

a    b    c
1    1    y
1    2    y
2    1    z

What is the right way to do it? It would be even better if I could print out each group with c's value counts sorted, as an intermediate step.

Answer 1

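Group the original dataframe by ['a', 'b'] and take .max(). Note that this returns the largest c value in each group (for strings, the last one in sort order), which matches the most frequent value only by coincidence: for group (1, 2) in the sample data it gives z, not y.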

df.groupby(['a', 'b'])['c'].max()

You can also aggregate the 'count' and 'max' values:

# Named aggregation; the dict-renaming form raises SpecificationError in pandas 1.0+
df.groupby(['a', 'b'])['c'].agg(max='max', count='count').reset_index()
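To see how the largest value and the most frequent value can differ, here is a minimal sketch on the sample data from the question. The names `largest` and `most_frequent` are illustrative, and the `mode()`-based alternative is an assumption about what the asker wants, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# Largest value of c per group (string comparison), as in the answer above.
largest = df.groupby(["a", "b"])["c"].max()

# Most frequent value of c per group. mode() returns every tied value,
# so .iloc[0] simply picks one of them.
most_frequent = df.groupby(["a", "b"])["c"].agg(lambda s: s.mode().iloc[0])

print(largest)
print(most_frequent)
```

For group (1, 2) the two results disagree, which is the case where `.max()` does not answer the question as asked.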

Answer 2

You are looking for .value_counts():

df.groupby(['a', 'b'])['c'].value_counts()

a  b  c
1  1  y    2
      x    1
   2  y    2
      z    1
2  1  a    2
      z    2
Name: c, dtype: int64
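The counts above are the intermediate step the question asks for. One possible way to go from them to a single c per group is sketched below; `counts`, `idx`, and `result` are illustrative names, and using `idxmax()` for the final pick is my own suggestion rather than something the answer states.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# Intermediate step: counts of c within each (a, b) group.
counts = df.groupby(["a", "b"])["c"].value_counts()
print(counts)

# For each (a, b) group, find the (a, b, c) index label with the highest count.
# On ties, idxmax() keeps whichever tied label comes first.
idx = counts.groupby(level=["a", "b"]).idxmax()

# Turn the (a, b, c) tuples back into a flat dataframe.
result = pd.DataFrame(idx.tolist(), columns=["a", "b", "c"])
print(result)
```

Because the question accepts any of the tied values, the sketch makes no attempt to control which one wins for group (2, 1).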

Answer 3

Try:

df = (
    df.groupby(["a", "b", "c"])["c"].count()
    .sort_values(ascending=False)
    .reset_index(name="dropme")
    .drop_duplicates(subset=["a", "b"], keep="first")
    .drop("dropme", axis=1)
)

Outputs:

Answer 1: score 1, accepted, 2020-04-10 15:51:48
Answer 2: score 1, 2020-04-10 16:15:24
Answer 3: score 0, 2020-04-10 18:26:46
