Pandas: how to do value counts within groups

I have the following dataframe. I want to group by a and b first. Within each group, I need to do a value count based on c and pick only the value with the most counts. If more than one c value in a group shares the highest count, just pick any one.

a    b    c
1    1    x
1    1    y
1    1    y
1    2    y
1    2    y
1    2    z
2    1    z
2    1    z
2    1    a
2    1    a

The expected result would be:

a    b    c
1    1    y
1    2    y
2    1    z

What is the right way to do it? It would be even better if I could print out each group with c's value counts sorted, as an intermediate step.
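For the intermediate step, a minimal sketch (assuming the sample data above is loaded into a DataFrame named df; the names rows, counts and result are illustrative) is to loop over the (a, b) groups, print each group's value counts of c, and keep the first entry. value_counts() already sorts counts from most to least frequent, so the first index label is a most frequent value; on a tie, which one comes first is not guaranteed, which the question allows.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

rows = []
for (a, b), group in df.groupby(["a", "b"]):
    # value_counts() returns counts sorted from most to least frequent
    counts = group["c"].value_counts()
    print(f"group a={a}, b={b}:")
    print(counts.to_string())
    # first label = a most frequent value (any one of them on a tie)
    rows.append({"a": a, "b": b, "c": counts.index[0]})

result = pd.DataFrame(rows)
print(result)
```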

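One suggestion is to group the original dataframe by ['a', 'b'] and take the .max() of c. Note, however, that .max() returns the alphabetically largest value of c in each group, not the most frequent one; for group (1, 2) it gives z instead of the expected y.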

df.groupby(['a', 'b'])['c'].max()
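A small check makes the difference visible, assuming the same sample data: .max() compares strings alphabetically, while Series.mode() returns the most frequent value(s) in sorted order. Taking .iloc[0] of the mode is an illustrative tie-breaking choice that picks the alphabetically smallest of the tied values. In the resulting table, group (1, 2) shows z under max but y under most_frequent.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

grouped = df.groupby(["a", "b"])["c"]
# Alphabetically largest value of c in each group
by_max = grouped.max()
# Most frequent value of c; mode() can return several values on a tie,
# so keep the first one (the smallest in sort order)
by_mode = grouped.agg(lambda s: s.mode().iloc[0])

comparison = pd.DataFrame({"max": by_max, "most_frequent": by_mode})
print(comparison)
```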

You can also aggregate the 'count' and 'max' values:

df.groupby(['a', 'b'])['c'].agg(max='max', count='count').reset_index()
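If the goal is one table with each group's size plus its most frequent c and how often that value occurs, named aggregation (available in recent pandas versions) can mix built-in aggregations with custom functions. This is a sketch assuming the sample data; the output column names count, top and freq are arbitrary labels chosen here, and on a tie top is whichever value value_counts() lists first.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

summary = (
    df.groupby(["a", "b"])["c"]
    .agg(
        count="count",                             # number of rows in the group
        top=lambda s: s.value_counts().idxmax(),   # a most frequent value of c
        freq=lambda s: s.value_counts().max(),     # how often that value appears
    )
    .reset_index()
)
print(summary)
```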

You are looking for .value_counts():

df.groupby(['a', 'b'])['c'].value_counts()
a  b  c
1  1  y    2
      x    1
   2  y    2
      z    1
2  1  a    2
      z    2
Name: c, dtype: int64
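To go from that listing to the expected result, one option (a sketch, assuming the sample data) is to sort all counts in descending order and keep the first row of each (a, b) group. The name of the Series returned by value_counts() differs between pandas versions (c in older releases, count in pandas 2.x), so reset_index(name="n") gives the count column an explicit, hypothetical name n that cannot clash with the c column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

counts = df.groupby(["a", "b"])["c"].value_counts()

top = (
    counts.sort_values(ascending=False)   # largest counts first
    .groupby(level=["a", "b"])
    .head(1)                              # first (largest) entry per group
    .sort_index()
    .reset_index(name="n")                # columns: a, b, c, n
)
print(top[["a", "b", "c"]])
```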

Try:

df = (
    df.groupby(["a", "b", "c"])["c"].count()
    .sort_values(ascending=False)
    .reset_index(name="dropme")
    # keep the most frequent c for each (a, b) pair
    .drop_duplicates(subset=["a", "b"], keep="first")
    .drop("dropme", axis=1)
)

Output:

   a  b  c
0  2  1  z
2  1  2  y
3  1  1  y
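A different way to express the same count-then-pick idea, offered as a sketch under the same sample-data assumption rather than as a replacement: count each (a, b, c) combination with size(), pivot c into columns with unstack(), and let idxmax(axis=1) return the column label with the highest count in each row. On a tie, idxmax returns the first matching column, and the columns are in sorted order, so group (2, 1) yields a here rather than z, which the question allows.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# One row per (a, b), one column per distinct c, cells hold the counts
wide = df.groupby(["a", "b", "c"]).size().unstack(fill_value=0)
print(wide)

# Column label of the largest count in each row (first one on a tie)
result = wide.idxmax(axis=1).rename("c").reset_index()
print(result)
```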

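Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.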
