Pandas: how to do value counts within groups
I have the following dataframe. I want to group by a and b first. Within each group, I need to do a value count on c and pick only the value with the highest count. If more than one c value ties for the highest count in a group, just pick any one of them.
a b c
1 1 x
1 1 y
1 1 y
1 2 y
1 2 y
1 2 z
2 1 z
2 1 z
2 1 a
2 1 a
The expected result would be:
a b c
1 1 y
1 2 y
2 1 z
What is the right way to do it? It would be even better if I could print out each group with c's value counts sorted as an intermediate step.
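For anyone who wants to reproduce this, the sample data above can be built as follows. This is a minimal sketch; the name df simply matches what the answers use.

```python
import pandas as pd

# Sample data from the question: ten rows, three columns.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

print(df)
```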
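Grouping the original dataframe by ['a', 'b'] and taking .max() gives the largest value of c in each group. Note that this is the maximum by string ordering, not the most frequent value, so for group (1, 2) it returns z rather than y.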
df.groupby(['a', 'b'])['c'].max()
You can also aggregate the 'count' and 'max' values:
df.groupby(['a', 'b'])['c'].agg(['max', 'count']).reset_index()
You are looking for .value_counts():
df.groupby(['a', 'b'])['c'].value_counts()
a  b  c
1  1  y    2
      x    1
   2  y    2
      z    1
2  1  a    2
      z    2
Name: c, dtype: int64
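The counts above are already sorted in descending order within each (a, b) group, so one way to get from this intermediate step to the expected result is to keep the first row of each group. The following is a sketch under that assumption; the column name "n" and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# Intermediate step: per-group counts of c, sorted descending within each group.
counts = df.groupby(["a", "b"])["c"].value_counts()
print(counts)

# Keep the first (highest-count) row of each (a, b) group.
# "n" is a hypothetical name for the count column, dropped afterwards.
top = counts.groupby(level=["a", "b"]).head(1).reset_index(name="n")
result = top[["a", "b", "c"]]
print(result)
```

When two values tie for the highest count, as in group (2, 1), whichever one value_counts lists first is kept, which fits the "pick any one" requirement.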
Try:
df = (
    df.groupby(["a", "b", "c"])["c"]
      .count()
      .sort_values(ascending=False)
      .reset_index(name="dropme")
      .drop_duplicates(subset=["a", "b"], keep="first")
      .drop("dropme", axis=1)
)
Outputs:
a b c
0 2 1 z
2 1 2 y
3 1 1 y
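The output above comes back ordered by count rather than by group key. If the original group order matters, one alternative is to pick the most frequent value directly inside each group. This is only a sketch of that idea, not the approach used in the answer above; it relies on value_counts() followed by idxmax(), which returns the label with the highest count.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# For each (a, b) group, count the values of c and take the label
# with the highest count. On a tie, one of the tied labels is returned.
result = (
    df.groupby(["a", "b"])["c"]
      .agg(lambda s: s.value_counts().idxmax())
      .reset_index()
)
print(result)
```

The rows come out sorted by (a, b) because groupby sorts the group keys by default.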