Pandas groupby value_counts: filter by frequency

Question

I would like to filter out the frequencies that are less than n; in my case n is 2.

import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})
df.groupby('A')['B'].value_counts()

A    B  
bar  no     4
     yes    1
foo  yes    3
     no     2
Name: B, dtype: int64

Ideally I would like the result as a DataFrame like the one below (the row with frequency 1 is excluded):

A    B    freq
bar  no      4
foo  yes     3
foo  no      2

I have tried:

df.groupby('A')['B'].filter(lambda x: len(x) > 1)

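but this does not work: filter applies the condition to each whole group of original rows (here each group has 5 rows, so nothing is dropped) and returns a Series of the original B values rather than the counts.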
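A minimal sketch of why that attempt cannot work, assuming the df from the question: SeriesGroupBy.filter evaluates the lambda once per group and keeps or drops that group's original rows, so with 5 rows in each group every row survives. The threshold has to be applied to the counts instead.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

# filter() tests each *group* ('bar' and 'foo', 5 rows each), not each count,
# so every original row passes and the original B values come back unchanged.
kept_rows = df.groupby('A')['B'].filter(lambda x: len(x) > 1)
print(len(kept_rows))

# The counts live in a separate Series; the condition must be applied to it.
counts = df.groupby('A')['B'].value_counts()
print(counts[counts > 1])
```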

Answer 1

You can store the output of the .value_counts() method and then filter it:

>>> counts = df.groupby('A')['B'].value_counts()
>>> counts[counts >= 2]
A    B  
bar  no     4
foo  yes    3
     no     2
Name: B, dtype: int64

If you want the desired output, call the .reset_index() method and give the new column a name:

>>> counts[counts >= 2].reset_index(name='count') 
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
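To make the threshold from the question (n = 2) a parameter, the same two steps can be wrapped in a small function. This is only a sketch: the name filter_value_counts and the default column name freq are hypothetical choices for illustration, not part of pandas.

```python
import pandas as pd


def filter_value_counts(df, group_col, value_col, n, name='freq'):
    """Hypothetical helper: count value_col within each group_col group and
    keep only the (group, value) pairs that occur at least n times."""
    counts = df.groupby(group_col)[value_col].value_counts()
    # Boolean mask on the counts, then turn the index levels into columns.
    return counts[counts >= n].reset_index(name=name)


df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

result = filter_value_counts(df, 'A', 'B', n=2)
print(result)
```

Passing name= to reset_index also sidesteps a clash in older pandas versions, where the counts Series is named B, the same as one of its index levels.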

Answer 2

This can be done in one line with .loc and a callable:

>>> df.groupby('A')['B'].value_counts().loc[lambda x: x > 1].reset_index(name='count')
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
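If pandas 1.1 or later is available, a possible alternative is to count each (A, B) pair directly with DataFrame.value_counts and then apply the same callable .loc filter. This is a sketch, not what either answer used; note that the result is sorted by frequency across all pairs rather than grouped by A (in this data the two orders happen to coincide).

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

n = 2  # minimum frequency to keep, as in the question

# Count every (A, B) pair in one step; the counts come back sorted descending.
result = (
    df.value_counts(subset=['A', 'B'])
      .loc[lambda s: s >= n]   # keep only pairs seen at least n times
      .reset_index(name='freq')
)
print(result)
```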

Answer 1: score 4, posted 2018-05-01 13:14:53
Answer 2: score 0, accepted, posted 2018-05-01 13:37:02
