Pandas groupby value_counts filter by frequency
I would like to filter out the frequencies that are less than n; in my case n is 2.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no']})
df.groupby('A')['B'].value_counts()
A    B
bar  no     4
     yes    1
foo  yes    3
     no     2
Name: B, dtype: int64
Ideally I would like the results in a dataframe like the one below (the row with a frequency of 1 is excluded):
A    B    freq
bar  no   4
foo  yes  3
foo  no   2
I have tried:
df.groupby('A')['B'].filter(lambda x: len(x) > 1)
but this fails, as apparently the groupby returns a Series.
You can store the output of the .value_counts() method and then filter it:
>>> counts = df.groupby('A')['B'].value_counts()
>>> counts[counts >= 2]
A    B
bar  no     4
foo  yes    3
     no     2
Name: B, dtype: int64
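The filter above works because `counts >= 2` is itself a boolean Series sharing the same (A, B) index as `counts`. The following is a minimal sketch that makes that intermediate mask explicit; the variable names `n`, `mask`, and `filtered` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

n = 2  # minimum frequency to keep (illustrative name)

# Series of counts indexed by the (A, B) pairs
counts = df.groupby('A')['B'].value_counts()

# Boolean Series with the same index: True where the count is at least n
mask = counts >= n

# Boolean indexing keeps only the rows where the mask is True
filtered = counts[mask]
print(filtered)
```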
To get your desired output, call the .reset_index() method and name the new column:
>>> counts[counts >= 2].reset_index(name='count')
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
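If this pattern is needed more than once, the two steps could be wrapped in a small helper. This is only a sketch: the function name `frequent_pairs` and its parameters are hypothetical, and the default column name `freq` is taken from the desired output in the question. Passing `name=` to `reset_index` also avoids a column-name clash on pandas versions where the counts Series is named after the counted column (`B` here).

```python
import pandas as pd

def frequent_pairs(frame, group_col, value_col, min_count, name='freq'):
    """Return (group, value) pairs occurring at least min_count times.

    Hypothetical helper: counts the values per group, drops rare pairs,
    and flattens the result into a DataFrame with a named count column.
    """
    counts = frame.groupby(group_col)[value_col].value_counts()
    return counts[counts >= min_count].reset_index(name=name)

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

result = frequent_pairs(df, 'A', 'B', min_count=2)
print(result)
```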
This can be done in one line by passing a callable to .loc:
>>> df.groupby('A')['B'].value_counts().loc[lambda x: x > 1].reset_index(name='count')
     A    B  count
0  bar   no      4
1  foo  yes      3
2  foo   no      2
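When `.loc` is given a callable, pandas calls it with the object being indexed and uses the returned boolean Series as the selector, so no intermediate variable is needed. The sketch below compares the chained form with the two-step form from the other answer; since the counts are integers, `x > 1` and `counts >= 2` should select the same rows. The names `two_step` and `one_line` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'no'],
})

# Two-step version: store the counts, then filter with a boolean mask
counts = df.groupby('A')['B'].value_counts()
two_step = counts[counts >= 2].reset_index(name='count')

# Chained version: the lambda receives the counts Series and returns the mask
one_line = (
    df.groupby('A')['B']
      .value_counts()
      .loc[lambda x: x > 1]
      .reset_index(name='count')
)

# Check that both approaches produce the same DataFrame
print(one_line.equals(two_step))
```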