How to subset a pandas dataframe on value_counts?

Question

I have the following pandas dataframe

import pandas as pd
df = pd.read_csv("filename1.csv")

df
    column1  column2   column3
0        10       A          1
1        15       A          1
2        19       B          1
3      5071       B          0
4      5891       B          0
5      3210       B          0
6        12       B          2
7        13       C          2
8        20       C          0
9         5       C          3
10        9       C          3

Now, calling value_counts() gives me the count of each value in a given column, e.g.

df.column3.value_counts()

0   4
1   3
2   2
3   2
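To experiment without filename1.csv, the sample frame can be rebuilt inline from the printed table. This assumes the CSV holds exactly those rows. Because value_counts() returns a Series indexed by the column's values, it can also be mapped back onto every row, so each row gets the count of its own column3 value. A minimal sketch:

```python
import pandas as pd

# Sample data copied from the printed table; assumed to match filename1.csv.
df = pd.DataFrame({
    "column1": [10, 15, 19, 5071, 5891, 3210, 12, 13, 20, 5, 9],
    "column2": list("AABBBBBCCCC"),
    "column3": [1, 1, 1, 0, 0, 0, 2, 2, 0, 3, 3],
})

# Occurrences of each distinct value; note that 0 is counted too.
counts = df["column3"].value_counts()
print(counts.sort_index())

# Look up each row's value in the counts Series (its index is the value).
per_row = df["column3"].map(counts)
print(per_row.tolist())
```

With per_row available, the subset from the question is df[(df["column3"] != 0) & (per_row >= 3)].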

However, I would like to subset the dataframe based on how often each value occurs in a given column. For example, in the dataframe df above, I would like to keep only the rows whose column3 value occurs 3 or more times (ignoring 0). In this case, the resulting dataframe would be

df
    column1  column2   column3
0        10       A          1
1        15       A          1
2        19       B          1

The rows with values 2 and 3 are dropped because each of those values occurs only twice in column3. What is the pandas way to do this?

Answer 1

You can use groupby.filter. In the filter function, return a single boolean for each group to decide whether all of that group's rows are kept:

df.groupby("column3").filter(lambda g: (g.name != 0) and (g.column3.size >= 3))
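A runnable version of the filter approach, with the sample data typed inline (assumed to match filename1.csv). Inside filter, pandas sets g.name to the group key (here the column3 value), and len(g) is the group's row count, equivalent to g.column3.size:

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "column1": [10, 15, 19, 5071, 5891, 3210, 12, 13, 20, 5, 9],
    "column2": list("AABBBBBCCCC"),
    "column3": [1, 1, 1, 0, 0, 0, 2, 2, 0, 3, 3],
})

# The lambda runs once per column3 group and must return one boolean;
# groups returning True keep all of their rows, in the original order.
result = df.groupby("column3").filter(lambda g: g.name != 0 and len(g) >= 3)
print(result)
```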

Another option could be:

df[(df.column3 != 0) & (df.groupby("column3").column3.transform("size") >= 3)]
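The transform option works because transform("size") returns a Series aligned with df's index, so every row carries the size of its own group and can be combined with an ordinary boolean mask. A sketch with the sample data inline (assumed to match filename1.csv):

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "column1": [10, 15, 19, 5071, 5891, 3210, 12, 13, 20, 5, 9],
    "column2": list("AABBBBBCCCC"),
    "column3": [1, 1, 1, 0, 0, 0, 2, 2, 0, 3, 3],
})

# One entry per row: the number of rows sharing that row's column3 value.
sizes = df.groupby("column3")["column3"].transform("size")

# Combine the "non-zero" and "occurs at least 3 times" conditions row-wise.
result = df[(df["column3"] != 0) & (sizes >= 3)]
print(result)
```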

Answer 2

Alternatively, you can filter out the zeros before grouping:

df[df['column3'] != 0].groupby("column3").filter(lambda x: x['column3'].size >= 3)
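Filtering first means the zero group is never built, so the lambda only has to check the group size. A runnable sketch with the sample data inline (assumed to match filename1.csv); len(x) is used in place of x['column3'].size, and both give the group's row count:

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "column1": [10, 15, 19, 5071, 5891, 3210, 12, 13, 20, 5, 9],
    "column2": list("AABBBBBCCCC"),
    "column3": [1, 1, 1, 0, 0, 0, 2, 2, 0, 3, 3],
})

# Drop the zeros up front, then keep only groups with at least 3 rows.
nonzero = df[df["column3"] != 0]
result = nonzero.groupby("column3").filter(lambda x: len(x) >= 3)
print(result)
```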

Answer 3

Alternative solution:

In [132]: cnt = df.column3.value_counts()

In [133]: cnt
Out[133]:
0    4
1    3
3    2
2    2
Name: column3, dtype: int64

In [134]: v = cnt[(cnt.index != 0) & (cnt >= 3)].index.values

In [135]: v
Out[135]: array([1], dtype=int64)

In [136]: df.query("column3 in @v")
Out[136]:
   column1 column2  column3
0       10       A        1
1       15       A        1
2       19       B        1
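If a query string is not wanted, the same selection can be made with Series.isin, passing the index of values that survived the count filter. A sketch with the sample data inline (assumed to match filename1.csv):

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "column1": [10, 15, 19, 5071, 5891, 3210, 12, 13, 20, 5, 9],
    "column2": list("AABBBBBCCCC"),
    "column3": [1, 1, 1, 0, 0, 0, 2, 2, 0, 3, 3],
})

cnt = df["column3"].value_counts()

# Values that occur at least 3 times, excluding 0.
keep = cnt[(cnt >= 3) & (cnt.index != 0)].index

# Boolean mask: True where a row's column3 value is one of the kept values.
result = df[df["column3"].isin(keep)]
print(result)
```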

How to subset a pandas dataframe on value_counts?

Question

3 answers

solution1
3 ACCEPTED 2017-03-29 20:55:42

solution2
1 2017-03-29 21:05:23

solution3
1 2017-03-29 21:15:33