Remove all groups with more than N observations

Question

If a value occurs more than two times in a column I want to drop every row that it occurs in.

The input df would look like:

The output df would look like:

Name   Num
  Y     3
  Y     4

I know it is possible to remove duplicates, but that only works if I want to remove the first or last duplicate that is found, not the nth duplicate.

df = df.drop_duplicates(subset = ['Name'], drop='third')

This code is completely wrong but it helps explain what I was trying to do.

Answer 1

Use GroupBy.cumcount for counter and filter all values less like 2 :

df1 = df[df.groupby('Name').cumcount() < 3]
print (df1)
  Name  Num
0    X    1
1    X    2
2    Y    3
3    Y    4

Detail :

print (df.groupby('Name').cumcount())
0    0
1    1
2    0
3    1
4    2
dtype: int64

EDIT

Filter by GroupBy.transform and GroupBy.size :

df1 = df[df.groupby('Name')['Num'].transform('size') < 3]
print (df1)
  Name  Num
2    Y    3
3    Y    4

Answer 2

Using head

df.groupby('Name').head(2)
Out[375]: 
  Name  Num
0    X    1
1    X    2
2    Y    3
3    Y    4

s=df.groupby('Name').size()<=2
df.loc[df.Name.isin(s[s].index)]
Out[380]: 
  Name  Num
2    Y    3
3    Y    4

Remove all groups with more than N observations

Question

2 answers

solution1
4 2019-03-05 14:32:18

solution2
4 ACCPTED 2019-03-05 14:34:52

Remove all groups with more than N observations

Question

2 answers

solution1 4 2019-03-05 14:32:18

solution2 4 ACCPTED 2019-03-05 14:34:52

solution1
4 2019-03-05 14:32:18

solution2
4 ACCPTED 2019-03-05 14:34:52