If a value occurs more than two times in a column I want to drop every row that it occurs in.
The input df would look like:
Name Num
X 1
X 2
Y 3
Y 4
X 5
The output df would look like:
Name Num
Y 3
Y 4
I know it is possible to remove duplicates, but that only works if I want to remove the first or last duplicate that is found, not the nth duplicate.
df = df.drop_duplicates(subset = ['Name'], drop='third')
This code is completely wrong but it helps explain what I was trying to do.
Use GroupBy.cumcount
for counter and filter all values less like 2
:
df1 = df[df.groupby('Name').cumcount() < 3]
print (df1)
Name Num
0 X 1
1 X 2
2 Y 3
3 Y 4
Detail :
print (df.groupby('Name').cumcount())
0 0
1 1
2 0
3 1
4 2
dtype: int64
EDIT
Filter by GroupBy.transform
and GroupBy.size
:
df1 = df[df.groupby('Name')['Num'].transform('size') < 3]
print (df1)
Name Num
2 Y 3
3 Y 4
Using head
df.groupby('Name').head(2)
Out[375]:
Name Num
0 X 1
1 X 2
2 Y 3
3 Y 4
s=df.groupby('Name').size()<=2
df.loc[df.Name.isin(s[s].index)]
Out[380]:
Name Num
2 Y 3
3 Y 4
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.