Let's say I have the following DataFrame, and I want to drop the rows containing 10 and 100, i.e. the elements that appear only once in col1.
I can do the following:
a = df.groupby('col1').size()
b = list(a[a == 1].index)
and then loop over those values and drop the matching rows one by one:
for val in b:
    d_ind = df[df['col1'] == val].index
    df.drop(d_ind, axis=0, inplace=True)
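For reference, the approach above can be run end to end. The question's actual DataFrame isn't shown, so the sample data below is an assumption: col1 contains 10 and 100 exactly once each, and the col2/months values for those two rows are guesses consistent with the expected output in the answer.

```python
import pandas as pd

# Hypothetical sample data: 10 and 100 each appear once in col1.
df = pd.DataFrame({
    'col1':   [1, 1, 10, 100, 4, 4, 4],
    'col2':   [3, 4, 5, 6, 20, 11, 12],
    'months': [6, 6, 6, 7, 6, 7, 7],
})

# Count occurrences of each col1 value, keep the singletons.
a = df.groupby('col1').size()
b = list(a[a == 1].index)          # [10, 100]

# Drop the matching rows one value at a time.
for val in b:
    d_ind = df[df['col1'] == val].index
    df.drop(d_ind, axis=0, inplace=True)

print(df['col1'].tolist())         # [1, 1, 4, 4, 4]
```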
Is there any faster, more efficient way?
You can use the duplicated method on col1: with the keep=False parameter it marks every value that occurs more than once, returning a boolean Series you can use to subset/filter the rows:
df[df.col1.duplicated(keep=False)]
#   col1  col2  months
#0     1     3       6
#1     1     4       6
#4     4    20       6
#5     4    11       7
#6     4    12       7