Let's say I have the following DataFrame, and I want to drop the rows containing 10 and 100, i.e. the elements that appear only once in col1.
I can do the following:
a = df.groupby('col1').size()
b = list(a[a == 1].index)
and then loop over those values and drop the matching rows one by one:
for val in b:
    d_ind = df[df['col1'] == val].index
    df.drop(d_ind, axis=0, inplace=True)
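For reference, the approach above can be run end to end. The question's actual DataFrame isn't shown, so the sample data below is an assumption: col1 contains 10 and 100 exactly once each, and the col2/months values for those two rows are guesses consistent with the expected output in the answer.

```python
import pandas as pd

# Hypothetical sample data: 10 and 100 each appear once in col1.
df = pd.DataFrame({
    'col1':   [1, 1, 10, 100, 4, 4, 4],
    'col2':   [3, 4, 5, 6, 20, 11, 12],
    'months': [6, 6, 6, 7, 6, 7, 7],
})

# Count occurrences of each col1 value, keep the singletons.
a = df.groupby('col1').size()
b = list(a[a == 1].index)          # [10, 100]

# Drop the matching rows one value at a time.
for val in b:
    d_ind = df[df['col1'] == val].index
    df.drop(d_ind, axis=0, inplace=True)

print(df['col1'].tolist())         # [1, 1, 4, 4, 4]
```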
Is there any faster, more efficient way?
You can use the duplicated method on col1: with the keep=False parameter it marks every value that occurs more than once, returning a boolean Series you can use to subset/filter the rows:
df[df.col1.duplicated(keep=False)]
#   col1  col2  months
#0     1     3       6
#1     1     4       6
#4     4    20       6
#5     4    11       7
#6     4    12       7