
Drop rows in a pandas DataFrame if an 'id' occurs fewer than 2 times

I want to drop rows based on how often an id occurs.

My dataframe looks like this:

df.head()
>>
index   id  tweet_len

161660  4001    5
116708  8571    5
213433  1813    5
213449  1813    5
213450  1813    5
213455  1813    5
29295   8190    5
213457  1813    5
29293   8190    5
213461  1531    5

I want to drop every row whose id appears exactly once.

df.groupby('id').agg('count')['tweet_len']<2

gives me

id
2        False
3        False
4        False
6        False
7        False
         ...  
9996     False
9997     False
9998     False
9999     False
10000    False
Name: tweet_len, Length: 9252, dtype: bool

but I want the row indices so that I can drop those rows. How can I get them?

You can use transform with 'size' to get the group size for every row, and use the result to index the dataframe:

df[df.groupby('id')['id'].transform('size').gt(1)]

   index    id   tweet_len
2  213433  1813          5
3  213449  1813          5
4  213450  1813          5
5  213455  1813          5
6   29295  8190          5
7  213457  1813          5
8   29293  8190          5
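For anyone who wants to try this end to end, here is a minimal sketch. It assumes the ten rows shown in the question are the whole frame and that the leftmost numbers are the row index. The names group_sizes and kept are only illustrative.

```python
import pandas as pd

# Sample data copied from the question's df.head() output.
# Assumption: the leftmost column in the question is the row index.
df = pd.DataFrame(
    {
        "id": [4001, 8571, 1813, 1813, 1813, 1813, 8190, 1813, 8190, 1531],
        "tweet_len": [5] * 10,
    },
    index=[161660, 116708, 213433, 213449, 213450,
           213455, 29295, 213457, 29293, 213461],
)

# transform('size') returns one value per row: the number of rows
# that share that row's id. The result is aligned with df's index.
group_sizes = df.groupby("id")["id"].transform("size")

# Keep only the rows whose id appears more than once.
kept = df[group_sizes.gt(1)]
print(kept)
```

Because the transformed Series has the same index as df, it can be used directly as a boolean mask, with no need to look up row indices first.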

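You can just use duplicated. With keep=False, every row of a repeated id is marked as a duplicate, so ids that occur only once are filtered out:

df[df.duplicated('id', keep=False)]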

Output:

    index    id  tweet_len
2  213433  1813          5
3  213449  1813          5
4  213450  1813          5
5  213455  1813          5
6   29295  8190          5
7  213457  1813          5
8   29293  8190          5
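If you really do want the row indices, as the question asks, the same mask can be inverted and passed to drop. This is only a sketch under the same assumptions as the question's sample (the leftmost numbers are the row index, and the index labels are unique, since drop removes every row carrying a given label). The names singletons, to_drop and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question's df.head() output.
# Assumption: the leftmost column in the question is the row index.
df = pd.DataFrame(
    {
        "id": [4001, 8571, 1813, 1813, 1813, 1813, 8190, 1813, 8190, 1531],
        "tweet_len": [5] * 10,
    },
    index=[161660, 116708, 213433, 213449, 213450,
           213455, 29295, 213457, 29293, 213461],
)

# True for rows whose id occurs exactly once in the frame.
singletons = ~df.duplicated("id", keep=False)

# Index labels of the rows to remove.
to_drop = df.index[singletons.to_numpy()]

# Drop those rows by label; assumes the index labels are unique.
result = df.drop(index=to_drop)
print(list(to_drop))
print(result)
```

For the sample data this gives the same rows as the boolean-mask version, so the explicit index step is only worth it if you need the labels for something else.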
