Drop rows in a pandas DataFrame if an 'id' occurs fewer than 2 times

Question

I want to drop rows based on how often an id occurs.

My dataframe looks like this:

df.head()
>>
index   id  tweet_len

161660  4001    5
116708  8571    5
213433  1813    5
213449  1813    5
213450  1813    5
213455  1813    5
29295   8190    5
213457  1813    5
29293   8190    5
213461  1531    5

I want to drop every row whose id appears exactly once.

df.groupby('id').agg('count')['tweet_len']<2

gives me

id
2        False
3        False
4        False
6        False
7        False
         ...  
9996     False
9997     False
9998     False
9999     False
10000    False
Name: tweet_len, Length: 9252, dtype: bool

but I want the row indices for those ids so that I can drop the corresponding rows. How can I get them?

Answer 1

You can use transform with 'size' to give every row the size of its id group, and use the result to index the dataframe:

df[df.groupby('id')['id'].transform('size').gt(1)]

   index    id   tweet_len
2  213433  1813          5
3  213449  1813          5
4  213450  1813          5
5  213455  1813          5
6   29295  8190          5
7  213457  1813          5
8   29293  8190          5
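
To see why the mask lines up with the rows, it may help to look at the intermediate Series: transform('size') returns one value per original row (the size of that row's id group), so it can be compared and used directly as a boolean index. The following is a minimal sketch, assuming a frame rebuilt from the df.head() shown in the question with 'index' as an ordinary column; the names sizes and kept are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question's df.head(); 'index' is an ordinary column here.
df = pd.DataFrame({
    "index": [161660, 116708, 213433, 213449, 213450,
              213455, 29295, 213457, 29293, 213461],
    "id": [4001, 8571, 1813, 1813, 1813, 1813, 8190, 1813, 8190, 1531],
    "tweet_len": [5] * 10,
})

# One value per row: how many rows share that row's id.
sizes = df.groupby("id")["id"].transform("size")

# Keep only the rows whose id occurs at least twice.
kept = df[sizes.gt(1)]
print(kept)
```

Selecting the grouping column itself (['id']) is a choice made for this sketch so that it does not rely on any other column being present.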

Answer 2

You can just use duplicated with keep=False, which marks every row whose id appears more than once:

df[df.duplicated('id', keep=False)]

Output:

    index    id  tweet_len
2  213433  1813          5
3  213449  1813          5
4  213450  1813          5
5  213455  1813          5
6   29295  8190          5
7  213457  1813          5
8   29293  8190          5
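
The question literally asks for the indices of the rows to drop. As a hedged sketch of one way to get them, the duplicated mask can be inverted and applied to df.index, and the resulting labels passed to df.drop. The sample frame is rebuilt from the question's df.head(), and the names is_single, labels_to_drop and result are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question's df.head(); 'index' is an ordinary column here.
df = pd.DataFrame({
    "index": [161660, 116708, 213433, 213449, 213450,
              213455, 29295, 213457, 29293, 213461],
    "id": [4001, 8571, 1813, 1813, 1813, 1813, 8190, 1813, 8190, 1531],
    "tweet_len": [5] * 10,
})

# True for rows whose id is not shared with any other row.
is_single = ~df.duplicated("id", keep=False)

# Row labels of those rows, i.e. the indices the question asks for.
labels_to_drop = df.index[is_single]

# Dropping by label gives the same rows as boolean indexing with the mask.
result = df.drop(labels_to_drop)
print(list(labels_to_drop))
print(result)
```

Boolean indexing as in the answer is usually shorter; going through df.drop is mainly useful when the labels themselves are needed elsewhere.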

solution1
2 ACCEPTED 2020-03-26 12:43:14

solution2
1 2020-03-26 12:47:44
