I want to drop rows based on the number of occurrences of an id.
My dataframe looks like this:
df.head()
>>
index id tweet_len
161660 4001 5
116708 8571 5
213433 1813 5
213449 1813 5
213450 1813 5
213455 1813 5
29295 8190 5
213457 1813 5
29293 8190 5
213461 1531 5
I want to drop all rows whose id appears exactly once.
df.groupby('id').agg('count')['tweet_len']<2
gives me
id
2 False
3 False
4 False
6 False
7 False
...
9996 False
9997 False
9998 False
9999 False
10000 False
Name: tweet_len, Length: 9252, dtype: bool
but I want the row indices so that I can drop those rows. How can I get them?
You can use transform with the 'size' aggregation to get each row's group size, and use the result to index the dataframe:
df[df.groupby('id')['id'].transform('size').gt(1)]
index id tweet_len
2 213433 1813 5
3 213449 1813 5
4 213450 1813 5
5 213455 1813 5
6 29295 8190 5
7 213457 1813 5
8 29293 8190 5
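As a minimal sketch of the transform approach, assuming the ten sample rows from the question are rebuilt by hand with the original row labels as the index (the names counts and kept are illustrative only):

```python
import pandas as pd

# Sample rows from the question, with the original row labels as the index.
df = pd.DataFrame(
    {
        "id": [4001, 8571, 1813, 1813, 1813, 1813, 8190, 1813, 8190, 1531],
        "tweet_len": [5] * 10,
    },
    index=[161660, 116708, 213433, 213449, 213450,
           213455, 29295, 213457, 29293, 213461],
)

# Per-row group size: every row receives the number of rows sharing its id.
counts = df.groupby("id")["id"].transform("size")

# Keep only the rows whose id occurs more than once.
kept = df[counts.gt(1)]
print(kept)
```

Because transform returns a result aligned with the original rows, the comparison can be used directly as a boolean mask without any merge or lookup.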
You can just use duplicated with keep=False, which marks every row whose id occurs more than once:
df[df.duplicated('id',keep=False)]
Output:
index id tweet_len
2 213433 1813 5
3 213449 1813 5
4 213450 1813 5
5 213455 1813 5
6 29295 8190 5
7 213457 1813 5
8 29293 8190 5
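If the index labels themselves are wanted, as the question asks, the same mask can be inverted and passed to drop. This is only a sketch under the assumption that the sample rows are rebuilt by hand; the names singletons, to_drop and result are illustrative:

```python
import pandas as pd

# Sample rows from the question, with the original row labels as the index.
df = pd.DataFrame(
    {
        "id": [4001, 8571, 1813, 1813, 1813, 1813, 8190, 1813, 8190, 1531],
        "tweet_len": [5] * 10,
    },
    index=[161660, 116708, 213433, 213449, 213450,
           213455, 29295, 213457, 29293, 213461],
)

# True for rows whose id appears exactly once.
singletons = ~df.duplicated("id", keep=False)

# Index labels of those rows, then drop them by label.
to_drop = df.index[singletons.to_numpy()]
result = df.drop(to_drop)
print(list(to_drop))
print(result)
```

Dropping by label assumes the index labels are unique; with a duplicated index, filtering with the boolean mask is the safer choice.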