Pandas: keep the duplicate with the highest value
I have data similar to:
id value duplicate
a 200 yes
a 12 yes
b 42 yes
c 12 no
b 532 yes
b 21 yes
...
To track the duplicates I use df['duplicate'] = df.duplicated('id', keep=False)
However, I would like to keep the rows with the highest value for each id and either mark or drop the other duplicates. Any suggestions?
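Ah, I don't know why I didn't think of this first. Sort by id, then by value in descending order, so the highest value of each id comes first and duplicated(..., keep='first') flags only the lower ones:
df = df.sort_values(['id', 'value'], ascending=[True, False])
df['is_duplicated'] = df.duplicated('id', keep='first')
Sorry!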
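For reference, here is a minimal, self-contained sketch of the sort-then-flag idea, using the sample rows from the question. It relies on sort_values, since DataFrame.sort is no longer available in current pandas. The variable names (ordered, marked, deduped) are illustrative only and are not from the original post.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "id": ["a", "a", "b", "c", "b", "b"],
    "value": [200, 12, 42, 12, 532, 21],
})

# Sort so that, within each id, the highest value comes first.
ordered = df.sort_values(["id", "value"], ascending=[True, False])

# Option 1: mark every row except the highest-valued one per id.
marked = ordered.assign(is_duplicated=ordered.duplicated("id", keep="first"))

# Option 2: drop the lower-valued rows, keeping one row per id.
deduped = ordered.drop_duplicates("id", keep="first")

print(marked)
print(deduped)
```

Sorting ascending on both columns and passing keep='last' should give the same result; the descending sort is used here only so that the kept row appears first within each id.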