Pandas: keep the highest value among duplicates

Question

I have data similar to:

id value duplicate
a   200  yes
a   12   yes
b   42   yes
c   12   no
b   532  yes
b   21   yes
...

To track the duplicates I use:

    df['duplicate'] = df.duplicated('id', keep=False)

However, I would like to keep the row with the highest value for each id and either mark or drop the other duplicates. Any suggestions?
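To make the question concrete, here is a minimal sketch that rebuilds the sample rows as a DataFrame (only the six rows shown; the elided "..." rows are left out) and applies the same duplicated call. Because keep=False flags every member of a duplicated group, it reproduces the yes/no column but cannot single out the highest value.

```python
import pandas as pd

# The six sample rows shown in the question (the "..." rows are omitted).
df = pd.DataFrame({
    "id": ["a", "a", "b", "c", "b", "b"],
    "value": [200, 12, 42, 12, 532, 21],
})

# keep=False flags every row whose id occurs more than once, so all rows of a
# duplicated group get True and none is treated as the one to keep.
df["duplicate"] = df.duplicated("id", keep=False)
print(df)
```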

Answer 1

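Ah, I don't know why I didn't think of this first. Sort by id and then value, so the highest value of each id comes last in its group, and mark every row except that last one:

    df = df.sort_values(['id', 'value'])
    df['is_duplicated'] = df.duplicated('id', keep='last')

Note that DataFrame.sort no longer exists in current pandas; use sort_values, which returns a new frame rather than sorting in place. Because the sort is ascending, keep='last' is what leaves the highest value unmarked (keep='first' would keep the lowest).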

Sorry!
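A minimal sketch of both outcomes the question asks for, assuming only the six sample rows shown above. Options 1 and 2 follow the answer's sort-then-duplicated idea; option 3 is a different technique (comparing each row with its group maximum via groupby/transform), included as an alternative and not taken from the answer. The variable names marked, kept and kept_with_ties are illustrative.

```python
import pandas as pd

# Sample rows from the question; the elided "..." rows are not reproduced.
df = pd.DataFrame({
    "id": ["a", "a", "b", "c", "b", "b"],
    "value": [200, 12, 42, 12, 532, 21],
})

# Option 1: mark. After an ascending sort on (id, value) the last row of each
# id group holds the highest value, so keep='last' leaves it unmarked.
marked = df.sort_values(["id", "value"])
marked["is_duplicated"] = marked.duplicated("id", keep="last")

# Option 2: drop. Same sort, then keep only the last (highest-value) row per id.
kept = df.sort_values(["id", "value"]).drop_duplicates("id", keep="last")

# Option 3 (alternative technique): keep rows equal to their group's maximum.
# Unlike options 1 and 2, every row tied for the maximum survives.
is_max = df["value"] == df.groupby("id")["value"].transform("max")
kept_with_ties = df[is_max]

print(marked)
print(kept)
```

The sort-based options always keep exactly one row per id; the groupby variant is worth considering only if tied maxima should all be retained.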

Answer 1: score 8, posted 2015-10-29 20:06:47