

How do I remove duplicates in Pandas but keep the first instance, leaving a blank cell for the duplicates?

I have a pandas DataFrame, and I'm doing a groupby(['target']).count(). This works fine. However, for each group I also want the number of unique elements in the ID column.

What I'd like to do is, for the ID column, null out every copy of each ID value except the first (IDs are unique to groups, so I don't have to worry about the same ID appearing in more than one group). Then groupby().count() would give me the number of unique IDs in each group, but I'm not sure how to do that.

The DataFrame.duplicated() method works here if you want to do it the way you described. Called on the ID column (for example df.duplicated(subset='ID')), it returns a boolean Series in which the first occurrence of each ID is False and every later occurrence is True. You can then use that Series as a mask to set the duplicated IDs to null.
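A minimal sketch of that approach, assuming a small made-up DataFrame (the columns `target`, `ID` and `value` and their data are illustrative only). It applies the `duplicated()` mask with `Series.mask`, which replaces values with NaN wherever the condition is True, and then relies on `count()` skipping null cells:

```python
import pandas as pd

# Hypothetical example data: 'target' is the grouping column and IDs repeat within a group.
df = pd.DataFrame({
    "target": ["a", "a", "a", "b", "b"],
    "ID": [1, 1, 2, 3, 3],
    "value": [10, 20, 30, 40, 50],
})

# duplicated() is False for the first occurrence of each ID and True for later copies.
dup_mask = df.duplicated(subset="ID")

# Blank out (set to NaN) every duplicated ID, keeping the first one; df itself is unchanged.
masked = df.assign(ID=df["ID"].mask(dup_mask))

# count() counts only non-null cells, so the ID column now holds unique IDs per group:
# 2 for group 'a' and 1 for group 'b'.
counts = masked.groupby(["target"]).count()
print(counts)
```

This relies on the question's assumption that an ID never appears in more than one group; if it could, a per-group mask such as df.duplicated(subset=["target", "ID"]) would be needed instead.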

See: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
