
How to delete duplicates, but keep the first instance and a blank cell for the duplicates in Pandas?

I have a pandas DataFrame, and I'm doing a groupby(['target']).count(). This works fine. However, one of the things I want, for each group, is the number of unique elements in the ID column.

What I'd like to do is, for the ID column, null out all but the first copy of each ID value (each ID appears in only one group, so I don't have to worry about the same ID turning up under different targets). Then groupby().count() would give me the number of unique IDs in each group, but I'm not sure how to do that.

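The DataFrame.duplicated() method is applicable here if you want to do it the way you described. It returns a boolean Series that is False for the first occurrence of each value and True for every later repeat. Pass subset='ID', or call duplicated() on the ID column itself, so that only the IDs are compared rather than whole rows. You can then use this Series as a mask to set the duplicated IDs to null; count() skips nulls, so each group's ID count becomes its number of unique IDs.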

See: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
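
A minimal sketch of the masking approach described above. The sample data and the "value" column are made up for illustration; only the "target" and "ID" column names come from the question. It uses Series.mask() to blank out the repeats rather than assigning NaN in place, which is one of several ways to apply the mask.

```python
import pandas as pd

# Hypothetical sample data; "value" is an invented extra column.
df = pd.DataFrame({
    "target": ["a", "a", "a", "b", "b"],
    "ID": [1, 1, 2, 3, 3],
    "value": [10, 20, 30, 40, 50],
})

# False for the first occurrence of each ID, True for every later repeat.
is_repeat = df.duplicated(subset="ID", keep="first")

# Replace the repeated IDs with NaN, leaving the original frame untouched.
masked = df.assign(ID=df["ID"].mask(is_repeat))

# count() ignores NaN, so the ID column now counts unique IDs per group,
# while the other columns still count all rows.
counts = masked.groupby("target").count()
print(counts)

# Cross-check against pandas' built-in per-group unique count.
unique_ids = df.groupby("target")["ID"].nunique()
print(unique_ids)
```

If the per-group unique count is all that is needed, the nunique() call at the end gives the same numbers without any masking; the masking route is mainly useful when the ID count has to sit alongside the other columns in a single count() result.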
