
Trying to make a dataframe with the dropped duplicates from pandas drop_duplicates on a certain field

I have a dataframe with a field that is supposed to be unique. In the data I am given, the field is not unique, so I have been using drop_duplicates to get rid of the duplicates. However, I would like to see which rows I am dropping, for QC. I've been reading threads on this, but the ones I've found either look at entire duplicate rows (not just one duplicated field) or compare dataframes that don't contain duplicates within themselves. How can I get a dataframe of the rows that are removed by my code below? Thank you.

    df = df.drop_duplicates(subset='_nefin_tree_obsID', keep=False)

Refer to the documentation for DataFrame.duplicated.

This should help:

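# keep=False flags every occurrence, matching the drop_duplicates call in the question
df[df.duplicated(subset='_nefin_tree_obsID', keep=False)]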
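
To make this concrete, here is a minimal sketch of splitting a dataframe into the rows that get dropped and the rows that are kept. The sample data and the `height` column are made up for illustration; only the `_nefin_tree_obsID` column name comes from the question. It relies on `duplicated` returning a boolean mask, which is used to index the original dataframe.

```python
import pandas as pd

# Hypothetical sample data; only the ID column name comes from the question.
df = pd.DataFrame({
    "_nefin_tree_obsID": [1, 2, 2, 3, 3, 3, 4],
    "height": [10.0, 11.5, 11.7, 9.2, 9.2, 9.4, 12.1],
})

# Boolean mask: True for every row whose ID occurs more than once.
# keep=False mirrors the keep=False passed to drop_duplicates.
mask = df.duplicated(subset="_nefin_tree_obsID", keep=False)

dropped = df[mask]   # rows that drop_duplicates removes (inspect these for QC)
kept = df[~mask]     # rows that survive, same as drop_duplicates(..., keep=False)

print(dropped)
```

If `drop_duplicates` were called with the default `keep='first'` instead, passing the same `keep` value to `duplicated` should give the matching set of dropped rows, since both methods use the same rule to decide which occurrence counts as the original.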
