
How to delete duplicates, but keep the first instance and a blank cell for the duplicates in Pandas?

Original post: 2016-09-27 15:41:46 · Tags: python, pandas, dataframe

I have a pandas DataFrame, and I'm doing a groupby(['target']).count(). This works fine. However, for each group I also want the number of unique elements in the ID column.

What I'd like to do is, for the ID column, null out all but the first copy of each ID value (IDs are unique to groups, so I don't have to worry about that issue). Then groupby().count() would give me the number of unique IDs in each group, but I'm not sure how to do that.
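The plan relies on GroupBy.count() counting only the non-null cells in each column, which is also why a plain count over-reports when an ID repeats. A minimal sketch with hypothetical data (only the target and ID column names come from the question; the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "target": ["a", "a", "a", "b", "b"],
    "ID": [1, 1, 2, 3, np.nan],
})

# count() counts non-null cells per group: group "a" reports 3 even though
# it holds only 2 distinct IDs, and the NaN in group "b" is not counted.
counts = df.groupby("target")["ID"].count()

# size() counts rows regardless of nulls, for contrast.
sizes = df.groupby("target").size()

print(counts)
print(sizes)
```

So once the repeated IDs are turned into nulls, count() no longer sees them.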

1 answer

The DataFrame.duplicated() method is applicable here if you want to do it the way you described. With the default keep='first', it returns a boolean Series in which the first occurrence of an ID is False and every later occurrence is True (pass subset=['ID'] so that only the ID column is compared). You can then use this Series as a mask to set the duplicated IDs to null.

See: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
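A minimal sketch of the mask approach described in the answer, assuming hypothetical sample data (only the target and ID column names come from the question). It uses Series.mask() to do the nulling; a .loc assignment with the same boolean mask would be another way to write that step.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "target": ["a", "a", "a", "b", "b", "b"],
    "ID": [1, 1, 2, 3, 3, 3],
})

# Direct alternative for comparison: count distinct IDs per group.
direct = df.groupby("target")["ID"].nunique()

# With the default keep="first", duplicated() is False for the first
# occurrence of each ID and True for every later repeat. The question says
# IDs never repeat across groups; if they could, subset=["target", "ID"]
# would be the safer key.
dup_mask = df.duplicated(subset=["ID"])

# Series.mask() replaces values where the mask is True with NaN
# (the integer column becomes float as a result).
df["ID"] = df["ID"].mask(dup_mask)

# count() skips NaN, so each group now reports its number of unique IDs.
unique_ids = df.groupby("target")["ID"].count()
print(unique_ids)
```

If the blank cells themselves are not needed, df.groupby('target')['ID'].nunique() on the original frame gives the same per-group counts without modifying the data, as the direct variable in the sketch shows.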

Related questions

How to keep first two duplicates in a pandas dataframe?

Pandas: delete consecutive duplicates but keep the first and last value

How to delete duplicates pandas

How do I drop duplicates and keep the first value on pandas?

pandas: how to select first or last by column in keep with drop_duplicates

How to drop duplicates in pandas but keep more than the first

Pandas - Opposite of drop duplicates, keep first

Keep first occurrence while removing duplicates in pandas

how to delete duplicates from a cell of a column in csv

how to drop duplicates but keep first in pyspark dataframe?


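Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.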



粤ICP备18138465号 © 2020-2024 STACKOOM.COM