Drop specific rows in pandas from a numpy array
I have a dataframe thousands of rows long that looks like this:
ID Email Address
1 ... ...
2 ... ...
3 ... ...
4 ... ...
1 ... ...
2 ... ...
5 ... ...
5 ... ...
6 ... ...
What I want to do is drop duplicates of ID so there is only one ID per person. I can't use drop_duplicates() because most people don't have an ID, and it drops those rows too (not good!).
Is there a way to remove specific rows and keep only one instance of each ID?
I have a dataframe of all the duplicate IDs I want to remove, if that helps. E.g. for the example I gave above:
ID Email Address
1 ... ...
2 ... ...
5 ... ...
Maybe there's a way to turn this into a series/array of IDs and remove them from the df that way?
I believe you need to chain 2 conditions: duplicated with keep=False, which flags every row whose ID is duplicated, and duplicated with no keep parameter, which flags every occurrence except the first:
df = df[df.duplicated(subset='ID', keep=False) & df.duplicated(subset='ID')]
print (df)
ID Email Address
4 1 ... ...
5 2 ... ...
7 5 ... ...
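The mask above selects the repeated rows rather than removing them. To drop them while keeping the rows that have no ID, the duplicated mask can be inverted and combined with a missing-value check. This is only a minimal sketch, assuming missing IDs are stored as NaN; the sample data and the names repeated and deduped are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: NaN marks people without an ID.
df = pd.DataFrame({
    "ID": [1, 2, np.nan, 3, 1, 2, np.nan, 5, 5],
    "Email": list("abcdefghi"),
})

# True for rows whose ID already appeared earlier.
# duplicated() treats NaN values as equal, so later NaN rows are flagged too.
repeated = df.duplicated(subset="ID")

# Keep first occurrences, plus every row with a missing ID.
deduped = df[~repeated | df["ID"].isna()]
print(deduped)
```

The isna() term is what protects the rows without an ID; without it, only the first such row would survive, which matches the drop_duplicates() behaviour described in the question.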
Is this what you want?
df[df.duplicated(subset='ID')]
ID Email Address
4 1 ... ...
5 2 ... ...
7 5 ... ...
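The question also asks whether the frame of duplicate IDs could be turned into an array and used to filter the original frame. One possible sketch uses isin together with duplicated; the frame name dupes and the variables ids, to_drop and result are assumptions for illustration, not part of the original post:

```python
import pandas as pd

# The ID column from the question's example.
df = pd.DataFrame({"ID": [1, 2, 3, 4, 1, 2, 5, 5, 6]})

# Hypothetical frame listing the IDs known to be duplicated.
dupes = pd.DataFrame({"ID": [1, 2, 5]})

# Turn the ID column into a NumPy array.
ids = dupes["ID"].to_numpy()

# Drop a row only if its ID is in the array and it is not the first occurrence.
to_drop = df["ID"].isin(ids) & df.duplicated(subset="ID")
result = df[~to_drop]
print(result)
```

Because isin only matches the listed IDs, rows whose ID is missing or not in the array are left untouched.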