DataFrame: drop duplicates where a column's values are equal for a unique key
I want to drop duplicates from a DataFrame where a column's values are equal for the same key. Example:
In:
KEY SYSTEM
TD-438426 AAA
TD-438426 BBB
TD-438426 AAA
TD-438709 BBB
TD-438709 BBB
TD-438750 CCC
TD-438750 CCC
TD-438750 CCC
TD-438874 AAA
TD-438874 BBB
Out:
KEY SYSTEM
TD-438426 AAA
TD-438426 BBB
TD-438709 BBB
TD-438750 CCC
TD-438874 AAA
TD-438874 BBB
PS: Of course, there are some exceptions that I want to catch:
In:
KEY TEST SYSTEM
TD-438426 ABC AAA
TD-438426 ABC BBB
Out:
KEY TEST SYSTEM
TD-438426 ABC AAA
TD-438426 ABC BBB
And:
In:
KEY TEST SYSTEM
TD-438426 ABC AAA
TD-438426 CBA AAA
Out:
KEY TEST SYSTEM
TD-438426 ABC AAA
As @mcsioni mentioned in the comments, what you are looking for is:
df.drop_duplicates()
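Here is a minimal sketch that reproduces the first example from the question. It assumes the table arrives as whitespace-separated text and is loaded with `pd.read_csv`. The string `raw` is just the question's input pasted in. Calling `drop_duplicates()` with no arguments treats a row as a duplicate only when every column matches an earlier row.

```python
import io

import pandas as pd

# Input table from the question (assumption: whitespace-separated text).
raw = """KEY SYSTEM
TD-438426 AAA
TD-438426 BBB
TD-438426 AAA
TD-438709 BBB
TD-438709 BBB
TD-438750 CCC
TD-438750 CCC
TD-438750 CCC
TD-438874 AAA
TD-438874 BBB
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# No arguments: all columns are compared and the first occurrence is kept.
# reset_index is optional; it only renumbers the surviving rows 0..n-1.
out = df.drop_duplicates().reset_index(drop=True)
print(out)
```

The result matches the "Out" table in the question: one row per distinct (KEY, SYSTEM) pair.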
It is also useful to understand two of this method's arguments: subset and keep.
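To show what the two arguments control, here is a small sketch on a made-up frame. The keys `k1`/`k2` and their values are hypothetical. `subset` picks the columns that decide whether two rows count as duplicates. `keep` picks which occurrence survives: `'first'`, `'last'`, or `False` to drop every duplicated row.

```python
import pandas as pd

# Hypothetical data: the two "k1" rows differ only in SYSTEM.
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2"],
    "SYSTEM": ["AAA", "BBB", "CCC"],
})

# Duplicates are judged on KEY alone; only `keep` changes between calls.
keep_first = df.drop_duplicates(subset=["KEY"], keep="first")  # first k1 row survives
keep_last = df.drop_duplicates(subset=["KEY"], keep="last")    # last k1 row survives
drop_all = df.drop_duplicates(subset=["KEY"], keep=False)      # no k1 row survives

print(keep_first, keep_last, drop_all, sep="\n\n")
```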
For example, if you want to keep only unique values in the KEY column and keep the first SYSTEM value for each unique KEY, you'd do:
df.drop_duplicates(subset=['KEY'], keep='first')
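For contrast, this sketch applies `subset=['KEY']` to the question's first example, loaded the same way as before (the whitespace-separated `raw` string is an assumption about the input). Because only KEY is compared, every key collapses to a single row. That drops `TD-438426 BBB` and `TD-438874 BBB`, so this call is *not* what the question's desired output asks for.

```python
import io

import pandas as pd

raw = """KEY SYSTEM
TD-438426 AAA
TD-438426 BBB
TD-438426 AAA
TD-438709 BBB
TD-438709 BBB
TD-438750 CCC
TD-438750 CCC
TD-438750 CCC
TD-438874 AAA
TD-438874 BBB
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Only KEY decides duplicates, so each KEY keeps just its first row,
# whatever its SYSTEM value is.
one_per_key = df.drop_duplicates(subset=["KEY"], keep="first")
print(one_per_key)
```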
If you use df.drop_duplicates() without any arguments, subset defaults to all the columns, which is exactly what your desired output asks for.
EDIT
To meet your new requirement, where the TEST column should be ignored when comparing rows, do this:
df.drop_duplicates(subset=['KEY', 'SYSTEM'], keep='first')
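A quick check against the two "exception" cases from the question, rebuilt here with `pd.DataFrame`. Restricting `subset` to KEY and SYSTEM keeps both rows in the first case, because SYSTEM differs. In the second case it keeps only the first row, because only TEST differs. The no-argument call is included to show why `subset` is needed: it compares TEST too, so it keeps both rows of the second case.

```python
import pandas as pd

# Exception 1: same KEY and TEST, different SYSTEM -> both rows should stay.
case1 = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426"],
    "TEST": ["ABC", "ABC"],
    "SYSTEM": ["AAA", "BBB"],
})
# Exception 2: same KEY and SYSTEM, different TEST -> only the first row should stay.
case2 = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426"],
    "TEST": ["ABC", "CBA"],
    "SYSTEM": ["AAA", "AAA"],
})

# Compare rows on KEY and SYSTEM only, ignoring TEST.
out1 = case1.drop_duplicates(subset=["KEY", "SYSTEM"], keep="first")
out2 = case2.drop_duplicates(subset=["KEY", "SYSTEM"], keep="first")

# Without subset, TEST takes part in the comparison, so nothing is dropped here.
all_cols = case2.drop_duplicates()

print(out1, out2, all_cols, sep="\n\n")
```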
Note: The default value of the keep argument is 'first', but it doesn't hurt to be explicit when working with high-level libraries like pandas.