DataFrame: drop duplicates where a column's values are equal for a unique key

Question

I want to drop duplicate rows from a DataFrame where a column's values are equal for the same unique key. Example:

In:

KEY         SYSTEM
TD-438426   AAA
TD-438426   BBB
TD-438426   AAA
TD-438709   BBB
TD-438709   BBB
TD-438750   CCC
TD-438750   CCC
TD-438750   CCC
TD-438874   AAA
TD-438874   BBB

Out:

KEY         SYSTEM
TD-438426   AAA
TD-438426   BBB
TD-438709   BBB
TD-438750   CCC
TD-438874   AAA
TD-438874   BBB

PS: Of course, there are some exceptions that I want to catch.

In:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA
TD-438426   ABC     BBB

Out:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA
TD-438426   ABC     BBB

And

In:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA
TD-438426   CBA     AAA

Out:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA

Answer 1

As @mcsioni mentioned in the comments, what you are looking for is df.drop_duplicates().

It is also useful to understand two arguments of this method: subset and keep.

For example, if you want to retain only unique values in the KEY column and keep the first SYSTEM value for each unique KEY, you would do:

df.drop_duplicates(subset=['KEY'], keep='first')

If you just use df.drop_duplicates() without any arguments, the subset defaults to all columns. That is what your first desired output asks for, since KEY and SYSTEM are the only columns there.
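To make the difference concrete, here is a minimal sketch using the data from the first example in the question. The variable names all_columns and key_only are illustrative only and not part of the original answer.

```python
import pandas as pd

# Data from the first example in the question.
df = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426", "TD-438426", "TD-438709", "TD-438709",
            "TD-438750", "TD-438750", "TD-438750", "TD-438874", "TD-438874"],
    "SYSTEM": ["AAA", "BBB", "AAA", "BBB", "BBB",
               "CCC", "CCC", "CCC", "AAA", "BBB"],
})

# No arguments: rows are compared on all columns (KEY and SYSTEM here),
# so each distinct (KEY, SYSTEM) pair is kept once.
all_columns = df.drop_duplicates().reset_index(drop=True)

# subset=['KEY']: only the first row per KEY survives, so the second
# SYSTEM value for TD-438426 and TD-438874 is dropped.
key_only = df.drop_duplicates(subset=["KEY"], keep="first").reset_index(drop=True)

print(all_columns)
print(key_only)
```

Note that drop_duplicates() returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a name to be used later.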

EDIT

To meet your new requirement, do this:

df.drop_duplicates(subset=['KEY', 'SYSTEM'], keep='first')

Note: The default value of the keep argument is 'first', but it doesn't hurt to be explicit when working with high-level libraries like pandas.
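As a rough check against the two exceptions in the question, the sketch below combines both of them into one small frame (combining them is an assumption made here for brevity; the question shows them as separate inputs) and applies the same call.

```python
import pandas as pd

# Rows taken from the two "exception" examples in the question,
# merged into a single frame for illustration.
df = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426", "TD-438426"],
    "TEST": ["ABC", "ABC", "CBA"],
    "SYSTEM": ["AAA", "BBB", "AAA"],
})

# Rows count as duplicates when KEY and SYSTEM both match; TEST is ignored.
# keep='first' retains the first row of each (KEY, SYSTEM) group.
result = df.drop_duplicates(subset=["KEY", "SYSTEM"], keep="first")

print(result)
```

The ABC/AAA and ABC/BBB rows both survive because their SYSTEM values differ, while the CBA/AAA row is dropped because it repeats the (KEY, SYSTEM) pair of the first row.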

Solution 1
0 Accepted 2022-04-15 09:35:44
