
DATAFRAME: drop duplicates where column's values are equal for unique key

I want to drop duplicates from a DataFrame where a column's values are equal for one unique key. Example:

In:

KEY         SYSTEM
TD-438426   AAA
TD-438426   BBB
TD-438426   AAA
TD-438709   BBB
TD-438709   BBB
TD-438750   CCC
TD-438750   CCC
TD-438750   CCC
TD-438874   AAA
TD-438874   BBB

Out:

KEY         SYSTEM
TD-438426   AAA
TD-438426   BBB
TD-438709   BBB
TD-438750   CCC
TD-438874   AAA
TD-438874   BBB

PS: Of course, there are some exceptions that I want to catch.

In:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA
TD-438426   ABC     BBB

Out:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA
TD-438426   ABC     BBB

And

In:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA
TD-438426   CBA     AAA

Out:

KEY         TEST    SYSTEM
TD-438426   ABC     AAA

As @mcsioni mentioned in the comments, what you are looking for is df.drop_duplicates().

It is also useful to understand two arguments of this method: subset and keep.

E.g., if you want to retain only unique values in the KEY column and keep the first SYSTEM value for each unique KEY, you'd do:

df.drop_duplicates(subset=['KEY'], keep='first')
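As a minimal sketch of that call, the snippet below rebuilds the first In table from the question and applies it. The variable names `df` and `first_per_key` are chosen here for illustration only.

```python
import pandas as pd

# Sample data copied from the first "In" table of the question.
df = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426", "TD-438426", "TD-438709", "TD-438709",
            "TD-438750", "TD-438750", "TD-438750", "TD-438874", "TD-438874"],
    "SYSTEM": ["AAA", "BBB", "AAA", "BBB", "BBB",
               "CCC", "CCC", "CCC", "AAA", "BBB"],
})

# Only KEY is compared, so exactly one row (the first) survives per KEY.
first_per_key = df.drop_duplicates(subset=["KEY"], keep="first")
print(first_per_key)
```

This leaves four rows, one per KEY. It also drops the TD-438426 / BBB and TD-438874 / BBB rows, so it illustrates subset rather than reproducing the Out table from the question.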

If you call df.drop_duplicates() without any arguments, subset defaults to all columns, which is what your desired output asks for.

EDIT

To meet your new requirement, do this:
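To check this against the question's data, here is a small sketch that rebuilds the first In table and calls drop_duplicates() with no arguments. The names `df` and `deduped` are illustrative.

```python
import pandas as pd

# Sample data copied from the first "In" table of the question.
df = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426", "TD-438426", "TD-438709", "TD-438709",
            "TD-438750", "TD-438750", "TD-438750", "TD-438874", "TD-438874"],
    "SYSTEM": ["AAA", "BBB", "AAA", "BBB", "BBB",
               "CCC", "CCC", "CCC", "AAA", "BBB"],
})

# With no subset, a row is a duplicate only if every column matches an earlier row.
deduped = df.drop_duplicates()
print(deduped)
```

The result keeps the six rows shown in the first Out table. The original index labels (0, 1, 3, 5, 8, 9) are preserved; chaining .reset_index(drop=True) renumbers them if needed.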

df.drop_duplicates(subset=['KEY', 'SYSTEM'], keep='first')
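The sketch below applies that call to the two exception cases from the question, combined into one frame for brevity. Merging the two examples is an assumption made here, and the names `df` and `result` are illustrative.

```python
import pandas as pd

# Rows taken from the two three-column examples in the question.
df = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426", "TD-438426"],
    "TEST": ["ABC", "ABC", "CBA"],
    "SYSTEM": ["AAA", "BBB", "AAA"],
})

# TEST is ignored when comparing rows, so the CBA/AAA row counts as a
# duplicate of the ABC/AAA row, while ABC/BBB is kept (different SYSTEM).
result = df.drop_duplicates(subset=["KEY", "SYSTEM"], keep="first")
print(result)
```

Both ABC rows survive because their SYSTEM values differ, and the CBA row is dropped because its KEY and SYSTEM match an earlier row, which matches both Out tables.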

Note: The default for the keep argument is 'first', but it doesn't hurt to be explicit when working with high-level libraries like pandas.
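For comparison, here is a short sketch of the values keep accepts, using a tiny made-up frame (the three rows are illustrative, not from the question).

```python
import pandas as pd

# Hypothetical three-row frame: rows 0 and 1 are exact duplicates.
df = pd.DataFrame({
    "KEY": ["TD-438426", "TD-438426", "TD-438709"],
    "SYSTEM": ["AAA", "AAA", "BBB"],
})

default = df.drop_duplicates()              # same as keep="first"
explicit = df.drop_duplicates(keep="first") # keeps the first of each duplicate group
last = df.drop_duplicates(keep="last")      # keeps the last of each duplicate group
none = df.drop_duplicates(keep=False)       # drops every row that has a duplicate

print(default.equals(explicit))
print(list(last.index), list(none.index))
```

Leaving keep out and passing keep='first' give the same frame. keep='last' and keep=False are the alternatives when the first occurrence is not the one you want.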

