

How to drop duplicates in one column based on values in 2 other columns in DataFrame in Python Pandas?

I have a DataFrame in Python Pandas like the one below:

Data types:

  • ID - int

  • TYPE - object

  • TG_A - int

  • TG_B - int

ID  | TYPE | TG_A | TG_B
----|------|------|-----
111 | A    | 1    | 0
111 | B    | 1    | 0
222 | B    | 1    | 0
222 | A    | 1    | 0
333 | B    | 0    | 1
333 | A    | 0    | 1

I need to drop duplicates from the above DataFrame as follows:

  • If the value in ID is duplicated, drop the rows where TYPE = B and TG_A = 1, or where TYPE = A and TG_B = 1.

So, as a result, I need something like this:

ID  | TYPE | TG_A | TG_B
----|------|------|-----
111 | A    | 1    | 0
222 | A    | 1    | 0
333 | B    | 0    | 1

How can I do that in Python Pandas?

You can build two boolean masks and use groupby.idxmax to get, for each ID, the first row that matches neither mask:

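import pandas as pd

# Rows to drop: TYPE B flagged in TG_A, or TYPE A flagged in TG_B
m1 = df['TYPE'].eq('B') & df['TG_A'].eq(1)
m2 = df['TYPE'].eq('A') & df['TG_B'].eq(1)

# Per ID, idxmax returns the index label of the first row where ~(m1|m2) is True
out = df.loc[(~(m1|m2)).groupby(df['ID']).idxmax()]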

Output:

    ID TYPE  TG_A  TG_B
0  111    A     1     0
3  222    A     1     0
4  333    B     0     1
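Why this works: on a boolean Series, idxmax returns the label of the first True value, so grouping by ID picks each ID's first row that matches neither mask. If a group has no True at all, idxmax falls back to the group's first row, because the maximum (False) first occurs there. A small self-contained check, using made-up flags and labels:

```python
import pandas as pd

# Hypothetical keep-flags; ID 3 has no row flagged True
keep = pd.Series([False, True, True, False, False, False],
                 index=[10, 11, 12, 13, 14, 15])
ids = pd.Series([1, 1, 2, 2, 3, 3], index=keep.index)

# First True per group; all-False groups return their first label
first_keep = keep.groupby(ids).idxmax()
print(first_keep.tolist())  # [11, 12, 14]
```

So the answer always returns exactly one row per ID, even for an ID where every row matches one of the drop conditions.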
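Alternatively, a one-line filter keeps the rows where "TYPE is A" agrees with TG_A (True compares equal to 1 and False to 0). It does not check TG_B or whether the ID is duplicated, so it relies on exactly one of TG_A/TG_B being 1 in each row, as in the sample:

df[df['TYPE'].eq('A').eq(df['TG_A'])]

Result: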

    ID  TYPE    TG_A    TG_B
0   111 A       1       0
3   222 A       1       0
4   333 B       0       1
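A more literal translation of the stated rule is also possible: only rows whose ID occurs more than once are candidates for dropping, and among those, rows matching either condition are removed. This is a minimal sketch, assuming the same column names as the sample. Unlike the idxmax answer, it keeps every non-matching row, so an ID could still appear twice if neither of its rows matches a condition:

```python
import pandas as pd

df = pd.DataFrame({
    'ID':   [111, 111, 222, 222, 333, 333],
    'TYPE': ['A', 'B', 'B', 'A', 'B', 'A'],
    'TG_A': [1, 1, 1, 1, 0, 0],
    'TG_B': [0, 0, 0, 0, 1, 1],
})

# Rows the question says to drop when the ID is duplicated
m1 = df['TYPE'].eq('B') & df['TG_A'].eq(1)
m2 = df['TYPE'].eq('A') & df['TG_B'].eq(1)

# keep=False marks every occurrence of a repeated ID, not just the later ones
dup_id = df['ID'].duplicated(keep=False)

out = df[~(dup_id & (m1 | m2))]
print(out)
```

Which variant fits depends on what should happen to an ID whose rows all pass, or all fail, the conditions.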
