Remove duplicates from all columns, keeping the values > 0, in a data frame

I need to remove duplicates from all of the columns.

My data:

id   country  publisher   weak     A        B        C
123    US        X          1     6.77      0        0
123    US        X          1       0      1.23     88.7
456    BZ        Y          2       0      56.87    9.65      
456    BZ        Y          2     2.76       0       0  
456    BZ        Y          2       0        0       0

I used drop_duplicates:

df1 = df.drop_duplicates()

But that only removes rows that are identical in every column. I need a condition that keeps, for each id, all the values > 0.

Also, I have more columns than just 'A', 'B', 'C', so I'm looking for a solution that takes all the columns into account.

Here is an example of what I'm looking for:

id   country  publisher  weak     A       B        C
123    US        X        1     6.77     1.23     88.7
456    BZ        Y        2     2.76     56.87    9.65

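This should give you the desired output, provided each id has at most one non-zero value per column (summing then just collapses the zeros). 'weak' is included in the grouping keys so that it is not summed along with the value columns:

groups = df.groupby(['id', 'country', 'publisher', 'weak']).sum()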
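To make the sum approach easy to check, here is a minimal self-contained sketch on the sample data from the question. It assumes that 'weak' is constant within each id and that no id has two different non-zero values in the same column (they would be added together otherwise). The names keys and summed are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 123, 456, 456, 456],
    "country": ["US", "US", "BZ", "BZ", "BZ"],
    "publisher": ["X", "X", "Y", "Y", "Y"],
    "weak": [1, 1, 2, 2, 2],
    "A": [6.77, 0, 0, 2.76, 0],
    "B": [0, 1.23, 56.87, 0, 0],
    "C": [0, 88.7, 9.65, 0, 0],
})

# Assumption: these columns identify a group; every other column is a value column.
keys = ["id", "country", "publisher", "weak"]

# as_index=False keeps the keys as ordinary columns instead of a MultiIndex.
summed = df.groupby(keys, as_index=False).sum()
print(summed)
```

Because every column that is not a key gets summed, this picks up additional value columns without listing them by name.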

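Try:

# value columns to collapse; add more names here to aggregate more data
cols = ['A', 'B', 'C']

def app_func(s):
    # turn zeros into NaN, pull the first non-zero value of each column up
    # to the top row, then keep only the complete, distinct rows
    return s[~s.eq(0)].bfill().dropna().drop_duplicates()

# droplevel(-1) discards the original row index that apply() leaves behind
df.groupby(['id', 'country', 'publisher', 'weak'])[cols].apply(app_func).droplevel(-1).reset_index()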
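Since the question mentions more value columns than just 'A', 'B', 'C', a variant that does not hard-code them may help. The sketch below is not taken from either answer: it masks non-positive values as NaN and then uses groupby(...).first(), which returns the first non-null entry of each column per group. It assumes every column outside keys is a numeric value column; keys, value_cols, masked and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 123, 456, 456, 456],
    "country": ["US", "US", "BZ", "BZ", "BZ"],
    "publisher": ["X", "X", "Y", "Y", "Y"],
    "weak": [1, 1, 2, 2, 2],
    "A": [6.77, 0, 0, 2.76, 0],
    "B": [0, 1.23, 56.87, 0, 0],
    "C": [0, 88.7, 9.65, 0, 0],
})

# Assumption: these columns identify a group; all others hold values.
keys = ["id", "country", "publisher", "weak"]
value_cols = [c for c in df.columns if c not in keys]

# Replace every value that is not > 0 with NaN so that first() skips it.
masked = df.copy()
masked[value_cols] = masked[value_cols].where(masked[value_cols] > 0)

# first() takes the first non-null entry of each column within each group.
result = masked.groupby(keys, as_index=False).first()
print(result)
```

Unlike the sum, this keeps the first positive value when a group has several in one column. A column with no positive value in a group comes out as NaN; result.fillna(0) restores a zero there if that is preferred.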
