I need to remove duplicate rows, combining the values across all of the columns.
My data:
id   country  publisher  weak  A     B      C
123  US       X          1     6.77  0      0
123  US       X          1     0     1.23   88.7
456  BZ       Y          2     0     56.87  9.65
456  BZ       Y          2     2.76  0      0
456  BZ       Y          2     0     0      0
I used drop_duplicates:
df1 = df.drop_duplicates()
But I need a condition that keeps, for each id, all the values > 0.
Also, I have more columns than just 'A', 'B', 'C', so I'm looking for a solution that takes all the columns into account.
Here is an example of what I'm looking for:
id   country  publisher  weak  A     B      C
123  US       X          1     6.77  1.23   88.7
456  BZ       Y          2     2.76  56.87  9.65
This will give you your desired output (note that 'weak' is included in the grouping keys so it is not summed along with the value columns):
groups = df.groupby(['id', 'country', 'publisher', 'weak'], as_index=False).sum()
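A minimal runnable sketch of this groupby-sum approach, built on the sample data from the question (grouping on 'weak' as well, since summing it would change its value):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [123, 123, 456, 456, 456],
    'country': ['US', 'US', 'BZ', 'BZ', 'BZ'],
    'publisher': ['X', 'X', 'Y', 'Y', 'Y'],
    'weak': [1, 1, 2, 2, 2],
    'A': [6.77, 0, 0, 2.76, 0],
    'B': [0, 1.23, 56.87, 0, 0],
    'C': [0, 88.7, 9.65, 0, 0],
})

# Because at most one row per group is non-zero in each value column,
# summing collapses the duplicates to the single non-zero value.
result = df.groupby(['id', 'country', 'publisher', 'weak'], as_index=False).sum()
print(result)
```

This works for any number of value columns without listing them, but only because the zeros contribute nothing to the sum; if a group could hold two different non-zero values in one column, they would be added together.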
Try doing:
cols = ['A', 'B', 'C']  # add any further value columns here

def app_func(s):
    # mask zeros to NaN, back-fill so the first row collects each
    # column's non-zero value, then keep only complete, unique rows
    return s[~s.eq(0)].bfill().dropna().drop_duplicates()

df.groupby(['id', 'country', 'publisher'])[cols].apply(app_func).reset_index()
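A self-contained sketch of this apply-based approach on the sample data; the extra index level that `apply` leaves behind is dropped with `droplevel` before resetting the index (note this keeps only the grouping keys plus `cols`, so 'weak' would need to be added to the group keys to retain it):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [123, 123, 456, 456, 456],
    'country': ['US', 'US', 'BZ', 'BZ', 'BZ'],
    'publisher': ['X', 'X', 'Y', 'Y', 'Y'],
    'weak': [1, 1, 2, 2, 2],
    'A': [6.77, 0, 0, 2.76, 0],
    'B': [0, 1.23, 56.87, 0, 0],
    'C': [0, 88.7, 9.65, 0, 0],
})

cols = ['A', 'B', 'C']  # value columns to collapse

def app_func(s):
    # mask zeros to NaN, back-fill so the first row collects each
    # column's non-zero value, then keep only complete, unique rows
    return s[~s.eq(0)].bfill().dropna().drop_duplicates()

out = (df.groupby(['id', 'country', 'publisher'])[cols]
         .apply(app_func)
         .droplevel(-1)   # drop the leftover per-group row index
         .reset_index())
print(out)
```

Unlike the sum-based answer, this keeps the actual cell values rather than aggregating them, so it also behaves sensibly if a column's non-zero entries could repeat.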