I have a DataFrame like this:
Date   Tree    Type  #numberfruits
01/01  Apple   #1    10
01/01  Apple   #2    05
01/01  Orange  #1    10
02/01  Apple   #1    15
02/01  Apple   #2    40
02/01  Orange  #1    10
...
I want to filter on 'Type' to keep only the tree types that produce the most fruit over all days combined. For Orange I only have one type, so "Orange #1" is the tree that produces the most oranges over all days combined.
But for Apple I have two types, #1 and #2, and I want to drop the type that produces fewer apples. In the example above, I want to drop "Apple #1" and keep "Apple #2".
Can someone help me?
We can use groupby with sum, then sort_values with drop_duplicates:
s = df.groupby(['Tree', 'Type'], as_index=False)['#numberfruits'].sum().sort_values('#numberfruits').drop_duplicates('Tree', keep='last')
     Tree Type  #numberfruits
2  Orange   #1             20
1   Apple   #2             45
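The result above is a summary table, while the question asks to filter the original rows. A minimal sketch of one way to do that, assuming the six sample rows from the question and an inner merge on the winning (Tree, Type) pairs (the names totals, best and filtered are illustrative, not from the original answer):

```python
import pandas as pd

# Sample data copied from the question (only the six rows shown there).
df = pd.DataFrame({
    "Date": ["01/01", "01/01", "01/01", "02/01", "02/01", "02/01"],
    "Tree": ["Apple", "Apple", "Orange", "Apple", "Apple", "Orange"],
    "Type": ["#1", "#2", "#1", "#1", "#2", "#1"],
    "#numberfruits": [10, 5, 10, 15, 40, 10],
})

# Total fruit per (Tree, Type) over all days.
totals = df.groupby(["Tree", "Type"], as_index=False)["#numberfruits"].sum()

# Sort ascending and keep the last row per Tree, i.e. the largest total.
best = totals.sort_values("#numberfruits").drop_duplicates("Tree", keep="last")

# An inner merge on the winning pairs keeps only rows of those types.
filtered = df.merge(best[["Tree", "Type"]], on=["Tree", "Type"])
print(filtered)
```

With the sample data this should leave only the "Apple #2" and "Orange #1" rows. If two types of the same tree tie on the total, drop_duplicates keeps just one of them.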
Update: to keep, for each Date and Tree, the rows whose count equals that day's maximum:
s = df[df['#numberfruits'].eq(df.groupby(['Date', 'Tree'])['#numberfruits'].transform('max'))]
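To see what the transform('max') filter does, here is a small sketch on the sample rows from the question (the variable names day_max and kept are illustrative). Note that it compares within each date, so the type that is kept can differ from day to day:

```python
import pandas as pd

# Sample data copied from the question (only the six rows shown there).
df = pd.DataFrame({
    "Date": ["01/01", "01/01", "01/01", "02/01", "02/01", "02/01"],
    "Tree": ["Apple", "Apple", "Orange", "Apple", "Apple", "Orange"],
    "Type": ["#1", "#2", "#1", "#1", "#2", "#1"],
    "#numberfruits": [10, 5, 10, 15, 40, 10],
})

# transform('max') broadcasts each (Date, Tree) group's maximum back
# onto every row of that group, so it aligns with df row by row.
day_max = df.groupby(["Date", "Tree"])["#numberfruits"].transform("max")

# Keep the rows whose count equals their group's maximum.
kept = df[df["#numberfruits"].eq(day_max)]
print(kept)
```

On this data, "Apple #1" is kept for 01/01 and "Apple #2" for 02/01. Rows that tie for the maximum within a group are all kept.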
df = df.groupby(['Tree', 'Type'])['#numberfruits'].sum().reset_index(name='count')
df.sort_values(by='count', ascending=False).drop_duplicates(subset='Tree', keep='first')
     Tree Type  count
1   Apple   #2     45
2  Orange   #1     20
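As an alternative to sorting and dropping duplicates, the same selection can probably be written with idxmax. This is a sketch under the assumption that the data looks like the sample in the question; it is not part of the original answers:

```python
import pandas as pd

# Sample data copied from the question (only the six rows shown there).
df = pd.DataFrame({
    "Date": ["01/01", "01/01", "01/01", "02/01", "02/01", "02/01"],
    "Tree": ["Apple", "Apple", "Orange", "Apple", "Apple", "Orange"],
    "Type": ["#1", "#2", "#1", "#1", "#2", "#1"],
    "#numberfruits": [10, 5, 10, 15, 40, 10],
})

# Total fruit per (Tree, Type), with the sum stored in a 'count' column.
totals = df.groupby(["Tree", "Type"])["#numberfruits"].sum().reset_index(name="count")

# idxmax gives, per Tree, the row label of the largest total;
# .loc then selects exactly those rows.
best = totals.loc[totals.groupby("Tree")["count"].idxmax()]
print(best)
```

When two types tie, idxmax returns the first one it encounters, so exactly one row per tree is selected.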