
I need to filter a DataFrame by some criteria, but I don't know how

I have a df like this:

Date   Tree   Type #numberfruits
01/01  Apple   #1      10
01/01  Apple   #2      05
01/01  Orange  #1      10
02/01  Apple   #1      15
02/01  Apple   #2      40
02/01  Orange  #1      10
 ...

I want to filter on 'Type' to keep only the tree types that produce the most fruit over all days combined. For Orange I only have one type, so "Orange #1" is the tree that produces the most oranges over all days combined.

But for Apple I have two types, #1 and #2, and I want to drop the type that produces fewer apples. In the example above, I want to drop "Apple #1" and keep "Apple #2".

Can someone help me?

We can do groupby with sum, then sort_values with drop_duplicates:

# total fruit per (Tree, Type), sorted ascending so the largest total per Tree comes last
s = (df.groupby(['Tree', 'Type'], as_index=False)['#numberfruits'].sum()
       .sort_values('#numberfruits')
       .drop_duplicates('Tree', keep='last'))
     Tree Type  #numberfruits
2  Orange   #1             20
1   Apple   #2             45

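Update

To keep, for each Date and Tree, only the rows whose fruit count equals that day's maximum (tied rows are all kept):

s = df[df['#numberfruits'].eq(df.groupby(['Date', 'Tree'])['#numberfruits'].transform('max'))]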
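The totals per Tree and Type can also be sorted in descending order, keeping the first row per Tree:

# note: this overwrites df with the aggregated totals
df = df.groupby(['Tree', 'Type'])['#numberfruits'].sum().reset_index(name='count')

df.sort_values(by='count', ascending=False).drop_duplicates(subset='Tree', keep='first')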

    Tree    Type    count
1   Apple   #2      45
2   Orange  #1      20
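
Both snippets above return one summary row per Tree, while the question asks to filter the original rows. A minimal sketch of that last step, assuming the six sample rows from the question and using the illustrative names `totals`, `best` and `filtered` (not from the original posts), could look like this. It uses `idxmax` instead of sort plus drop_duplicates, so if two types tie, only the first one is kept.

```python
import pandas as pd

# Sample data copied from the question (only the six rows shown there).
df = pd.DataFrame({
    "Date": ["01/01", "01/01", "01/01", "02/01", "02/01", "02/01"],
    "Tree": ["Apple", "Apple", "Orange", "Apple", "Apple", "Orange"],
    "Type": ["#1", "#2", "#1", "#1", "#2", "#1"],
    "#numberfruits": [10, 5, 10, 15, 40, 10],
})

# Total fruit per (Tree, Type) over all days.
totals = df.groupby(["Tree", "Type"], as_index=False)["#numberfruits"].sum()

# For each Tree, pick the row of `totals` with the largest total.
best = totals.loc[totals.groupby("Tree")["#numberfruits"].idxmax(), ["Tree", "Type"]]

# Keep only the original rows whose (Tree, Type) pair is one of the winners.
filtered = df.merge(best, on=["Tree", "Type"], how="inner")
print(filtered)
```

With the sample data this keeps the daily rows for Apple #2 and Orange #1 and drops those for Apple #1.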


 