简体   繁体   中英

Pandas Groupby nlargest (unique nlargest)

I have a following pandas df:

date                    label              quantity
01/01/2019                A                  5
01/01/2019                B                  5
01/01/2019                C                  5
01/01/2019                D                  4
01/01/2019                E                  3
01/01/2019                F                  2
01/01/2019                G                  0
01/02/2019                A                  7
01/02/2019                B                  6
01/02/2019                C                  5
01/02/2019                D                  5
01/02/2019                E                  3
01/02/2019                F                  2
01/02/2019                G                  1
01/03/2019                A                  10
01/03/2019                B                  9
01/03/2019                C                  8
01/03/2019                D                  7
01/03/2019                E                  6
01/03/2019                F                  5
01/03/2019                G                  4

I am trying to obtain the top three quantities with their respective labels per date, keeping duplicate quantity and labels.

This is my code:

df = df.groupby('date').apply(lambda x: x.nlargest(3, ['quantity'], keep='all')).reset_index(drop=True)

Should there be a duplicated quantity like in 01/01/2019 where A, B and C each has 5 quantity, D and E will not be included, however I will want D and E to be included. I will want a unique nlargest(3) kind of thing.

This is my desired outcome where I have the 3 largest distinct quantity in my df regardless if more than 1 label shares the same quantity:

date                    label              quantity
01/01/2019                A                  5
01/01/2019                B                  5
01/01/2019                C                  5
01/01/2019                D                  4
01/01/2019                E                  3
01/02/2019                A                  7
01/02/2019                B                  6
01/02/2019                C                  5
01/02/2019                D                  5
01/03/2019                A                  10
01/03/2019                B                  9
01/03/2019                C                  8

Thank you in advance!

def get_slice(df):
    x_vals = df['quantity'].drop_duplicates().nlargest(3)
    return df[df['quantity'].isin(x_vals)]

print(df.groupby('date').apply(get_slice).reset_index(drop=True))

You can find unique values for each group, get the list with three max values and select rows which are in this list. Then you can join the rest of the table:

df1 = df.groupby('date')['quantity'].\
apply(lambda x: x[x.isin(np.sort(x.unique())[-3:])]).\
reset_index(level=0).join(df['label'])

print(df1)

Output:

          date  quantity label
0   01/01/2019         5     A
1   01/01/2019         5     B
2   01/01/2019         5     C
3   01/01/2019         4     D
4   01/01/2019         3     E
7   01/02/2019         7     A
8   01/02/2019         6     B
9   01/02/2019         5     C
10  01/02/2019         5     D
14  01/03/2019        10     A
15  01/03/2019         9     B
16  01/03/2019         8     C

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM