
Pandas groupby with sorting and tie-breaking conditions

I have the following dataframe df, which comes from a dataset of per-game NBA player statistics:

     Rk   Player            Pos  Age  Tm   G   GS  MP    FG    FGA   ...  FT%    ORB  DRB  TRB   AST  STL  BLK  TOV  PF   PS/G
0    1    Stephen Curry     PG   27   GSW  79  79  34.2  10.2  20.2  ...  0.908  0.9  4.6  5.4   6.7  2.1  0.2  3.3  2.0  30.1
1    2    James Harden      SG   26   HOU  82  82  38.1  8.7   19.7  ...  0.860  0.8  5.3  6.1   7.5  1.7  0.6  4.6  2.8  29.0
2    3    Kevin Durant      SF   27   OKC  72  72  35.8  9.7   19.2  ...  0.898  0.6  7.6  8.2   5.0  1.0  1.2  3.5  1.9  28.2
3    4    DeMarcus Cousins  C    25   SAC  65  65  34.6  9.2   20.5  ...  0.718  2.4  9.1  11.5  3.3  1.6  1.4  3.8  3.6  26.9
4    5    LeBron James      SF   31   CLE  76  76  35.6  9.7   18.6  ...  0.731  1.5  6.0  7.4   6.8  1.4  0.6  3.3  1.9  25.3
...  ...  ...               ...  ...  ...  ... ... ...   ...   ...   ...  ...    ...  ...  ...   ...  ...  ...  ...  ...  ...
471  472  Joe Harris        SG   24   CLE  5   0   3.0   0.2   0.8   ...  NaN    0.0  0.6  0.6   0.4  0.0  0.0  0.2  0.2  0.6
472  473  Bruno Caboclo     SF   20   TOR  6   1   7.2   0.2   2.0   ...  NaN    0.2  0.2  0.3   0.2  0.3  0.2  0.7  0.3  0.5
473  474  Sam Dekker        SF   21   HOU  3   0   2.0   0.0   0.0   ...  NaN    0.0  0.3  0.3   0.0  0.3  0.0  0.0  0.0  0.0
474  475  J.J. O'Brien      SF   23   UTA  2   0   3.0   0.0   0.5   ...  NaN    0.0  0.5  0.5   0.0  0.5  0.0  0.0  0.5  0.0
475  476  Nate Robinson     PG   31   NOP  2   1   11.5  0.0   0.5   ...  NaN    0.0  0.0  0.0   2.0  0.5  0.0  0.0  2.5  0.0

I need to group df by team (Tm) and find the best average scorer(s) per team (highest PS/G), ignoring rows where Tm is TOT (the combined line for players who played for more than one team). The result should be sorted by points per game in descending order, with ties broken by team name (Tm). If a team has multiple top scorers, list all of them, sorted by player name ascending.

What I have done is the following:

# Highest PS/G per team, excluding the TOT rows, sorted descending
grouped = df[df['Tm'] != 'TOT'].groupby('Tm')['PS/G'].max().sort_values(ascending=False)

And I am getting:

Tm
GSW    30.1
HOU    29.0
OKC    28.2
SAC    26.9
CLE    25.3
POR    25.1
NOP    24.3
TOR    23.5
IND    23.1
BOS    22.2
NYK    21.8
LAC    21.4
SAS    21.2
CHI    20.9
CHO    20.9
MIN    20.7
BRK    20.6
PHO    20.4
WAS    19.9
UTA    19.7
DEN    19.5
MIA    19.1
DET    18.8
DAL    18.3
ORL    18.2
MIL    18.2
LAL    17.6
PHI    17.5
ATL    17.1
MEM    16.6
Name: PS/G, dtype: float64
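In the output above, ORL is listed before MIL although both have 18.2, so tied values are not guaranteed to come out in team-name order (Series.sort_values defaults to kind='quicksort', which is not a stable sort). A minimal sketch of breaking ties by team name on this Series: move the Tm index into a column and sort on both keys. The five values are copied from the output above; the variable name ranked is illustrative.

```python
import pandas as pd

# Per-team maximum PS/G, as produced by groupby('Tm')['PS/G'].max()
# (values copied from the output above; only five teams shown)
grouped = pd.Series(
    {"CHI": 20.9, "CHO": 20.9, "GSW": 30.1, "MIL": 18.2, "ORL": 18.2},
    name="PS/G",
)
grouped.index.name = "Tm"

# Turn the Tm index into a column, then sort by PS/G (descending)
# and break ties by team name (ascending)
ranked = (
    grouped.reset_index()
    .sort_values(by=["PS/G", "Tm"], ascending=[False, True])
    .reset_index(drop=True)
)
print(ranked)
```

With two sort keys the order of ties is fully determined, so MIL now precedes ORL.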

However, I also need the Player column in the result. So my first question is: how can I achieve that?
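One way to keep the Player column, sketched under the assumption that the frame has Tm, Player and PS/G columns as above: compute each team's maximum, then merge it back onto the rows to recover the matching player(s). The small frame uses rows from the question plus one hypothetical player ("Hypothetical A", invented) added to create a tie at CLE.

```python
import pandas as pd

# A few rows from the question; "Hypothetical A" is invented to create a tie on CLE
df = pd.DataFrame({
    "Player": ["Stephen Curry", "James Harden", "LeBron James", "Joe Harris",
               "Sam Dekker", "Hypothetical A"],
    "Tm": ["GSW", "HOU", "CLE", "CLE", "HOU", "CLE"],
    "PS/G": [30.1, 29.0, 25.3, 0.6, 0.0, 25.3],
})

valid = df[df["Tm"] != "TOT"]

# One row per team holding that team's highest PS/G
team_max = valid.groupby("Tm", as_index=False)["PS/G"].max()

# Inner merge on both keys: only rows whose PS/G equals their team's maximum
# survive, so every tied top scorer is kept together with the Player column
top = valid.merge(team_max, on=["Tm", "PS/G"])[["Tm", "Player", "PS/G"]]
print(top)
```

Merging on a float column relies on exact equality; that is safe here because max() returns one of the column's own values unchanged.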

My second question is how to meet these two requirements:

  1. Ties on PS/G are broken by team name (Tm).
  2. If a team has multiple top scorers, list all of them, sorted by player name ascending.
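Both requirements can be combined in a sketch like the following: transform('max') broadcasts each team's maximum back onto its rows, so a boolean mask keeps every player who ties for the top spot, and a single sort_values call with three keys handles the ordering. The sample rows come from the question except the two labelled hypothetical ones (a tying CLE player and a TOT line), which are invented for illustration.

```python
import pandas as pd

# Rows from the question, plus two invented ones:
# "Hypothetical A" ties LeBron James on CLE, "Hypothetical B" is a TOT line
df = pd.DataFrame({
    "Player": ["Stephen Curry", "James Harden", "LeBron James", "Joe Harris",
               "Bruno Caboclo", "Hypothetical A", "Hypothetical B"],
    "Tm": ["GSW", "HOU", "CLE", "CLE", "TOR", "CLE", "TOT"],
    "PS/G": [30.1, 29.0, 25.3, 0.6, 0.5, 25.3, 31.0],
})

# Drop the combined TOT rows before grouping
valid = df[df["Tm"] != "TOT"]

# Each row gets its own team's maximum PS/G; keep rows that reach it (ties included)
is_top = valid["PS/G"] == valid.groupby("Tm")["PS/G"].transform("max")

result = (
    valid.loc[is_top, ["Tm", "Player", "PS/G"]]
    # PS/G descending, then team name, then player name (both ascending)
    .sort_values(by=["PS/G", "Tm", "Player"], ascending=[False, True, True])
    .reset_index(drop=True)
)
print(result)
```

Both CLE players appear, in player-name order, and the TOT row never reaches the grouping step.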

I eventually came up with the following:

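# idxmax() gives the row label of each team's highest PS/G (only the first one if players tie)
grouped1 = df.loc[df[df['Tm'] != 'TOT'].groupby('Tm')['PS/G'].idxmax()].sort_values(by=['PS/G', 'Tm'], ascending=[False, True]).reset_index(drop=True)

grouped_final = grouped1[['Tm', 'Player', 'PS/G']]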
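One caveat about the idxmax() version: GroupBy.idxmax returns a single row label per team (the first occurrence of the maximum), so a team with two tied top scorers still yields only one row, which does not meet the second requirement. A minimal check, using a hypothetical tying player ("Hypothetical A", invented) alongside rows from the question:

```python
import pandas as pd

# "Hypothetical A" is invented so that CLE has two players tied on PS/G
df = pd.DataFrame({
    "Player": ["LeBron James", "Hypothetical A", "Joe Harris", "Stephen Curry"],
    "Tm": ["CLE", "CLE", "CLE", "GSW"],
    "PS/G": [25.3, 25.3, 0.6, 30.1],
})

# idxmax: exactly one label per team, so the tie on CLE is lost
via_idxmax = df.loc[df.groupby("Tm")["PS/G"].idxmax()]

# transform("max"): every row equal to its team's maximum is kept
via_mask = df[df["PS/G"] == df.groupby("Tm")["PS/G"].transform("max")]

print(len(via_idxmax), len(via_mask))  # 2 3
```

Swapping the idxmax() lookup for the transform("max") mask keeps the rest of the pipeline (column selection and sort_values) unchanged.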
