
How to iterate over a pandas DataFrame multi-index and filter based on another column's value

I am trying to iterate over a multi-index with the following DataFrame.

Picture of My DataFrame

Essentially, I am trying to reduce the DataFrame to the top QB, top 2 RBs, top 3 WRs, and top TE for each NFL team, ranked by their values in the "FantasyPoints" column. I have spent hours on this without finding a solution. I tried groupby with no luck, and figured I might have to iterate over the multi-index, but I haven't worked that out either. Thanks in advance to anyone who can help.

Below is the code used to generate the DataFrame in its current state. Here is a link to the CSV file being used: https://drive.google.com/file/d/1hX1Jmjk4RBxsH8tt8g1tqwKqrjZkFZp_/view?usp=sharing

import pandas as pd

#import our CSV file
df = pd.read_csv('2019.csv')

#drop unnecessary columns
df.drop(['Rk', '2PM', '2PP', 'FantPt', 'DKPt', 'FDPt', 
         'VBD', 'PosRank', 'OvRank', 'PPR', 'Fmb', 
         'GS', 'Age', 'Tgt', 'Y/A', 'Att', 'Att.1', 'Cmp', 'Y/R'], axis=1, inplace=True)

#fix name formatting
df['Player'] = df['Player'].apply(lambda x: x.split('*')[0]).apply(lambda x: x.split('\\')[0])

#rename columns
df.rename({
    'TD': 'PassingTD',
    'TD.1': 'RushingTD',
    'TD.2': 'ReceivingTD',
    'TD.3': 'TotalTD',
    'Yds': 'PassingYDs',
    'Yds.1': 'RushingYDs',
    'Yds.2': 'ReceivingYDs',
}, axis=1, inplace=True)

df['FantasyPoints'] = (df['PassingYDs']*0.04 + df['PassingTD']*4 - df['Int']*2 + df['RushingYDs']*.1 
                       + df['RushingTD']*6 + df['Rec']*1 + df['ReceivingYDs']*.1 + df['ReceivingTD']*6 - df['FL']*2)

df = df[['Tm', 'FantPos', 'FantasyPoints']]

df = df[df['Tm'] != '2TM']
df = df[df['Tm'] != '3TM']

df.set_index(['Tm', 'FantPos'], inplace=True)
df = df.sort_index()
df.head(30)

Why use a multi-index at all? You can set up a dictionary that maps each position to the number of rows you want, then iterate through it and grab the top n rows for each position:

import pandas as pd

#import our CSV file
df = pd.read_csv('2019.csv')

#drop unnecessary columns
df.drop(['Rk', '2PM', '2PP', 'FantPt', 'DKPt', 'FDPt', 
         'VBD', 'PosRank', 'OvRank', 'PPR', 'Fmb', 
         'GS', 'Age', 'Tgt', 'Y/A', 'Att', 'Att.1', 'Cmp', 'Y/R'], axis=1, inplace=True)

#fix name formatting
df['Player'] = df['Player'].apply(lambda x: x.split('*')[0]).apply(lambda x: x.split('\\')[0])

#rename columns
df.rename({
    'TD': 'PassingTD',
    'TD.1': 'RushingTD',
    'TD.2': 'ReceivingTD',
    'TD.3': 'TotalTD',
    'Yds': 'PassingYDs',
    'Yds.1': 'RushingYDs',
    'Yds.2': 'ReceivingYDs',
}, axis=1, inplace=True)

df['FantasyPoints'] = (df['PassingYDs']*0.04 + df['PassingTD']*4 - df['Int']*2 + df['RushingYDs']*.1 
                       + df['RushingTD']*6 + df['Rec']*1 + df['ReceivingYDs']*.1 + df['ReceivingTD']*6 - df['FL']*2)

df = df[['Tm', 'FantPos', 'FantasyPoints']]

df = df[df['Tm'] != '2TM']
df = df[df['Tm'] != '3TM']


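#number of top players to keep for each position
dictionary = {'QB':1,'RB':2,'WR':3,'TE':1}

#DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
frames = [df[df['FantPos'] == pos].nlargest(n, columns='FantasyPoints') for pos, n in dictionary.items()]
results_df = pd.concat(frames, ignore_index=True)[['FantPos', 'FantasyPoints', 'Tm']]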

Output:

print (results_df)

  FantPos  FantasyPoints   Tm
0      QB         415.68  BAL
1      RB         469.20  CAR
2      RB         314.80  GNB
3      WR         374.60  NOR
4      WR         274.10  TAM
5      WR         274.10  ATL
6      TE         254.30  KAN
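
Note that the loop above filters only on FantPos, so it returns the top players across the whole league rather than for each team, which is what the question asked for. A minimal sketch of a per-team version follows. It assumes a small made-up frame with the same three columns (the values are hypothetical, not taken from 2019.csv) and uses groupby with cumcount to rank players inside each (Tm, FantPos) group; top_n, wanted and per_team are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned frame (Tm, FantPos, FantasyPoints).
# The values are made up for illustration only.
df = pd.DataFrame({
    'Tm': ['BAL', 'BAL', 'BAL', 'BAL', 'BAL', 'KAN', 'KAN', 'KAN', 'KAN'],
    'FantPos': ['QB', 'QB', 'RB', 'RB', 'RB', 'QB', 'TE', 'TE', 'K'],
    'FantasyPoints': [400.0, 50.0, 200.0, 150.0, 90.0, 300.0, 250.0, 40.0, 10.0],
})

# Number of players to keep per position, per team.
top_n = {'QB': 1, 'RB': 2, 'WR': 3, 'TE': 1}

# Keep only the positions of interest, best scores first.
wanted = df[df['FantPos'].isin(list(top_n))]
wanted = wanted.sort_values('FantasyPoints', ascending=False)

# 0-based rank of each row within its (team, position) group.
rank = wanted.groupby(['Tm', 'FantPos']).cumcount()

# Per-row limit looked up from the position.
limit = wanted['FantPos'].map(top_n)

# Keep a row when its rank is below the limit for its position.
per_team = (wanted[rank < limit]
            .sort_values(['Tm', 'FantPos', 'FantasyPoints'],
                         ascending=[True, True, False])
            .reset_index(drop=True))

print(per_team)
```

If the frame has already been indexed with set_index(['Tm', 'FantPos']) as in the question, calling df.reset_index() first should turn the index levels back into columns so the same steps apply.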
