
Adding new DataFrame column in Pandas not working

So I have a pandas DataFrame that contains some batting statistics from the 2001 Arizona Diamondbacks. I'm pretty new to Python/pandas, so I was trying to add a few columns using lambda functions like these:

PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / (row.PA) if row.PA > 0 else 'NaN'
AVG_lambda = lambda row: row.H / row.AB if row.AB > 0 else 'NaN'

Later on I want to work with more data that is very similar, and I will need to add these columns and many more. So I made a separate Python module containing the functions, a list pairing each function with the column name it should get, and a function that iterates through the list and adds the columns to the end of the DataFrame:

import pandas as pd 


PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / (row.PA) if row.PA > 0 else 'NaN'
AVG_lambda = lambda row: row.H / row.AB if row.AB > 0 else 'NaN'

stat_functions = [['pa', PA_lambda], ['obp',OBP_lambda], ['avg', AVG_lambda]]
def format_df(df):
    for func in stat_functions:
        df['func[0]'] = df.apply(func[1], axis=1)

I'm not sure whether I need to import pandas in that module, but whenever I import the module into my Jupyter Notebook and call format_df, only the first function, PA_lambda, runs, and its result is saved into the DataFrame under the column label 'func[0]'. I thought that a list holding each column name together with its function would work, but as soon as it tries to apply OBP_lambda to the df it raises the error:

AttributeError: 'Series' object has no attribute 'PA'

Sorry this is a little long; it's my first post here, but if you have a solution I'm very eager to learn.
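Why the AttributeError appears: with axis=1, apply passes each row to the lambda as a Series whose index holds the DataFrame's column labels, so row.PA only works if a column labelled exactly 'PA' exists. A minimal reproduction on a made-up two-column frame (the stand-in lambda is illustrative, not taken from the question):

```python
import pandas as pd

df = pd.DataFrame({"AB": [4], "H": [1]})

# What the loop in format_df actually does: the label is the literal text 'func[0]'.
df["func[0]"] = df.apply(lambda row: row.AB + 1, axis=1)

row = df.iloc[0]  # one row as a Series, like the object apply(..., axis=1) passes in
print(list(row.index))     # ['AB', 'H', 'func[0]']
print(hasattr(row, "PA"))  # False, so row.PA raises AttributeError

# Attribute lookup is case-sensitive: a column named 'pa' does not satisfy row.PA.
df["pa"] = 5
print(hasattr(df.iloc[0], "pa"), hasattr(df.iloc[0], "PA"))  # True False
```

So after the first pass the new values sit under 'func[0]', and OBP_lambda's row.PA has nothing to find. Even with the label taken from func[0], the column would be named 'pa', which row.PA still would not match.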

You don't need apply for that; you can do these operations directly on whole columns in pandas:

df['pa'] = df['AB'] + df['BB'] + df['HBP'] + df['SH'] + df['SF']
df['obp'] = (df['H'] + df['BB'] + df['HBP']) / df['pa']  # uses the 'pa' column created above
df['avg'] = df['H'] / df['AB']
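A quick check of the vectorized version on a small made-up frame (the numbers are invented, not real 2001 Diamondbacks statistics). Because pandas returns NaN for 0/0, a player with no plate appearances gets NaN in obp and avg without any if; a non-zero numerator over zero would give inf instead, but H, BB and HBP are all zero whenever pa is.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration.
df = pd.DataFrame({
    "AB": [500, 0],
    "H": [150, 0],
    "BB": [60, 0],
    "HBP": [5, 0],
    "SH": [2, 0],
    "SF": [3, 0],
})

# Column arithmetic operates on whole columns at once, so no apply() is needed.
df["pa"] = df["AB"] + df["BB"] + df["HBP"] + df["SH"] + df["SF"]
df["obp"] = (df["H"] + df["BB"] + df["HBP"]) / df["pa"]
df["avg"] = df["H"] / df["AB"]

print(df[["pa", "obp", "avg"]])
```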

Your format_df(df) function loops through each function but tries to save every result under the same column label, 'func[0]', because 'func[0]' is a plain string literal: Python never looks up func[0] inside it. You can fix the last line of the function with an f-string that puts the expression inside curly braces, so the column label is evaluated at run time:

def format_df(df):
    for func in stat_functions:
        df[f'{func[0]}'] = df.apply(func[1], axis=1)
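An f-string only evaluates what is inside curly braces, so f'func[0]' without braces is still the literal text func[0]. A minimal comparison, where the list below is a hypothetical stand-in for one entry of stat_functions:

```python
# Stand-in for one entry of stat_functions; None replaces the lambda here.
func = ["pa", None]

literal = f"func[0]"      # no braces: the text is kept as-is
formatted = f"{func[0]}"  # braces: the expression func[0] is evaluated
direct = func[0]          # same value, no string formatting needed

print(literal)    # func[0]
print(formatted)  # pa
print(direct)     # pa
```

Since func[0] is already a string, df[func[0]] works directly; the f-string adds nothing beyond that.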

What you need to do is use the label element of each func item when creating the new column in the df, like this:

for func in stat_functions: 
    df[func[0]] = df.apply(func[1], axis=1)

Notice how this code uses the value of func[0], not the string 'func[0]', as the label of the new column in the DataFrame.
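A sketch of the whole module with that fix in place. Two changes beyond the answer are assumptions of this sketch: the first label is 'PA' rather than 'pa', because OBP_lambda reads row.PA and attribute lookup is case-sensitive (with 'pa' the second apply would still raise AttributeError); and the lambdas return math.nan instead of the string 'NaN' so the new columns stay numeric. The sample rows are invented.

```python
import math

import pandas as pd

# Row-wise stat functions as in the question, but returning a real NaN
# (math.nan) instead of the string 'NaN' so the columns stay numeric.
PA_lambda = lambda row: row.AB + row.BB + row.HBP + row.SH + row.SF
OBP_lambda = lambda row: (row.H + row.BB + row.HBP) / row.PA if row.PA > 0 else math.nan
AVG_lambda = lambda row: row.H / row.AB if row.AB > 0 else math.nan

# (label, function) pairs, applied in order. The first label is 'PA' (not 'pa')
# because OBP_lambda reads row.PA, and attribute lookup is case-sensitive.
stat_functions = [("PA", PA_lambda), ("obp", OBP_lambda), ("avg", AVG_lambda)]


def format_df(df):
    for label, func in stat_functions:
        # Use the value held in `label` as the column name, not a string literal.
        df[label] = df.apply(func, axis=1)
    return df


# Usage with made-up numbers (not real 2001 Diamondbacks statistics).
df = pd.DataFrame({
    "AB": [500, 0],
    "H": [150, 0],
    "BB": [60, 0],
    "HBP": [5, 0],
    "SH": [2, 0],
    "SF": [3, 0],
})
format_df(df)
print(df[["PA", "obp", "avg"]])
```

Unpacking each pair as label, func names the two parts, which makes it harder to mistake the label for a string literal. Returning df is optional, since the function already modifies the DataFrame in place.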
