
Applying similar functions across multiple columns in python/pandas

Problem: Given the dataframe below, I'm trying to come up with the code that will apply a function to three distinct columns without having to write three separate function calls.

The code for the data:

import pandas as pd
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'days': [365, 365, 213, 318, 71],
    'spend_30day': [22, 241.5, 0, 27321.05, 345],
    'spend_90day': [22, 451.55, 64.32, 27321.05, 566.54],
    'spend_365day': [854.56, 451.55, 211.65, 27321.05, 566.54]}

df = pd.DataFrame(data)
cols = df.columns.tolist()
cols = ['name', 'days', 'spend_30day', 'spend_90day', 'spend_365day']
df = df[cols]
df

The function below will essentially annualize spend; if someone has fewer than, say, 365 days in the "days" column, the following function will tell me what the spend would have been if they had 365 days:

def annualize_spend_365(row):
    if row['days']/(float(365)) < 1:
        return (row['spend_365day']/(row['days']/float(365)))
    else:
        return row['spend_365day']

Then I apply the function to the particular column:

df.spend_365day = df.apply(annualize_spend_365, axis=1).round(2)
df

This works exactly as I want it to for that one column. However, I don't want to have to rewrite this for each of the three different "spend" columns (30, 90, 365). I want to be able to write code that will generalize and apply this function to multiple columns in one pass.

I thought I could create lists of the columns and their respective days, use the "zip" function, and nest the function in a for loop, but my attempt below ultimately fails:

spend_cols = [df.spend_30day, df.spend_90day, df.spend_365day]
days_list = [30, 90, 365]

for col, day in zip(spend_cols, days_list):
    def annualize_spend(row):
        if row.days/float(day) < 1:
            return (row.col)/((row.days)/float(day))
        else:
            return row.col
    col = df.apply(annualize_spend, axis = 1)

The error:

AttributeError: ("'Series' object has no attribute 'col'")

I'm not sure why the loop approach is failing. Regardless, I'm hoping for guidance on how to generalize function application in pandas. Thanks in advance!

Look at your two function definitions:

def annualize_spend_365(row):
    if row['days']/(float(365)) < 1:
        return (row['spend_365day']/(row['days']/float(365)))
    else:
        return row['spend_365day']

and

#col in [df.spend_30day, df.spend_90day, df.spend_365day]
def annualize_spend(row):
    if row.days/float(day) < 1:
        return (row.col)/((row.days)/float(day))
    else:
        return row.col

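See the difference? In the first case you access the fields by explicit name, and it works. In the second case you try to access row.col, which fails: attribute access looks for a field literally named "col", and the row has no such field. On top of that, your loop variable col does not hold a column name at all; it holds an entire column (a Series) of df. Instead, put the column names in the list: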

spend_cols = ['spend_30day', 'spend_90day', 'spend_365day']

before your loop. Note also that in the syntax row.days the field name really is "days", whereas row.col looks for a field named "col", not for the field whose name is stored in the variable col. So use row[col] inside the function. Finally, assigning the result to col inside a loop over col only rebinds the loop variable and leaves df unchanged; assign to df[col] instead.
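A small sketch of the difference between attribute access and bracket access, using a made-up one-row Series (the values are taken from Tina's row in the question; the variable names value and has_col_attr are only for illustration):

```python
import pandas as pd

# A single "row", similar to what df.apply(..., axis=1) passes to the function.
row = pd.Series({'days': 213, 'spend_365day': 211.65})
col = 'spend_365day'

# Bracket access uses the *value* of the variable col as the label.
value = row[col]
print(value)

# Attribute access uses the literal name after the dot as the label.
print(row.days)

# row.col therefore looks for a label called 'col', which does not exist.
has_col_attr = hasattr(row, 'col')
try:
    row.col
except AttributeError as exc:
    print(exc)
```

Bracket access is the only form that works when the column name is held in a variable.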


I'm unfamiliar with pandas.DataFrame.apply, but it should be possible to use a single function definition that takes the field of interest and the number of days as input variables:

def annualize_spend(col,day,row):
    if row['days']/float(day) < 1:
        return (row[col])/((row['days'])/float(day))
    else:
        return row[col]

spend_cols = ['spend_30day', 'spend_90day', 'spend_365day']
days_list = [30, 90, 365]

for col, day in zip(spend_cols, days_list):
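    # write the result back into the dataframe; assigning to col would only rebind the loop variable
    df[col] = df.apply(lambda row, col=col, day=day: annualize_spend(col, day, row), axis=1)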

The lambda ensures that only one input parameter of your function, row, is left free when it gets applied; col and day are bound as default arguments at each loop iteration.
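For reference, here is a self-contained sketch of that loop run on the question's data, next to a vectorized variant. The vectorized version is an alternative added here, not part of the answer above. It assumes the same rule (scale up only when days is smaller than the period) and avoids row-wise apply by capping the ratio at 1 with Series.clip. The names by_apply and vectorized are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'days': [365, 365, 213, 318, 71],
    'spend_30day': [22, 241.5, 0, 27321.05, 345],
    'spend_90day': [22, 451.55, 64.32, 27321.05, 566.54],
    'spend_365day': [854.56, 451.55, 211.65, 27321.05, 566.54],
})

def annualize_spend(col, day, row):
    # Scale the spend up only when the row covers fewer than `day` days.
    ratio = row['days'] / float(day)
    return row[col] / ratio if ratio < 1 else row[col]

spend_cols = ['spend_30day', 'spend_90day', 'spend_365day']
days_list = [30, 90, 365]

# Row-wise version: col and day are bound as lambda defaults on each iteration.
by_apply = df.copy()
for col, day in zip(spend_cols, days_list):
    by_apply[col] = by_apply.apply(
        lambda row, col=col, day=day: annualize_spend(col, day, row), axis=1
    )

# Vectorized version: a ratio capped at 1 leaves full-period rows unchanged.
vectorized = df.copy()
for col, day in zip(spend_cols, days_list):
    ratio = (vectorized['days'] / day).clip(upper=1.0)
    vectorized[col] = vectorized[col] / ratio

print(vectorized.round(2))
```

Both versions should give the same numbers. The vectorized one operates on whole columns at once, which usually matters only for larger dataframes.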
