Not able to groupby apply function with two arguments in Python

My question is related to this one. I have a pandas DataFrame as shown below. I want to calculate the MAPE (mean absolute percentage error) after grouping by period. However, I get an error when I try to do so. What am I doing wrong?

import numpy as np
import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02'],
    'period': [1, 2, 1, 2, 3],
    'actuals': [50, 43, 42, 51, 49],
    'forecast': [49, 48, 50, 39, 51]
})

# Define MAPE
def mape(act, fct):
    return np.sum(abs((act - fct)/act))/len(act)

# Try to calculate MAPE for each period (this fails)
df.groupby('period').apply(mape, act='actuals', fct='forecast')
TypeError: mape() got multiple values for argument 'act'

Change the function to:

def mape(data, act, fct):
    act = data[act]
    fct = data[fct]
    return np.sum(abs((act - fct)/act))/len(act)

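When you use groupby.apply, each group's data is passed to the function as its first positional argument. In the original call, that group was bound to act positionally, and act='actuals' then supplied a second value for the same parameter, which is what the TypeError reports.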
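A minimal sketch of that mechanism, assuming the same columns as the question. The name mape_two_args is hypothetical and stands for the original two-argument definition. Calling it directly with a group plus keyword arguments imitates what apply does and reproduces the error. Selecting the value columns before apply is an assumption made here to keep the grouping column out of each group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'period': [1, 2, 1, 2, 3],
    'actuals': [50, 43, 42, 51, 49],
    'forecast': [49, 48, 50, 39, 51],
})

# Original two-argument definition (renamed here for illustration only).
def mape_two_args(act, fct):
    return np.sum(abs((act - fct) / act)) / len(act)

# Revised definition: the group comes first, then the column names.
def mape(data, act, fct):
    act = data[act]
    fct = data[fct]
    return np.sum(abs((act - fct) / act)) / len(act)

# Imitate the call that apply makes: group first, then the keywords.
group = df[df['period'] == 1]
try:
    mape_two_args(group, act='actuals', fct='forecast')
    error_message = None
except TypeError as exc:
    # The group already filled `act`, so the keyword is a second value.
    error_message = str(exc)

# With the revised signature, the keywords no longer collide with the group.
result = df.groupby('period')[['actuals', 'forecast']].apply(
    mape, act='actuals', fct='forecast'
)
print(error_message)
print(result)
```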

You can keep your definition of mape() unchanged by changing the call as follows:

df.groupby('period').apply(lambda x: mape(x['actuals'], x['forecast']))

Your way of passing parameters requires changing the function definition, as pointed out in the other answer. This is because the function needs access to the DataFrame object, in addition to the column names, in order to look up the column values.

When you call it through a lambda like this, the function receives the column values directly as its parameters and doesn't need the DataFrame at all.

The advantage of calling it this way is that the function doesn't have to be tailored to pandas and can be reused in other, general Python code.
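As a rough illustration of that reuse, the sketch below calls the unchanged mape() once on plain NumPy arrays and once per group through a lambda. The sample arrays and the variable names overall and per_period are made up for this example. Selecting the two value columns before apply is an assumption made here so that the grouping column is not handed to the lambda.

```python
import numpy as np
import pandas as pd

# Unchanged two-argument definition from the question.
def mape(act, fct):
    return np.sum(abs((act - fct) / act)) / len(act)

# Plain NumPy usage, no pandas involved (illustrative numbers).
overall = mape(np.array([50.0, 40.0]), np.array([45.0, 44.0]))

df = pd.DataFrame({
    'period': [1, 2, 1, 2, 3],
    'actuals': [50, 43, 42, 51, 49],
    'forecast': [49, 48, 50, 39, 51],
})

# Per-group usage: the lambda unpacks the columns for each group.
per_period = df.groupby('period')[['actuals', 'forecast']].apply(
    lambda g: mape(g['actuals'], g['forecast'])
)
print(overall)
print(per_period)
```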

Another alternative is to avoid the slow groupby + apply altogether in favor of vectorized operations that act on the entire DataFrame, followed by the built-in GroupBy.mean, which is implemented in Cython.

Compute the absolute percentage error for every row first, then take the mean of that Series within each period.

(df['actuals'] - df['forecast']).div(df['actuals']).abs().groupby(df['period']).mean()

period
1    0.105238
2    0.175787
3    0.040816
dtype: float64

To clean this up a little, define a function that calculates the absolute percentage error Series, then take its mean per period.

def ape(act: pd.Series, fct: pd.Series):
    return (act - fct).div(act).abs()

ape(df['actuals'], df['forecast']).groupby(df['period']).mean()
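One possible variation, sketched under the assumption that keeping the per-row error is useful: store it as a column and aggregate that column. The column name abs_pct_err and the variable names are hypothetical. The final check simply confirms that this route matches the groupby + apply route on the sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'period': [1, 2, 1, 2, 3],
    'actuals': [50, 43, 42, 51, 49],
    'forecast': [49, 48, 50, 39, 51],
})

def ape(act: pd.Series, fct: pd.Series) -> pd.Series:
    # Row-wise absolute percentage error.
    return (act - fct).div(act).abs()

def mape(act, fct):
    return np.sum(abs((act - fct) / act)) / len(act)

# Vectorized route: compute the error once, then take the mean per period.
with_err = df.assign(abs_pct_err=ape(df['actuals'], df['forecast']))
vectorized = with_err.groupby('period')['abs_pct_err'].mean()

# groupby + apply route, for comparison only.
applied = df.groupby('period')[['actuals', 'forecast']].apply(
    lambda g: mape(g['actuals'], g['forecast'])
)

same = np.allclose(vectorized.to_numpy(), applied.to_numpy())
print(vectorized)
print(same)
```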
