Applying a custom aggregation function to a pandas DataFrame

Question

I have a pandas DataFrame with two float columns, col_x and col_y.

I want to return the sum of col_x * col_y divided by the sum of col_x.

Can this be done with a custom aggregate function?

I am trying to do something like this:

import pandas as pd


def aggregation_function(x, y):
    return sum(x * y) / sum(x)


df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])
result = df.agg(aggregation_function, axis="columns", args=("col_x", "col_y"))

I know that the aggregation function probably doesn't make sense, but I can't even get to the point where I can try other things because I am getting this error:

TypeError: apply() got multiple values for keyword argument 'args'

I don't know how else I can specify the args for my aggregation function. I've tried using kwargs too, but nothing works. The docs have no example for this, although they seem to say it is possible.

How can you specify the args for the aggregation function?

The desired result of the aggregation is a single value.

Answer 1

First, you can use apply with axis=1 for such problems:

df.apply(lambda x: aggregation_function(x["col_x"], x["col_y"]), axis=1)

However, this raises an error in your case. With axis=1 the lambda receives one row at a time, so x["col_x"] * x["col_y"] is a single number, and the built-in sum does not work on a scalar value; it needs an iterable:

Signature: sum(iterable, start=0, /)
Docstring: Return the sum of a 'start' value (default: 0) plus an iterable of numbers

Hence sum(0.2) does not work.
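A minimal sketch to confirm this outside pandas. The flag name scalar_sum_failed is only there for illustration:

```python
# The built-in sum expects an iterable; a bare float is not one.
try:
    sum(0.2)
    scalar_sum_failed = False
except TypeError as exc:
    scalar_sum_failed = True
    print(f"TypeError: {exc}")

# Wrapping the value in a list makes it iterable, so sum accepts it.
print(sum([0.2]))
```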

If we remove the sum from the aggregation function, the row-wise call runs without error (each row simply yields x * y / x, which is col_y):

def aggregation_function(x, y):
    return (x * y) / x

df.apply(lambda x: aggregation_function(x["col_x"], x["col_y"]), axis=1)

0    0.2
1    0.4
2    0.6
dtype: float64
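On the narrower question of how to pass extra arguments: DataFrame.apply accepts an args tuple, and each row is passed as the first argument before those extras. The sketch below assumes a hypothetical helper, row_product, that takes the column names as parameters. The TypeError in the question probably comes from agg forwarding its own positional arguments to apply as args, so a keyword that is also called args collides with it; that is my reading of the message, not something the docs spell out.

```python
import pandas as pd

df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])


def row_product(row, x_name, y_name):
    # Hypothetical helper: `row` is supplied by apply (one row as a Series),
    # while x_name and y_name come from the `args` tuple.
    return row[x_name] * row[y_name]


# With axis=1 the function is called once per row; args holds the extra
# positional arguments that follow the row.
products = df.apply(row_product, axis=1, args=("col_x", "col_y"))
print(products)
```

This is still a row-wise result (one value per row), not the single aggregated value asked for.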

However, since you want the sum of col_x * col_y divided by the sum of col_x, you can tweak the function to use Series.sum and call it directly on the DataFrame columns. This can also be written as the vectorized expression df['col_x'].mul(df['col_y']).sum() / df['col_x'].sum().

def aggregation_function(x, y):
    return (x * y).sum() / x.sum()

aggregation_function(df["col_x"], df["col_y"])

0.4888888888888889
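Since sum(col_x * col_y) / sum(col_x) is the average of col_y weighted by col_x, one way to cross-check the result is numpy.average with its weights parameter. This is a sketch for verification, not part of the original approach; the variable names vectorized and weighted are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])

# Vectorized form: multiply the columns element-wise, then reduce with Series.sum.
vectorized = df["col_x"].mul(df["col_y"]).sum() / df["col_x"].sum()

# The same quantity expressed as a weighted average of col_y with weights col_x.
weighted = np.average(df["col_y"].to_numpy(), weights=df["col_x"].to_numpy())

# The two should agree up to floating-point rounding.
print(vectorized, weighted)
print(np.isclose(vectorized, weighted))
```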

1 answer

solution1
1 ACCEPTED 2020-09-29 12:29:13
