
Applying a custom aggregation function to a pandas DataFrame

I have a pandas DataFrame with two float columns, col_x and col_y.

I want to return the sum of col_x * col_y divided by the sum of col_x.

Can this be done with a custom aggregate function?

I am trying to do something like this:

import pandas as pd


def aggregation_function(x, y):
    return sum(x * y) / sum(x)


df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])
result = df.agg(aggregation_function, axis="columns", args=("col_x", "col_y"))

I know that the aggregation function probably doesn't make sense, but I can't even get to the point where I can try other things, because I am getting this error:

TypeError: apply() got multiple values for keyword argument 'args'

I don't know how else to specify the args for my aggregation function. I've tried using kwargs too, but nothing I do works. There is no example of this in the docs, although they seem to say it is possible.

How can you specify the args for the aggregation function?

The desired output of the aggregation is a single value.

First, you can use apply with axis=1 for problems like this:

df.apply(lambda x: aggregation_function(x['col_x'], x['col_y']), axis=1)

However, this raises an error in your case. With axis=1 the function is called once per row, so col_x * col_y is a single scalar, and the built-in sum does not work on a scalar value; it needs an iterable:

Signature: sum(iterable, start=0, /)
Docstring: Return the sum of a 'start' value (default: 0) plus an iterable of numbers

Hence sum(0.2) does not work.
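As a small illustration of the point above, the following sketch takes the first row of the question's DataFrame and calls the built-in sum on the per-row product. The variable names row, product and raised are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])

# With axis=1, each call receives one row, so the product is a single number.
row = df.iloc[0]
product = row["col_x"] * row["col_y"]

try:
    sum(product)  # the built-in sum expects an iterable, not a scalar
    raised = False
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

The TypeError here comes from the built-in sum, not from pandas, which is why dropping sum from the row-wise function makes the call succeed.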

If we remove the sum from the aggregation function, this works as intended:

def aggregation_function(x, y):
    return (x * y) / x

df.apply(lambda x: aggregation_function(x['col_x'], x['col_y']), axis=1)

0    0.2
1    0.4
2    0.6
dtype: float64
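On the original question of how to pass arguments: DataFrame.apply accepts an args tuple whose items are passed to the function after the row (or column) itself. The sketch below assumes a hypothetical helper named row_ratio that takes the column names as extra positional arguments; it reproduces the row-wise result above without a lambda.

```python
import pandas as pd


def row_ratio(row, x_name, y_name):
    # row is a Series holding one DataFrame row; x_name and y_name arrive via args.
    return (row[x_name] * row[y_name]) / row[x_name]


df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])

# The first argument is supplied by apply; the rest come from the args tuple.
per_row = df.apply(row_ratio, axis=1, args=("col_x", "col_y"))
print(per_row)
```

This still yields one value per row, so on its own it does not give the single aggregated number the question asks for.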

However, since you say you want the sum of col_x * col_y divided by the sum of col_x, you can tweak the function to use Series.sum and call it directly on the DataFrame's columns. This can also be written in vectorized form as df['col_x'].mul(df['col_y']).sum() / df['col_x'].sum().

def aggregation_function(x, y):
    return (x * y).sum() / x.sum()

aggregation_function(df['col_x'], df['col_y'])

0.4888888888888889
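The quantity being computed is the average of col_y weighted by col_x, so numpy.average with its weights parameter should give the same number. The following is a minimal sketch that compares the two on the question's data; the names vectorized and weighted are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])

# Vectorized form from the answer: sum(col_x * col_y) / sum(col_x)
vectorized = df["col_x"].mul(df["col_y"]).sum() / df["col_x"].sum()

# Same quantity expressed as a weighted average of col_y with col_x as weights
weighted = np.average(df["col_y"], weights=df["col_x"])

print(vectorized, weighted)
```

Both forms operate on whole columns at once, which avoids the per-row function calls that apply with axis=1 makes.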
