I have a pandas DataFrame with two float columns, col_x and col_y.
I want to return the sum of col_x * col_y divided by the sum of col_x.
Can this be done with a custom aggregate function?
I am trying to do something like this:
import pandas as pd
def aggregation_function(x, y):
    return sum(x * y) / sum(x)
df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])
result = df.agg(aggregation_function, axis="columns", args=("col_x", "col_y"))
I know that the aggregation function probably doesn't make sense but I can't even get to the point where I can try other things because I am getting this error:
TypeError: apply() got multiple values for keyword argument 'args'
I don't know how else I can specify the args for my aggregation function. I've tried using kwargs too, but nothing I do works. There is no example in the docs for this, but they seem to say that it is possible.
How can you specify the args for the aggregation function?
The desired result of the aggregation is a single value.
First, you can use apply with axis=1 for such problems:
df.apply(lambda x: aggregation_function(x['col_x'], x['col_y']), axis=1)
However, this raises an error in your case: with axis=1 the function computes col_x * col_y for each row, which is a single scalar, and the built-in sum does not work on a scalar value; it needs an iterable:
Signature: sum(iterable, start=0, /)
Docstring: Return the sum of a 'start' value (default: 0) plus an iterable of numbers
Hence a call such as sum(0.2) does not work; it raises a TypeError because a float is not iterable.
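As a minimal sketch of why the row-wise call fails, the snippet below takes the first row by hand, the way apply with axis=1 would pass it, and feeds the product to the built-in sum. The names row, product, and error_message are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])

row = df.iloc[0]                        # one row, as apply(axis=1) would pass it
product = row["col_x"] * row["col_y"]   # a single float, not a Series

try:
    sum(product)                        # built-in sum needs an iterable
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should say that the value is not iterable, which is the same failure the original function hits on every row.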
If we remove the sum from the aggregate function, this works as intended:
def aggregation_function(x, y):
    return (x * y) / x
df.apply(lambda x: aggregation_function(x['col_x'], x['col_y']), axis=1)
0 0.2
1 0.4
2 0.6
dtype: float64
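On the question of how to specify the args: DataFrame.apply takes an args tuple whose items are passed to the function after the row itself, so the column names can be supplied without a lambda. The sketch below assumes a hypothetical helper row_ratio that receives the row plus the two column names. My reading of the TypeError in the question is that agg collects extra positional arguments itself and forwards them to apply, so a keyword literally named args collides; treat that as an assumption rather than something the docs state.

```python
import pandas as pd

def row_ratio(row, x_name, y_name):
    # row is a Series holding one row; x_name and y_name arrive through args
    return (row[x_name] * row[y_name]) / row[x_name]

df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])

# args must be a tuple; its items follow the row in the call to row_ratio
per_row = df.apply(row_ratio, axis=1, args=("col_x", "col_y"))
print(per_row)
```

This still produces one value per row, not a single aggregate.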
However, since you want the sum of col_x * col_y divided by the sum of col_x, you can tweak the function to use Series.sum and call it directly on the DataFrame's columns. The same computation can also be written in vectorized form as df['col_x'].mul(df['col_y']).sum() / df['col_x'].sum().
def aggregation_function(x, y):
    return (x * y).sum() / x.sum()
aggregation_function(df['col_x'],df['col_y'])
0.4888888888888889
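The vectorized form mentioned above can be checked with a short sketch. Since this quantity is the average of col_y weighted by col_x, numpy.average with its weights parameter should give the same number; using NumPy here is my addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)], columns=["col_x", "col_y"])

# Vectorized: multiply the columns elementwise, then reduce each side with Series.sum
vectorized = df["col_x"].mul(df["col_y"]).sum() / df["col_x"].sum()

# Same quantity expressed as a weighted average (col_x acts as the weights)
weighted = np.average(df["col_y"], weights=df["col_x"])

print(vectorized, weighted)
```

Both values should agree with the 0.4888888888888889 shown above, up to floating-point rounding.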