Use Pandas groupby() + apply() with arguments
I would like to use df.groupby() in combination with apply() to apply a function to each row per group.
I normally use the following code, which usually works (note that this is without groupby()):
df.apply(myFunction, args=(arg1,))
With groupby(), I tried the following:
df.groupby('columnName').apply(myFunction, args=(arg1,))
However, I get the following error:
TypeError: myFunction() got an unexpected keyword argument 'args'
Hence, my question is: How can I use groupby() and apply() with a function that needs arguments?
pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does.
So try this:
df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
or, as suggested by @Zero:
df.groupby('columnName').apply(myFunction, arg1)
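A minimal sketch of the options above, assuming a small made-up frame and a hypothetical function scaled_max (neither is from the original question). It selects the value columns explicitly so the grouping column is not passed to the function, and checks that the lambda, positional, and keyword forms agree:

```python
import pandas as pd


def scaled_max(group, n):
    """Return the column-wise maximum of `group`, multiplied by `n`."""
    return group.max() * n


# Hypothetical example data, for illustration only
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "x": [1, 2, 3, 4],
    "y": [5, 6, 7, 8],
})

# Select the value columns so each group holds only "x" and "y"
grouped = df.groupby("key")[["x", "y"]]

# 1) Wrap the call in a lambda that supplies the extra argument
via_lambda = grouped.apply(lambda g: scaled_max(g, 10))

# 2) Pass the extra argument positionally; GroupBy.apply forwards it
via_positional = grouped.apply(scaled_max, 10)

# 3) Pass the extra argument by keyword; it is forwarded as well
via_keyword = grouped.apply(scaled_max, n=10)

# All three forms produce the same result
assert via_lambda.equals(via_positional)
assert via_lambda.equals(via_keyword)
print(via_lambda)
```

Which form to prefer is mostly a matter of taste; the keyword form tends to read most clearly when the function takes several extra arguments.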
Demo:
In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))
In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1

In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:

In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64
When using GroupBy.apply, you can pass either named arguments:
In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
or positional arguments (note that (10) is simply the integer 10, not a tuple, so it is forwarded to f as n):
In [87]: df.groupby('a').apply(f, (10))
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
Some confusion over why passing an args parameter throws an error may stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.
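So, when you call .apply on a DataFrame itself, you can use this argument; when you call .apply on a groupby object, you cannot. GroupBy.apply instead forwards any extra positional and keyword arguments straight to the function, so args=(arg1,) reaches myFunction as a keyword argument named args, which is what triggers the TypeError.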
In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed to func (the first parameter); there is no need to specify additional *args / **kwargs because arg1 is already specified in the lambda.
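As an alternative to the lambda, the extra argument can be bound ahead of time with functools.partial from the standard library. The sketch below is not from the answers above; the frame and the function span_times are hypothetical and only illustrate that the bound function and the lambda give the same result:

```python
from functools import partial

import pandas as pd


def span_times(group, factor):
    """Return (max - min) of each column in `group`, multiplied by `factor`."""
    return (group.max() - group.min()) * factor


# Hypothetical example data, for illustration only
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "x": [1, 2, 3, 5],
})

grouped = df.groupby("key")[["x"]]

# Bind factor=2 up front; the result takes only the group as its argument
double_span = partial(span_times, factor=2)

via_partial = grouped.apply(double_span)
via_lambda = grouped.apply(lambda g: span_times(g, 2))

assert via_partial.equals(via_lambda)
print(via_partial)
```

A partial object can be handy when the same bound function is reused in several places; for a one-off call the lambda is just as good.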
An example:
import numpy as np
import pandas as pd

# Small made-up frame so the snippet runs on its own
df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [3, 4, 5]})

# Called on the DataFrame: `axis` is DataFrame.apply's own parameter
df.apply(np.sum, axis=0)  # equivalent to df.sum(0)
df.apply(np.sum, axis=1)  # equivalent to df.sum(1)

# Called on a groupby object of the DataFrame: raises TypeError,
# because `args` is forwarded to np.sum as a keyword argument
print(df.groupby('col1').apply(np.sum, args=(0,)))
# TypeError: sum() got an unexpected keyword argument 'args'
For me, this worked:
df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))