
Use Pandas groupby() + apply() with arguments

I would like to use df.groupby() in combination with apply() to apply a function to each row per group.

I normally use the following code, which usually works (note that this is without groupby()):

df.apply(myFunction, args=(arg1,))

With groupby() I tried the following:

df.groupby('columnName').apply(myFunction, args=(arg1,))

However, I get the following error:

TypeError: myFunction() got an unexpected keyword argument 'args'

Hence, my question is: How can I use groupby() and apply() with a function that needs arguments?

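pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. GroupBy.apply instead forwards any extra positional and keyword arguments to the function itself, so args=(arg1,) reaches myFunction as a keyword argument named args, which explains the error.

So try this: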

df.groupby('columnName').apply(lambda x: myFunction(x, arg1))

or, as suggested by @Zero:

df.groupby('columnName').apply(myFunction, arg1)

Demo:

In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))

In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2  3
4  3  4  1

In [84]: def f(ser, n):
    ...:     return ser.max() * n
    ...:

In [85]: df.apply(f, args=(10,))
Out[85]:
a    40
b    40
c    40
dtype: int64

When using GroupBy.apply you can pass either named arguments:

In [86]: df.groupby('a').apply(f, n=10)
Out[86]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30

or positional arguments, which are passed directly rather than wrapped in a tuple:

In [87]: df.groupby('a').apply(f, 10)
Out[87]:
    a   b   c
a
0   0  30  40
3  30  40  40
4  40  20  30
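
The demo above uses random integers, so it cannot be rerun exactly. A minimal sketch with the same values hard-coded is given here, assuming a reasonably recent pandas. The value columns are selected explicitly (an addition not in the original demo) so that the grouping column stays out of the computation regardless of pandas version.

```python
import pandas as pd

# Fixed data matching the demo output above (instead of random integers)
df = pd.DataFrame({
    "a": [0, 0, 3, 4, 3],
    "b": [3, 3, 0, 2, 4],
    "c": [1, 4, 4, 3, 1],
})


def f(group, n):
    """Return the column-wise maximum of a group, scaled by n."""
    return group.max() * n


# Assumption: selecting the value columns explicitly, so the grouping
# column "a" is not passed to f on any pandas version.
grouped = df.groupby("a")[["b", "c"]]

by_keyword = grouped.apply(f, n=10)            # forwarded as f(group, n=10)
by_position = grouped.apply(f, 10)             # forwarded as f(group, 10)
by_lambda = grouped.apply(lambda g: f(g, 10))  # argument bound in the lambda

print(by_keyword)
```

All three calls should produce the same frame, since each one ends up calling f(group, 10) once per group.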

Some confusion over why using an args parameter throws an error might stem from the fact that pandas.DataFrame.apply does have an args parameter (a tuple), while pandas.core.groupby.GroupBy.apply does not.

So, when you call .apply on a DataFrame itself, you can use this argument; when you call .apply on a groupby object, you cannot.

In @MaxU's answer, the expression lambda x: myFunction(x, arg1) is passed to func (the first parameter); there is no need to specify additional *args / **kwargs because arg1 is already bound inside the lambda.

An example:
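
The same binding can also be done with functools.partial from the standard library. None of the answers mention this alternative, so treat the following as an illustrative sketch only; the frame, the column names, and the helper scale_max are made up for the example.

```python
from functools import partial

import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 5]})


def scale_max(group, factor):
    """Hypothetical helper: column-wise maximum of a group times factor."""
    return group.max() * factor


grouped = df.groupby("key")[["val"]]

# Both callables take a single argument (the group); `factor` is pre-bound.
with_lambda = grouped.apply(lambda g: scale_max(g, 10))
with_partial = grouped.apply(partial(scale_max, factor=10))

print(with_partial)
```

Either way, apply() only ever sees a one-argument callable, which is why no extra *args / **kwargs are needed.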

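import numpy as np
import pandas as pd

# Small example frame (values chosen only for illustration)
df = pd.DataFrame({'col1': [1, 1, 2], 'col2': [10, 20, 30]})

# Called on a DataFrame - `axis` is a parameter of DataFrame.apply itself
# and selects whether np.sum runs over each column (0) or each row (1)
df.apply(np.sum, axis=0)  # equiv to df.sum(0)
df.apply(np.sum, axis=1)  # equiv to df.sum(1)

# Called on a groupby object of the DataFrame - will throw TypeError,
# because `args` is forwarded to np.sum as a keyword argument
print(df.groupby('col1').apply(np.sum, args=(0,)))
# TypeError: sum() got an unexpected keyword argument 'args'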
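
To see the failure and the fix side by side without stopping the script, the error can be caught. This is a small sketch, assuming a hypothetical helper scaled_sum and an illustrative two-column frame; the exact wording of the error message may vary between Python versions, so only its type is relied on.

```python
import pandas as pd

# Illustrative data, not taken from the original post
df = pd.DataFrame({"col1": [1, 1, 2], "col2": [10, 20, 30]})


def scaled_sum(group, factor):
    """Hypothetical helper: column-wise sum of a group times factor."""
    return group.sum() * factor


grouped = df.groupby("col1")[["col2"]]

# `args=(2,)` is forwarded to scaled_sum as a keyword argument named `args`,
# which scaled_sum does not accept, so a TypeError is raised.
try:
    grouped.apply(scaled_sum, args=(2,))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Passing the value positionally (or as factor=2) works.
fixed = grouped.apply(scaled_sum, 2)
print(fixed)
```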

For me, the following worked:

df2 = df.groupby('columnName').apply(lambda x: my_function(x, arg1, arg2))
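
A runnable sketch of the two-argument case follows. The body of my_function, the data, and the values of arg1 and arg2 are all assumptions made for illustration, since the original answer does not show them.

```python
import pandas as pd

# Hypothetical data and argument values
df = pd.DataFrame({"columnName": ["x", "x", "y"], "val": [1, 2, 5]})
arg1, arg2 = 2, 1


def my_function(group, scale, offset):
    """Hypothetical body: scale and shift `val`, then total it per group."""
    return (group["val"] * scale + offset).sum()


grouped = df.groupby("columnName")[["val"]]

# Lambda form, as in the answer above
df2 = grouped.apply(lambda x: my_function(x, arg1, arg2))

# Equivalent forms that let apply() forward the extra arguments
df2_positional = grouped.apply(my_function, arg1, arg2)
df2_keyword = grouped.apply(my_function, scale=arg1, offset=arg2)

print(df2)
```

Because my_function returns one number per group here, the result is a Series indexed by columnName.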
