
Passing a dataframe as an argument in apply with pandas

I'm trying to use .apply() with a dataframe as one of the arguments:

df.apply(func, axis=1, args=(df))

When I do, I get the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the function:

def func(df): 
  new_val = df.loc[ \
    (df["date"] == self.date + relativedelta(years=1)) & \
    (df["indicator"] == self.indicator), "val"]
  if (len(new_val) == 1):
    new_val = list(new_val)[0] # Extract integer from series
    self["updated_val"] =  new_val - self.val

Ok, the problem here is the combination of func and apply. By default, the apply method of a dataframe applies the given function to each COLUMN in the data frame and returns the result (with axis=1, as in your call, it applies the function to each row instead). So the function you pass to apply should expect a pandas Series or an array as input, not a dataframe. It should give either a series/array or a single value as output. For example

df.apply(sum) 

will apply the sum function to each column and give a series containing the result for each column (normally you would do df.sum() for this, but I'm just using it to illustrate the point).
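To make the column/row distinction concrete, here is a minimal sketch. The dataframe and its column names `a` and `b` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Default axis=0: the function receives each column as a Series.
col_totals = df.apply(sum)

# axis=1: the function receives each row as a Series.
row_totals = df.apply(sum, axis=1)

print(col_totals.tolist())  # [6, 60]
print(row_totals.tolist())  # [11, 22, 33]
```

In both cases the function sees one Series at a time, never the whole dataframe.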

Secondly, the args parameter in apply is only used when the function you are passing takes additional arguments (besides the series, which should be the first argument). For example, you might have a function that sums an array and then divides by some number (again a silly example):

def sum_div(array, divisor):
    return sum(array) / divisor

Then you might want to apply this to each column of a dataframe with divisor = 2. You would do

df.apply(sum_div, args=[2])
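As a runnable sketch of the same idea, again on made-up data: the pandas documentation describes args as a tuple, so a single extra argument is written with a trailing comma. Extra keyword arguments given to apply are also forwarded to the function.

```python
import pandas as pd

def sum_div(array, divisor):
    # Sum the values of one column, then divide by the extra argument.
    return sum(array) / divisor

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Positional extra argument, passed as a one-element tuple.
halved = df.apply(sum_div, args=(2,))

# The same call with the extra argument passed by keyword.
halved_kw = df.apply(sum_div, divisor=2)

print(halved.tolist())  # [3.0, 30.0]
```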

I'm not sure what you want to do here. Is it just func(df)?
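If the goal is instead a row-by-row lookup against the whole dataframe, here is one possible reading of the question, offered as a sketch rather than a confirmed fix. Note that `(df)` is just `df` in Python, not a tuple, so `args=(df)` hands the dataframe itself to apply, and that is most likely what ends up being evaluated as a boolean. A one-element tuple needs a trailing comma: `(df,)`. The names `row` and `frame`, the sample data, and the use of `pd.DateOffset` in place of `relativedelta` are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2021-01-01", "2020-01-01"]),
    "indicator": ["a", "a", "b"],
    "val": [10, 15, 7],
})

def func(row, frame):
    # row is one row of the dataframe (a Series); frame is the full dataframe.
    next_year = row["date"] + pd.DateOffset(years=1)
    match = frame.loc[
        (frame["date"] == next_year) & (frame["indicator"] == row["indicator"]),
        "val",
    ]
    if len(match) == 1:
        # Return the difference instead of assigning to the row in place.
        return match.iloc[0] - row["val"]
    return np.nan

# Trailing comma: args is a one-element tuple containing the dataframe.
df["updated_val"] = df.apply(func, axis=1, args=(df,))
```

Returning the value and assigning the result of apply to a new column avoids relying on in-place changes to the row, which do not propagate back to the dataframe.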
