
Passing a dataframe as an argument in apply with pandas

I'm trying to use .apply() with a dataframe as one of the arguments:

df.apply(func, axis=1, args=(df))

When I do, I get the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the function:

def func(df): 
  new_val = df.loc[ \
    (df["date"] == self.date + relativedelta(years=1)) & \
    (df["indicator"] == self.indicator), "val"]
  if (len(new_val) == 1):
    new_val = list(new_val)[0] # Extract integer from series
    self["updated_val"] =  new_val - self.val

OK, the problem here is the combination of func and apply. By default (axis=0), the apply method of a dataframe applies the given function to each column of the dataframe and returns the result; with axis=1, as in your call, it applies the function to each row instead. Either way, the function you pass to apply should expect a pandas Series or an array as its input, not a dataframe, and it should return either a series/array or a single value. For example

df.apply(sum) 

will apply the sum function to each column and give a series containing the result for each column (normally you would do df.sum() for this but I'm just using it to illustrate the point).
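To make that concrete, here is a minimal sketch on a small made-up frame (the column names and values are assumptions chosen purely for illustration). It shows what the function receives under the default axis and under axis=1:

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Default axis=0: sum is called once per column, each passed as a Series.
col_totals = df.apply(sum)

# axis=1: sum is called once per row, each passed as a Series.
row_totals = df.apply(sum, axis=1)

print(col_totals.tolist())  # [6, 60]
print(row_totals.tolist())  # [11, 22, 33]
```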

Secondly, the args parameter in apply is only used when the function you are passing takes additional arguments (besides the series, which should be the first argument). For example, you might have a function that sums an array and then divides by some number (again a silly example):

def sum_div(array, divisor):
    return sum(array) / divisor

Then you might want to apply this to each column of a dataframe with divisor = 2. You would do

df.apply(sum_div, args=(2,))
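As a runnable sketch of that call (the frame below is made-up illustration data): the pandas documentation describes args as a tuple, so a single extra argument is written here as (2,) with a trailing comma. Note that (2) without the comma is just the integer 2, not a tuple.

```python
import pandas as pd

def sum_div(array, divisor):
    # array is the column (a Series); divisor comes from args.
    return sum(array) / divisor

# Hypothetical toy data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Each column is summed and then divided by 2.
halved = df.apply(sum_div, args=(2,))

print(halved.tolist())  # [3.0, 30.0]
```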

I'm not sure what you want to do here. Is it just func(df)?
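If the goal is instead to look up, for each row, the matching row one year later in the same dataframe, here is a rough sketch of how that could look. It rests on several assumptions: the function name updated_val, its parameter names row and frame, and the sample data are all hypothetical, and the column names are taken from the question. The row is the first argument and the dataframe arrives through args. Note also that args=(df) is just df, not a tuple, which is most likely what triggers the truth-value error; it needs to be args=(df,).

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

def updated_val(row, frame):
    # row is one row of the frame (a Series); frame is the whole dataframe.
    match = frame.loc[
        (frame["date"] == row["date"] + relativedelta(years=1))
        & (frame["indicator"] == row["indicator"]),
        "val",
    ]
    if len(match) == 1:
        return match.iloc[0] - row["val"]
    return float("nan")  # no unique match one year later

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2021-01-01"]),
    "indicator": ["x", "x"],
    "val": [10, 15],
})

# Trailing comma makes args a one-element tuple.
df["updated_val"] = df.apply(updated_val, axis=1, args=(df,))
```

Returning the value and assigning the result of apply to a new column avoids modifying the row inside the function, since changes made to the row there are not guaranteed to reach the original dataframe.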

