Pandas.apply with dependency on previous value (not shift)

I am trying to apply a function to each row in a dataframe. The problem is, the function requires output from the previous row as an input.

I want to use this function:

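import math

def emaIrregular(alpha, sample, sampleprime, deltats, emaprime):
  # a is the elapsed time since the previous sample, scaled by alpha
  a = deltats / float(alpha)
  u = math.exp(-a)
  v = (1 - u) / a

  # emaprime is the EMA from the previous row; sampleprime is the previous sample
  return (u * emaprime) + ((v - u) * sampleprime) + ((1.0 - v) * sample)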

The issue is the emaprime parameter: it is the EMA value from the previous row, which is itself an output of this same function. I am aware I can shift the df to get the sampleprime and deltats values.

The function I am using is slightly complex: here is a toy example I hope will help.

def myRollingSum(x, xprime):
  return x + xprime

So it is similar to a rolling sum, as it uses the output from the previous iteration as the input for the next.

Edit: OK, the myRollingSum example is throwing people off. I need to access the result of the previous row, but this result is the thing being computed, i.e. f(x_i) = f(x_{i-1}) + c. Alternatively, it is similar to the way a factorial is computed.

My data is sparse and irregularly spaced. It is not feasible to resample/interpolate and run over this expanded dataset for each window.

I have a feeling there is no easy way to do this apart from iterating over each record one by one. Is that right?

It looks like .rolling_apply would definitely work, as behzad.nouri suggested.

Another cruder but possibly easier-to-follow way would be to use .shift(1) to make a shifted column. Then use numpy's vectorize function to call a function using the two columns as inputs.

import numpy as np

df['shifted'] = df["x"].shift(1)  # previous row's x; NaN in the first row
def myRollingSum(x, xprime):
  return x + xprime
df['rsum'] = np.vectorize(myRollingSum)(df['x'], df['shifted'])
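One caveat about the shifted-column approach, sketched here on hypothetical toy data: the shifted column holds the previous input, not the previous result, so it produces pairwise sums rather than a running total. For plain addition, pandas' built-in cumsum gives the recursive behaviour the question describes.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4]})  # hypothetical toy data

# Shift-based version: each row sees the previous *input*, not the previous result.
pairwise = df["x"] + df["x"].shift(1)

# A true running sum feeds each result into the next step; cumsum does this for addition.
running = df["x"].cumsum()

print(pairwise.tolist())  # [nan, 3.0, 5.0, 7.0]
print(running.tolist())   # [1, 3, 6, 10]
```

The two only agree when the function does not depend on its own earlier output, which is not the case for the EMA in the question.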

It looks like you want to apply a recursive function. In that case, .rolling_apply won't work. One way would be to use the series values as a list or numpy array. Then loop through the list to use the recursive function.
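The loop described above might look like the following minimal sketch for the EMA in the question. The names ema_irregular and ema_series and the sample data are hypothetical, and seeding the first EMA value with the first sample is an assumption, since the question does not say how the series starts. It also assumes at least one sample and strictly increasing timestamps, so that the time difference is never zero.

```python
import math

def ema_irregular(alpha, sample, sample_prev, delta_ts, ema_prev):
    """One step of the irregular-interval EMA from the question."""
    a = delta_ts / float(alpha)
    u = math.exp(-a)
    v = (1.0 - u) / a
    return u * ema_prev + (v - u) * sample_prev + (1.0 - v) * sample

def ema_series(alpha, samples, timestamps):
    """Run the recurrence over whole sequences with a plain loop.

    Assumption: the first EMA value is seeded with the first sample.
    """
    out = [float(samples[0])]
    for i in range(1, len(samples)):
        delta_ts = timestamps[i] - timestamps[i - 1]
        # out[-1] is the result for the previous row, which shift() cannot provide
        out.append(ema_irregular(alpha, samples[i], samples[i - 1], delta_ts, out[-1]))
    return out

samples = [1.0, 2.0, 4.0, 3.0]      # hypothetical data
timestamps = [0.0, 1.0, 4.0, 4.5]   # irregular spacing, strictly increasing
ema = ema_series(2.0, samples, timestamps)
```

With a DataFrame, the same function could be fed from columns converted to arrays, for example df["x"].to_numpy() and a numeric timestamp column, and the returned list assigned back as a new column. Those column names are illustrative only.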

Your function should call itself, something like this:

def factorial(i, alist):
    if i > 0:
        print(alist[i-1])  # debug output: the previous element
        return alist[i]*factorial(i-1,alist)
    else:
        return 1
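Calling a recursive function like the one above once per row redoes all the earlier work each time, and a long series can exceed Python's recursion limit. As a hedged alternative, the same values can be produced in a single pass by carrying the previous result forward. In this sketch, factorial_recursive is a copy of the function above without the print, and running_products is a hypothetical name.

```python
def factorial_recursive(i, alist):
    """Copy of the recursive function above, without the debug print."""
    if i > 0:
        return alist[i] * factorial_recursive(i - 1, alist)
    return 1

def running_products(alist):
    """Return the value of factorial_recursive(i, alist) for every i, in one pass."""
    out = []
    acc = 1  # matches the base case: position 0 yields 1
    for i, value in enumerate(alist):
        if i > 0:
            acc *= value  # previous result times the current element
        out.append(acc)
    return out

data = [7, 2, 3, 4]  # hypothetical data; the first element is ignored by the base case
print(running_products(data))  # [1, 2, 6, 24]
```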

If you want to do it through the dataframe, you can make a column in which every row holds the full list of the series' values. Then you make another column that holds the index number. Then you can call the factorial function (or whatever your function is) using numpy.vectorize.

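import numpy

df["alldata"] = [df["x"].tolist()] * len(df)  # every row holds the full list of values
df = df.reset_index()  # adds an "index" column (the row position for a default index)
# otypes=[float] stops numpy from inferring an integer dtype from the first result (1)
df["fact"] = numpy.vectorize(factorial, otypes=[float])(df["index"], df["alldata"])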

I think this solution will execute faster than using iterrows(), but I'm not sure. numpy.vectorize is essentially a Python-level loop, so it is worth timing both.
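For the general pattern f(x_i) = f(x_{i-1}) + c, another option not mentioned in the answers is itertools.accumulate from the standard library, which passes the previous result back into the function at each step. A minimal sketch on hypothetical toy data, where my_rolling_sum is a renamed version of the toy function with the previous result as its first argument:

```python
from itertools import accumulate

def my_rolling_sum(previous_result, x):
    # accumulate calls this with (result so far, next element)
    return previous_result + x

xs = [1, 2, 3, 4]  # hypothetical toy data
running = list(accumulate(xs, my_rolling_sum))
print(running)  # [1, 3, 6, 10]
```

The first element is used as the starting result unchanged, so a recurrence that needs a different seed would have to handle that separately.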
