I am trying to apply a function to each row in a dataframe. The problem is, the function requires output from the previous row as an input.
import math

def emaIrregular(alpha, sample, sampleprime, deltats, emaprime):
    # deltats: time since the previous sample; emaprime: previous EMA value
    a = deltats / float(alpha)
    u = math.exp(a * -1)
    v = (1 - u) / a
    return (u * emaprime) + ((v - u) * sampleprime) + ((1.0 - v) * sample)
The issue is the parameter emaprime: it is the EMA value from the previous row, which is itself the output of this function. I am aware I can shift the df to get the sampleprime and deltats values.
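Written out as a recurrence, read directly off the function above (the symbol names are my own: x_i is sample, x_{i-1} is sampleprime, \Delta t_i is deltats, E_{i-1} is emaprime and E_i is the returned value):

```latex
\begin{aligned}
a_i &= \frac{\Delta t_i}{\alpha}, \qquad u_i = e^{-a_i}, \qquad v_i = \frac{1 - u_i}{a_i},\\
E_i &= u_i\,E_{i-1} + (v_i - u_i)\,x_{i-1} + (1 - v_i)\,x_i .
\end{aligned}
```

The E_{i-1} term is what makes this awkward: it is not an input column, so it cannot be obtained by shifting.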
The function I am using is slightly complex: here is a toy example I hope will help.
def myRollingSum(x, xprime):
    return x + xprime
So it is similar to a rolling sum, as it uses the output from the previous iteration as the input for the next.
Edit: OK, the myRollingSum example is throwing people off. I need to access the result of the previous row, but this result is the thing being computed! It is similar to the way a factorial is computed.
My data is sparse and irregularly spaced. It is not feasible to resample/interpolate and run over this expanded dataset for each window.
I have a feeling there is not an easy way to do this, apart from iterating over each record one by one?
It looks like .rolling_apply would work, as behzad.nouri suggested.
Another cruder but possibly easier-to-follow way would be to use .shift(1) to make a shifted column, then use NumPy's vectorize function to call a function with the two columns as inputs.
import numpy as np

df['shifted'] = df["x"].shift(1)

def myRollingSum(x, xprime):
    return x + xprime

df['rsum'] = np.vectorize(myRollingSum)(df['x'], df['shifted'])
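To see what the shifted-column approach actually computes, here is a small sketch on a made-up frame (the data and the extra cumsum column are my own additions for comparison, not part of the question):

```python
import numpy as np
import pandas as pd


def myRollingSum(x, xprime):
    return x + xprime


# Hypothetical toy data, not from the question.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
df["shifted"] = df["x"].shift(1)  # the first row has no predecessor, so it is NaN
df["rsum"] = np.vectorize(myRollingSum)(df["x"], df["shifted"])

# For comparison: a true running total feeds each result into the next row.
df["cumsum"] = df["x"].cumsum()
```

Here rsum comes out as NaN, 3, 5, 7, while cumsum is 1, 3, 6, 10. The shift only exposes the previous input, not the previous result.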
It looks like you want to apply a recursive function. In that case, .rolling_apply won't work. One way would be to use the series values as a list or numpy array. Then loop through the list to use the recursive function.
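As a rough sketch of that loop, assuming the time and sample columns are called t and x (hypothetical names), that timestamps are strictly increasing so deltats is never zero, and that the EMA is seeded with the first sample (my assumption; the question does not say how it is initialised):

```python
import math

import numpy as np
import pandas as pd


def ema_irregular(alpha, sample, sampleprime, deltats, emaprime):
    """One step of the question's irregular-interval EMA update."""
    a = deltats / float(alpha)
    u = math.exp(-a)
    v = (1.0 - u) / a
    return (u * emaprime) + ((v - u) * sampleprime) + ((1.0 - v) * sample)


def ema_series(times, samples, alpha):
    """Run the update row by row, carrying the previous result forward."""
    times = np.asarray(times, dtype=float)
    samples = np.asarray(samples, dtype=float)
    out = np.empty(len(samples))
    if len(samples) == 0:
        return out
    out[0] = samples[0]  # assumption: seed the EMA with the first sample
    for i in range(1, len(samples)):
        out[i] = ema_irregular(
            alpha,
            samples[i],
            samples[i - 1],
            times[i] - times[i - 1],
            out[i - 1],  # the previous *result*, not a shifted input column
        )
    return out


# Hypothetical data: irregular timestamps and a sparse signal.
df = pd.DataFrame({"t": [0.0, 1.0, 4.0, 4.5, 10.0], "x": [1.0, 2.0, 2.0, 5.0, 3.0]})
df["ema"] = ema_series(df["t"].to_numpy(), df["x"].to_numpy(), alpha=2.0)
```

Working on plain arrays keeps the per-row overhead low, and the loop visits each row only once.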
Your function should call itself, something like this:
def factorial(i, alist):
    if i > 0:
        print(alist[i-1])  # debug output: the previous element
        return alist[i]*factorial(i-1,alist)
    else:
        return 1
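As a quick check of what this recursion returns, here is the same function without the debugging print, run on a made-up list (the data is hypothetical):

```python
def factorial(i, alist):
    # Same recursion as above, minus the print.
    if i > 0:
        return alist[i] * factorial(i - 1, alist)
    return 1


values = [1, 2, 3, 4]  # hypothetical data
results = [factorial(i, values) for i in range(len(values))]
```

Note that alist[0] is never multiplied in, because the recursion stops at i == 0 and returns 1. Each call also recurses i levels deep, so a long series can run into Python's recursion limit.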
If you want to do it through the dataframe, you can make a column in which every row holds all the values of the series as a list. Then make another column that holds the index number. Then you can call the factorial function (or whatever your function is) using numpy.vectorize.
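import numpy

values = df["x"].tolist()            # .values is an attribute, not a method
df["alldata"] = [values] * len(df)   # every row holds the full list
df = df.reset_index()                # adds an "index" column (assumes a default integer index)
# otypes=[float] stops vectorize inferring an int dtype from the first result (1)
df["fact"] = numpy.vectorize(factorial, otypes=[float])(df["index"], df["alldata"])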
I think this solution will execute faster than using iterrows(), but I'm not sure.