Apply a custom function to a pandas dataframe over a rolling window

Suppose you have a dataframe with 1000 closing prices. You want to apply a risk calculation function (let's say VaR) named compute_var() to the last 90 closing prices, on a rolling basis. How would you do it? I presume with apply():

def compute_var(df):
    return do_calculations_on(df[-90:])

def compute_rolling_var(self):
    self.var = self.closing.apply(compute_var)

The problem is that .apply passes only a single day's closing price to compute_var, not a dataframe, so it raises an error.
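To make this behaviour concrete, here is a minimal sketch, assuming `closing` is a pandas Series of prices. The sample data and the `spy` helper are hypothetical and only record what `apply()` hands to the callback.

```python
import numpy as np
import pandas as pd

# Illustrative data; the real series would hold 1000 closing prices.
closing = pd.Series([100.0, 101.5, 99.8, 102.3])

received = []  # records every argument apply() passes to the callback

def spy(value):
    # Hypothetical helper: store the argument and return it unchanged.
    received.append(value)
    return value

closing.apply(spy)

# Each call received one scalar price, never a slice of the series,
# so something like value[-90:] inside the callback cannot work.
print([np.isscalar(v) for v in received])
```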

The only working solution I found is an iteration-style algorithm (.iterrows()): I pass the iteration index to compute_var, which crops the closing dataframe to self.closing[:i] before performing the calculation on the last 90 rows, and then populates the df.var dataframe via .loc[i] = computer_var_value.
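The iterative workaround might look roughly like the sketch below. It is not the asker's actual code: the synthetic price path is made up, and `compute_var` is a hypothetical stand-in that takes historical VaR to be the negated 5% quantile of simple returns. It loops over positions rather than using `.iterrows()`, which amounts to the same idea.

```python
import numpy as np
import pandas as pd

def compute_var(prices, level=0.05):
    """Hypothetical stand-in: historical VaR as the negated `level`
    quantile of simple returns computed from a window of prices."""
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(prices) / prices[:-1]
    returns = returns[~np.isnan(returns)]  # ignore gaps in the data
    if returns.size == 0:
        return np.nan
    return -np.quantile(returns, level)

# Synthetic, strictly positive price path (assumption, for illustration).
rng = np.random.default_rng(0)
closing = pd.Series(100 * np.exp(rng.normal(0, 0.01, 300).cumsum()))

window = 90
var = pd.Series(np.nan, index=closing.index)

# Explicit loop: crop the last `window` prices ending at each position.
for i in range(window, len(closing) + 1):
    var.iloc[i - 1] = compute_var(closing.iloc[i - window:i])
```

This works, but the windowing logic is written by hand and the Python-level loop gets slow on long series.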

I suspect there is a better way.

The answer is rolling_apply (in current pandas, .rolling(...).apply(...)), as pointed out by EdChum, plus a min_periods adjustment.
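In current pandas the rolling version can be sketched as follows. The data are synthetic and `compute_var` is again a hypothetical placeholder (negated 5% quantile of simple returns), not the asker's function. With `raw=True` each window arrives as a NumPy array.

```python
import numpy as np
import pandas as pd

def compute_var(prices, level=0.05):
    """Hypothetical stand-in: historical VaR as the negated `level`
    quantile of simple returns computed from a window of prices."""
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(prices) / prices[:-1]
    returns = returns[~np.isnan(returns)]  # ignore gaps in the data
    if returns.size == 0:
        return np.nan
    return -np.quantile(returns, level)

# Synthetic, strictly positive price path (assumption, for illustration).
rng = np.random.default_rng(0)
closing = pd.Series(100 * np.exp(rng.normal(0, 0.01, 300).cumsum()))

# pandas builds each 90-price window and calls compute_var on it.
rolled = closing.rolling(window=90).apply(compute_var, raw=True)
```

The first 89 entries of `rolled` are NaN because those windows hold fewer than 90 prices.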

The problem came from a few NaN values in the input data. min_periods=None is the default, and for a fixed-size window that means min_periods equals the window size, so no NaN value is allowed in the window (90 days here). This seems very counter-intuitive to me, but setting min_periods=1 resolved my issue.
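The effect of min_periods is easier to see on a tiny example. This sketch uses a window of 3 instead of 90 and `np.nanmean` as a placeholder for the risk function; both are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Six prices with one gap (NaN) at position 2.
closing = pd.Series([100.0, 101.0, np.nan, 103.0, 104.0, 105.0])

# Default min_periods (None): a window of 3 needs 3 non-NaN values,
# so every window that overlaps the gap yields NaN.
strict = closing.rolling(window=3).apply(np.nanmean, raw=True)

# min_periods=1: the function runs as soon as the window holds
# at least one non-NaN value; the function itself must tolerate NaN.
lenient = closing.rolling(window=3, min_periods=1).apply(np.nanmean, raw=True)

print(pd.DataFrame({"closing": closing, "strict": strict, "lenient": lenient}))
```

Note that with min_periods=1 the custom function still receives the NaN entries inside the window, so it has to skip or handle them itself.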
