Most computationally efficient/fastest way to compute rolling (linear) regression in Python (Numpy or Pandas)

Question

I need a very fast and efficient way to do rolling linear regression. I looked through these two threads:

Efficient way to do a rolling linear regression
Rolling linear regression

From them, I had inferred that numpy was (computationally) the fastest. However, with my (limited) Python skills, I found that the time to compute the same set of rolling data was *** the same ***.

Is there a faster way to compute this than any of the 3 methods I post below? I would have thought the numpy way would be much faster, but unfortunately, it wasn't.

########## testing time for pd rolling vs numpy rolling
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def fitcurve(x_pts):
    poly = np.polyfit(np.arange(len(x_pts)), x_pts, 1)
    return np.poly1d(poly)[1]


win_ = 30
# tmp_ = data_.Close
tmp_ = pd.Series(np.random.rand(10000))
s_time = time.time()
roll_pd = tmp_.rolling(win_).apply(lambda x: fitcurve(x)).to_numpy()
print('pandas rolling time is', time.time() - s_time)
pd.Series(roll_pd).plot()
plt.show()

########
s_time = time.time()
roll_np = np.empty(0)
for cnt_ in range(len(tmp_)-win_):
    tmp1_ = tmp_[cnt_:cnt_+ win_]
    grad_ = np.linalg.lstsq(np.vstack([np.arange(win_), np.ones(win_)]).T, tmp1_, rcond = None)[0][0]
    roll_np = np.append(roll_np, grad_)

print('numpy rolling time is', time.time() - s_time)
pd.Series(roll_np).plot()
plt.show()

#################
s_time = time.time()
roll_st = np.empty(0)
from scipy import stats
for cnt_ in range(len(tmp_)-win_):
    slope, intercept, r_value, p_value, std_err = stats.linregress(np.arange(win_), tmp_[cnt_:cnt_ + win_])
    roll_st = np.append(roll_st, slope)
print('stats rolling time is', time.time() - s_time)
pd.Series(roll_st).plot()
plt.show()

Answer 1

tl;dr

My answer is

view = np.lib.stride_tricks.sliding_window_view(tmp_, (win_,))
xxx=np.vstack([np.arange(win_), np.ones(win_)]).T
roll_mat=(np.linalg.inv(xxx.T @ xxx) @ (xxx.T) @ view.T)[0]

It takes 1.2 ms to compute, compared to 2 seconds for your pandas and numpy versions, and 3.5 seconds for your stats version.

Long version

One method could be to use sliding_window_view to transform your tmp_ array into an array of windows (a fake one: it is just a view, not a real array of roughly 10000x30 values. It is just tmp_, viewed differently. Hence the _view in the function name).
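To make the "it is just a view" point concrete, here is a minimal sketch on a tiny array. The names data and win are stand-ins for tmp_ and win_, and it assumes NumPy 1.20 or later, where sliding_window_view is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(10.0)   # small stand-in for tmp_
win = 4                  # small stand-in for win_

view = sliding_window_view(data, win)

print(view.shape)                     # (len(data) - win + 1, win)
print(view[2])                        # same values as data[2:2 + win]
print(np.shares_memory(view, data))   # the windows reuse data's buffer
```

Because the windows share memory with the original array, building the view costs almost nothing regardless of the window length.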

There is no direct advantage in that. But from there, you can try to take advantage of vectorization.

I do that in two different ways: an easy one, and one that takes a minute of thinking. Since I put the best answer first, the rest of this message can appear chronologically inconsistent (I say things like "in my previous answer" when the previous answer comes later), but I tried to write both answers consistently.

New answer: matrix operations

One way to do that (since lstsq is one of the rare numpy functions that does not naturally broadcast over a stack of problems) is to go back to what lstsq(X,Y) does in reality: for a full-rank X, its result is (XᵀX)⁻¹Xᵀ Y

So let's just do that. In Python, with xxx being the X array (arange and ones in your example) and view the array of windows over your data (that is, view[i] is tmp_[i:i+win_]), that would be np.linalg.inv(xxx.T@xxx)@xxx.T@view[i] for each row i.

We could vectorize that operation with np.vectorize to avoid iterating over i, as I did in my first solution (see below). But the thing is, we don't need to. That is just a matrix times a vector. And computing a matrix times a vector for each vector in an array of vectors is just matrix multiplication!
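The step from "one matrix-vector product per window" to "one matrix-matrix product" can be checked with a small sketch. The names proj, slopes_loop and slopes_matmul are introduced here for illustration only, and the data is random.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
y = rng.random(200)   # illustrative data, not the question's series
win = 30

x = np.arange(win)
X = np.column_stack([x, np.ones(win)])   # design matrix, shape (win, 2)
proj = np.linalg.inv(X.T @ X) @ X.T      # shape (2, win), identical for every window
view = sliding_window_view(y, win)       # shape (n_windows, win)

# One matrix-vector product per window; element 0 is the slope.
slopes_loop = np.array([(proj @ w)[0] for w in view])

# All windows at once: a single matrix-matrix product.
slopes_matmul = (proj @ view.T)[0]

# Row 0 of proj is the textbook slope weight vector (x - mean(x)) / sum((x - mean(x))**2).
weights = (x - x.mean()) / ((x - x.mean()) ** 2).sum()

print(np.allclose(slopes_loop, slopes_matmul))
print(np.allclose(proj[0], weights))
```

Since X is the same for every window, proj only has to be computed once, which is where most of the saving comes from.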

Hence my 2nd (and probably final) answer:

view = np.lib.stride_tricks.sliding_window_view(tmp_, (win_,))
xxx=np.vstack([np.arange(win_), np.ones(win_)]).T
roll_mat=(np.linalg.inv(xxx.T @ xxx) @ (xxx.T) @ view.T)[0]

roll_mat is still identical to roll_np (with one extra row, because your roll_np stopped one row short of the last possible one). See below for graphical proof with my first answer; I could provide a new image for this one, but it is indistinguishable from the one I already used. So, same result (unsurprisingly, I should say... but sometimes it is still a surprise when things work exactly like theory says they do).
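For readers who prefer a numeric check to a plot, the following sketch compares the matrix version against a per-window lstsq loop on random data. It also shows a variant that avoids the explicit inverse: np.linalg.lstsq accepts a 2-D right-hand side, so when every window shares the same X, all windows can be passed as columns. The variable names (tmp, win, X, roll_lstsq) are stand-ins chosen for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
tmp = rng.random(500)   # stand-in for tmp_
win = 30                # stand-in for win_
X = np.column_stack([np.arange(win), np.ones(win)])

# Reference: one lstsq call per window, stopping one window short
# exactly like the loop in the question.
roll_np = np.array([
    np.linalg.lstsq(X, tmp[i:i + win], rcond=None)[0][0]
    for i in range(len(tmp) - win)
])

# Matrix version from the answer.
view = sliding_window_view(tmp, win)
roll_mat = (np.linalg.inv(X.T @ X) @ X.T @ view.T)[0]

# Variant without an explicit inverse: one lstsq call, windows as columns.
roll_lstsq = np.linalg.lstsq(X, view.T, rcond=None)[0][0]

print(len(roll_mat) - len(roll_np))          # 1: the extra last window
print(np.allclose(roll_mat[:-1], roll_np))
print(np.allclose(roll_lstsq, roll_mat))
```

The lstsq variant has to copy the windows into a real array, so it may use more memory than the matmul version; whether it is faster or slower is something to measure on your own data.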

But timing is something else. As promised, my previous factor of 4 was nothing compared to what real vectorization can do. See the updated timing table:

Method	Time
pandas	2.10 s
numpy roll	2.03 s
stat	3.58 s
numpy view/vectorize (see below)	0.46 s
numpy view/matmult	1.2 ms

The important part is the 'ms', compared to the 's' of the others. So, this time, the factor is 1700!
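Timings like these depend on the machine, so here is a rough sketch for reproducing the comparison yourself. It uses a smaller series than the question (2000 points) to keep the loop short, and best_time is a hypothetical helper written for this example; the numbers it prints will differ from the table above.

```python
import time

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
tmp = rng.random(2000)   # smaller than the 10000 points in the question
win = 30
X = np.column_stack([np.arange(win), np.ones(win)])

def loop_lstsq(y):
    # One lstsq call per window.
    return np.array([np.linalg.lstsq(X, y[i:i + win], rcond=None)[0][0]
                     for i in range(len(y) - win + 1)])

def view_matmul(y):
    # One matrix product covering all windows.
    view = sliding_window_view(y, win)
    return (np.linalg.inv(X.T @ X) @ X.T @ view.T)[0]

def best_time(func, repeats=3):
    # Hypothetical helper: best wall-clock time of a few runs, in seconds.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(tmp)
        best = min(best, time.perf_counter() - start)
    return best

t_loop = best_time(loop_lstsq)
t_mat = best_time(view_matmul)
print(f"loop lstsq : {t_loop:.4f} s")
print(f"view/matmul: {t_mat:.6f} s")
```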

Old answer: vectorize

A lame method, once we have this view, could be to use np.vectorize from there. I call it lame because vectorize is not supposed to be efficient. It is just a for loop called by another name. The official documentation clearly says it is provided "primarily for convenience, not for performance". And yet, it would be an improvement over your code:

view = np.lib.stride_tricks.sliding_window_view(tmp_, (win_,))
xxx=np.vstack([np.arange(win_), np.ones(win_)]).T
f = np.vectorize(lambda y: np.linalg.lstsq(xxx,y,rcond=None)[0][0], signature='(n)->()')
roll_vectorize=f(view)
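As a side note on what the signature argument does, here is a small sketch with a hypothetical per-row function (row_range is not part of the answer). With signature='(n)->()', np.vectorize hands each 1-D row to the function and collects one scalar per row, which is what an explicit Python loop over the rows would do.

```python
import numpy as np

def row_range(row):
    # Hypothetical per-row function, used only for illustration.
    return row.max() - row.min()

a = np.arange(12.0).reshape(3, 4)
f = np.vectorize(row_range, signature='(n)->()')

from_vectorize = f(a)                              # one scalar per row
from_loop = np.array([row_range(r) for r in a])    # same values from an explicit loop

print(from_vectorize)
print(from_loop)
```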

First, let's verify the result:

plt.scatter(f(view)[:-1], roll_np)

So, obviously, these are the same results as roll_np (which, I've checked the same way, gives the same results as the two others, with the same kind of variation in indexing, since the 3 methods do not handle the borders the same way).
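The border differences can be made explicit with a short sketch. It assumes the default pandas behaviour, where rolling(win) labels each window by its last element and fills the first win - 1 entries with NaN, while the view-based result is indexed by the first element of each window. The names tmp, win and roll_mat are stand-ins, and raw=True is used here only so that the lambda receives a plain array.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
tmp = pd.Series(rng.random(300))   # stand-in for tmp_
win = 30
x = np.arange(win)

# pandas: len(tmp) entries; entry i covers tmp[i - win + 1 : i + 1],
# so the first win - 1 entries are NaN.
roll_pd = tmp.rolling(win).apply(lambda w: np.polyfit(x, w, 1)[0], raw=True).to_numpy()

# view-based: len(tmp) - win + 1 entries; entry j covers tmp[j : j + win].
X = np.column_stack([x, np.ones(win)])
view = sliding_window_view(tmp.to_numpy(), win)
roll_mat = (np.linalg.inv(X.T @ X) @ X.T @ view.T)[0]

print(np.isnan(roll_pd[:win - 1]).all())
print(np.allclose(roll_pd[win - 1:], roll_mat))
```

So dropping the leading NaN entries of the pandas result lines it up with the view-based slopes, and dropping the last view-based entry lines it up with the loops in the question.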

And the interesting part, the timings:

Method	Time
pandas	2.10 s
numpy roll	2.03 s
stat	3.58 s
numpy view/vectorize	0.46 s

So, you see, it is not supposed to be for performance, and yet I gain more than a factor of 4 with it.

I am pretty sure that a more vectorized method (alas, lstsq does not directly allow it, unlike most numpy functions) would be even faster.

Answer 2

First, if you need some tips for optimizing your Python code, I believe this playlist might help you.

To make it faster: "append" is never a good way. Think of it in terms of memory: every time you call np.append, a completely new array with a bigger size is created (n+1, where n is the old size), the n existing items are copied into it, and the new item is added in the last place.

So I changed it as follows:

########## testing time for pd rolling vs numpy rolling
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def fitcurve(x_pts):
    poly = np.polyfit(np.arange(len(x_pts)), x_pts, 1)
    return np.poly1d(poly)[1]


win_ = 30
# tmp_ = data_.Close
tmp_ = pd.Series(np.random.rand(10000))
s_time = time.time()
roll_pd = tmp_.rolling(win_).apply(lambda x: fitcurve(x)).to_numpy()
print('pandas rolling time is', time.time() - s_time)
pd.Series(roll_pd).plot()
plt.show()

########
s_time = time.time()
roll_np = np.zeros(len(tmp_)-win_) ### Change
for cnt_ in range(len(tmp_)-win_):
    tmp1_ = tmp_[cnt_:cnt_+ win_]
    grad_ = np.linalg.lstsq(np.vstack([np.arange(win_), np.ones(win_)]).T, tmp1_, rcond = None)[0][0]
    roll_np[cnt_] = grad_ ### Change
    # roll_np = np.append(roll_np, grad_) ### Change

print('numpy rolling time is', time.time() - s_time)
pd.Series(roll_np).plot()
plt.show()

#################
s_time = time.time()
roll_st = np.empty(0)
from scipy import stats
for cnt_ in range(len(tmp_)-win_):
    slope, intercept, r_value, p_value, std_err = stats.linregress(np.arange(win_), tmp_[cnt_:cnt_ + win_])
    roll_st = np.append(roll_st, slope)
print('stats rolling time is', time.time() - s_time)
pd.Series(roll_st).plot()
plt.show()

I initialized the array up front with the size it is expected to end up with (len(tmp_)-win_, the same as in the range), and just assigned values to it later, and it was much faster.
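A minimal sketch of why repeated np.append is costly, using throwaway names (a, b, out, copied) that are not part of the code above: np.append never modifies its input, it returns a new array, so growing an array one element at a time copies the existing contents on every step.

```python
import numpy as np

a = np.empty(0)
b = np.append(a, 1.0)    # returns a new array; a itself is untouched
print(a.size, b.size)    # 0 1
print(b is a)            # False

# Growing to n elements with repeated np.append copies
# 0 + 1 + ... + (n - 1) existing elements in total.
n = 1000
copied = sum(range(n))
print(copied)            # 499500

# Preallocating writes each slot exactly once instead.
out = np.empty(n)
for i in range(n):
    out[i] = i
```

The copy count grows quadratically with the number of appended values, which is why the preallocated version scales better as the series gets longer.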

There are also some other things you can do. Python is an interpreted language: every statement is dispatched by the interpreter at run time, which adds overhead for each step. So if you can express the same work in fewer, larger operations, it will usually be faster; think of a list comprehension, for example.
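As one possible illustration of that advice, the numpy loop can be written as a list comprehension, with the design matrix built once instead of on every iteration. This is only a sketch on random data with stand-in names (tmp, win, X, slopes); it still runs one lstsq call per window, so the gain over the preallocated loop is likely to be modest.

```python
import numpy as np

rng = np.random.default_rng(0)
tmp = rng.random(300)   # stand-in for tmp_
win = 30                # stand-in for win_

# The design matrix does not depend on the window, so build it once.
X = np.column_stack([np.arange(win), np.ones(win)])

# Collect the slopes in a list comprehension and convert to an array at the end.
slopes = np.array([
    np.linalg.lstsq(X, tmp[i:i + win], rcond=None)[0][0]
    for i in range(len(tmp) - win)
])

print(slopes.shape)
```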

Answer 1 (1, accepted) 2023-01-11 14:08:49

Answer 2 (0) 2023-01-11 13:02:47
