Predicting new data with locally weighted regression (LOESS/LOWESS)

Question

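How can I fit a locally weighted regression in Python so that it can be used to predict on new data?

There is statsmodels.nonparametric.smoothers_lowess.lowess, but it only returns the estimates for the original data set; so it seems to do fit and predict together, rather than separately as I expected.

scikit-learn always has a fit method that allows the object to be used later on new data with predict; but it does not implement lowess.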

Answer 1

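Lowess works great for predicting (when combined with interpolation)! I think the code is pretty straightforward; let me know if you have any questions!

[Matplotlib plot]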

import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import statsmodels.api as sm

# %matplotlib inline  # IPython magic: uncomment only inside a Jupyter notebook

# introduce some floats in our x-values
x = list(range(3, 33)) + [3.2, 6.2]
y = [1,2,1,2,1,1,3,4,5,4,5,6,5,6,7,8,9,10,11,11,12,11,11,10,12,11,11,10,9,8,2,13]

# lowess returns our "smoothed" data: one (x, y) row per input point, sorted by x
lowess = sm.nonparametric.lowess(y, x, frac=.3)

# unpack the lowess smoothed points into their x and y columns
lowess_x = lowess[:, 0]
lowess_y = lowess[:, 1]

# run scipy's interpolation; with bounds_error=False, points outside the data become NaN
# (interp1d can also extrapolate via fill_value="extrapolate")
f = interp1d(lowess_x, lowess_y, bounds_error=False)

xnew = [i/10. for i in range(400)]

# this generates y values for our new x values using the interpolator
# it will MISS values outside of the x window (less than 3, greater than 32)
# There might be a better approach, but you can run a for loop
# and if the value is out of range, use f(min(lowess_x)) or f(max(lowess_x))
ynew = f(xnew)

plt.plot(x, y, 'o')
plt.plot(lowess_x, lowess_y, '*')
plt.plot(xnew, ynew, '-')
plt.show()
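The out-of-range handling mentioned in the code comments can also be done without a loop. The following is a minimal sketch, assuming NumPy is available: np.interp holds the first and last values constant outside the range of its sample points, so queries below min(lowess_x) or above max(lowess_x) are clamped to the edge values instead of becoming NaN. The smoothed points here are made-up stand-ins for the output of sm.nonparametric.lowess, and predict_clamped is a hypothetical helper name, not part of any library.

```python
import numpy as np

# Made-up stand-in for the smoothed output of sm.nonparametric.lowess.
# np.interp needs increasing x-coordinates; lowess returns its rows sorted by x.
lowess_x = np.array([3.0, 4.0, 5.0, 6.0])
lowess_y = np.array([1.0, 1.5, 2.5, 3.0])


def predict_clamped(x_new, xs=lowess_x, ys=lowess_y):
    """Linearly interpolate the smoothed curve; clamp outside [xs[0], xs[-1]]."""
    return np.interp(x_new, xs, ys)


# 0.0 and 40.0 lie outside the fitted window and take the edge values.
y_new = predict_clamped([0.0, 3.5, 4.5, 40.0])
print(y_new)
```

Clamping is only one possible choice; whether a flat continuation is sensible outside the fitted window depends on the data.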

Answer 2

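I created a module called moepy that provides a sklearn-like API for a LOWESS model (including fit/predict). This makes it possible to predict with the underlying local regression model, rather than with the interpolation approach described in the other answers. A minimal example is shown below.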

# Imports
import numpy as np
import matplotlib.pyplot as plt
from moepy import lowess

# Data generation
x = np.linspace(0, 5, num=150)
y = np.sin(x) + (np.random.normal(size=len(x)))/10

# Model fitting
lowess_model = lowess.Lowess()
lowess_model.fit(x, y)

# Model prediction
x_pred = np.linspace(0, 5, 26)
y_pred = lowess_model.predict(x_pred)

# Plotting
plt.plot(x_pred, y_pred, '--', label='LOWESS', color='k', zorder=3)
plt.scatter(x, y, label='Noisy Sin Wave', color='C1', s=5, zorder=1)
plt.legend(frameon=False)

A more detailed guide on how to use the model (and its confidence and prediction interval variants) can be found here.

Answer 3

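Consider using kernel regression instead.

statsmodels has an implementation.

If you have too many data points, why not use scikit-learn's RadiusNeighborsRegressor and specify a tricube weighting function?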
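To make the suggestion concrete, here is a minimal sketch of kernel regression with a tricube weight, written directly in NumPy. It is a plain Nadaraya-Watson estimator (a weighted mean of nearby y values), not the statsmodels or scikit-learn implementation; the function names and the radius parameter are assumptions chosen for illustration.

```python
import numpy as np


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, zero elsewhere."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)


def kernel_regression(x, y, x_new, radius):
    """Nadaraya-Watson estimate: tricube-weighted mean of y within `radius`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    # weights[i, j] is the weight of training point j for query point i
    weights = tricube((x_new[:, None] - x[None, :]) / radius)
    totals = weights.sum(axis=1)
    # Queries with no training point inside the radius stay NaN
    out = np.full(x_new.shape, np.nan)
    has_neighbors = totals > 0
    out[has_neighbors] = (weights[has_neighbors] @ y) / totals[has_neighbors]
    return out


x = np.linspace(0, 5, 150)
y = np.sin(x)
# The last query is far from all training data, so it has no neighbors.
y_pred = kernel_regression(x, y, [1.0, 2.5, 100.0], radius=0.5)
print(y_pred)
```

Unlike LOESS, this fits a local constant rather than a local line, so it tends to be more biased near the edges of the data.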

Answer 4

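It is not clear whether having a dedicated LOESS object with separate fit/predict methods, as is common in Scikit-Learn, is a good idea. For neural networks, by contrast, you could have an object that stores only a relatively small set of weights. The fit method would then optimize those "few" weights using a very large training dataset. The predict method needs only the weights to make new predictions, not the whole training set.

Predictions based on LOESS and nearest neighbors, on the other hand, need the whole training set to make new predictions. The only thing the fit method can do is store the training set in the object for later use. If x and y are the training data, and x0 are the points at which to make new predictions, this object-oriented fit/predict solution would look something like the following: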

model = Loess()
model.fit(x, y)         # No calculations. Just store x and y in model.
y0 = model.predict(x0)  # Uses x and y just stored.

By comparison, in my localreg library, I opted for simplicity:

y0 = localreg(x, y, x0)

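It really comes down to design choices, as the performance would be the same. One advantage of the fit/predict approach is that you could have a unified interface, as in Scikit-Learn, where one model can easily be swapped for another. The fit/predict approach also encourages a machine-learning way of thinking about it, but in that sense LOESS is not very efficient, since it requires storing and using all the data for each new prediction. The latter approach leans more towards the origins of LOESS as a scatterplot smoothing algorithm, which is how I prefer to think about it. This might also shed some light on why statsmodels does it the way it does.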
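The point that fit can only store the data is easy to see in a toy implementation. The sketch below is an illustrative local linear regression with tricube weights in NumPy; the Loess class, the loess_predict function, and the frac parameter are hypothetical names for this example and are not the API of localreg, statsmodels, or any other library.

```python
import numpy as np


class Loess:
    """Toy local linear regression with a fit/predict interface."""

    def __init__(self, frac=0.3):
        self.frac = frac  # fraction of training points used in each local fit

    def fit(self, x, y):
        # No calculations here: just store the training data for later.
        self.x = np.asarray(x, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, x0):
        x0 = np.atleast_1d(np.asarray(x0, dtype=float))
        k = max(2, int(np.ceil(self.frac * len(self.x))))
        return np.array([self._predict_one(v, k) for v in x0])

    def _predict_one(self, v, k):
        dist = np.abs(self.x - v)
        idx = np.argsort(dist)[:k]  # indices of the k nearest training points
        h = dist[idx].max()         # local bandwidth
        if h == 0.0:                # all neighbors coincide with the query
            return float(self.y[idx].mean())
        w = (1.0 - (dist[idx] / h) ** 3) ** 3  # tricube weights
        # Weighted least squares for the line a + b * (x - v); the prediction is a.
        X = np.column_stack([np.ones(k), self.x[idx] - v])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], self.y[idx] * sw, rcond=None)
        return float(coef[0])


def loess_predict(x, y, x0, frac=0.3):
    """Function-style equivalent: the same work in a single call."""
    return Loess(frac).fit(x, y).predict(x0)


x = np.linspace(0, 5, 50)
y = 2.0 * x + 1.0  # exactly linear data, so a local line reproduces it
y0 = Loess(frac=0.3).fit(x, y).predict([2.5, 6.0])
y0_func = loess_predict(x, y, [2.5, 6.0])
print(y0, y0_func)
```

Both forms do all of their work at prediction time, which is why the choice between them is a matter of interface rather than performance.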

Answer 5

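Check out the loess class in scikit-misc. The fitted object has a predict method:

from skmisc.loess import loess

# x, y: training data; x_new: points at which to predict
loess_fit = loess(x, y, span=.01)
loess_fit.fit()
preds = loess_fit.predict(x_new).values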

https://has2k1.github.io/scikit-misc/stable/generated/skmisc.loess.loess.html

Answer 6

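How can I fit a locally weighted regression in Python so that it can be used to predict on new data?

There is statsmodels.nonparametric.smoothers_lowess.lowess, but it only returns the estimates for the original data set; so it seems to do fit and predict together, rather than separately as I expected.

scikit-learn always has a fit method that allows the object to be used later on new data with predict; but it does not implement lowess.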

Answer 1: score 16, 2016-05-05 21:58:46
Answer 2: score 5, 2021-05-26 23:39:08
Answer 3: score 4, 2017-07-05 00:34:02
Answer 4: score 0, 2021-09-20 12:36:19
Answer 5: score 0, 2022-06-29 19:57:24
Answer 6: score -4, 2019-09-20 15:38:52
