Generalized Additive Model - Python

Question

I'm trying to fit a nonlinear model using a Generalized Additive Model (GAM). How do I determine the number of splines to use? Is there a specific way to choose the number of splines? I have used a 3rd-order (cubic) spline fit. Below is the code.

import matplotlib.pyplot as plt
from pygam import LinearGAM
from pygam.utils import generate_X_grid

# Curve fitting using a GAM - penalised spline curve.
def modeltrain(time, value):
    return LinearGAM(n_splines=58, spline_order=3).gridsearch(time, value)

# t1, x1: the time and value arrays (the data itself is not included in the question)
model = modeltrain(t1, x1)

# samples random x-values for prediction
XX = generate_X_grid(model)

# plots for visualization
plt.plot(XX, model.predict(XX), 'r--')
plt.plot(XX, model.prediction_intervals(XX, width=0.25), color='b', ls='--')
plt.scatter(t1, x1)
plt.show()

This is the expected result:

[Figure: expected result]

Original data scatter plot:

[Figure: original data scatter plot]

If the number of splines is not chosen correctly, I get an incorrect fit.

I would like suggestions for methods to choose the number of splines accurately.

Answer 1

Typically for splines you choose a fairly high number of splines (~25) and let the lambda smoothing parameter do the work of reducing the flexibility of the model.

For your use case, I would keep the default n_splines=25 and then do a gridsearch over the lambda parameter lam to find the best amount of smoothing:

import numpy as np
from pygam import LinearGAM
def modeltrain(time, value):
    return LinearGAM(n_splines=25, spline_order=3).gridsearch(time, value, lam=np.logspace(-3, 3, 11))

This will try 11 models, with lam log-spaced from 1e-3 to 1e3.
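To make the lam search more concrete, here is a minimal standalone numpy sketch of the same idea, assuming a P-spline setup: cubic B-splines on equally spaced knots, a second-order difference penalty on the coefficients, and the plain GCV score n·RSS/(n − edof)², where edof is the trace of the hat matrix. For LinearGAM, pygam's gridsearch optimizes GCV by default (objective='auto'), but its penalty and GCV details may differ from this sketch. The helpers `bspline_basis` and `fit_pspline` are hypothetical, and the sine-plus-noise data are synthetic, not the asker's t1/x1.

```python
import numpy as np


def bspline_basis(x, n_splines, order=3):
    """B-spline design matrix on equally spaced knots extended past the data range."""
    if n_splines <= order:
        raise ValueError("n_splines must be greater than the spline order")
    a, b = x.min(), x.max()
    dx = (b - a) / (n_splines - order)
    knots = np.linspace(a - order * dx, b + order * dx, n_splines + order + 1)
    # Degree 0: indicator of the knot interval containing each x.
    B = ((x[:, None] >= knots[:-1]) & (x[:, None] < knots[1:])).astype(float)
    # Cox-de Boor recursion raises the degree by one per step.
    for k in range(1, order + 1):
        left = (x[:, None] - knots[:-(k + 1)]) / (knots[k:-1] - knots[:-(k + 1)])
        right = (knots[k + 1:] - x[:, None]) / (knots[k + 1:] - knots[1:-k])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B


def fit_pspline(x, y, n_splines, lam, order=3):
    """Penalized least-squares fit; returns (fitted values, edof, GCV score)."""
    B = bspline_basis(x, n_splines, order)
    D = np.diff(np.eye(n_splines), n=2, axis=0)  # second-order differences
    A = B.T @ B + lam * (D.T @ D)
    coef = np.linalg.solve(A, B.T @ y)
    fitted = B @ coef
    edof = np.trace(np.linalg.solve(A, B.T @ B))  # trace of the hat matrix
    n = len(y)
    gcv = n * np.sum((y - fitted) ** 2) / (n - edof) ** 2
    return fitted, edof, gcv


# Synthetic stand-in for the asker's (t1, x1).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(scale=0.3, size=x.size)

# Same grid as the gridsearch call: 11 log-spaced values from 1e-3 to 1e3.
lams = np.logspace(-3, 3, 11)
results = [fit_pspline(x, y, n_splines=25, lam=lam) for lam in lams]
edofs = np.array([r[1] for r in results])
gcvs = np.array([r[2] for r in results])
best = int(np.argmin(gcvs))
best_lam, best_fit = lams[best], results[best][0]

for lam, edof, gcv in zip(lams, edofs, gcvs):
    print(f"lam={lam:9.3g}  edof={edof:5.2f}  GCV={gcv:.4f}")
print("lam chosen by GCV:", best_lam)
```

With n_splines fixed at 25, the effective degrees of freedom fall steadily as lam grows, which is the sense in which the penalty, rather than the basis size, controls the flexibility of the fit.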

I think your choice of n_splines=58 is too high, because it looks like it produces one spline per data point.

If you really want to search over n_splines, you could do:

# n_splines must be larger than spline_order, so the search starts at 4
LinearGAM(n_splines=25, spline_order=3).gridsearch(time, value, n_splines=np.arange(4, 50))
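A sketch of searching over the number of basis functions directly, assuming each candidate is scored by GCV and, to isolate the effect of the basis size, fitted with no penalty at all (pygam's gridsearch would keep applying its lam penalty while varying n_splines). The hypothetical `bspline_basis` helper is repeated so the block runs on its own, and the data are synthetic. Candidates start at spline_order + 1, since a cubic B-spline basis needs at least 4 functions.

```python
import numpy as np


def bspline_basis(x, n_splines, order=3):
    """B-spline design matrix on equally spaced knots extended past the data range."""
    if n_splines <= order:
        raise ValueError("n_splines must be greater than the spline order")
    a, b = x.min(), x.max()
    dx = (b - a) / (n_splines - order)
    knots = np.linspace(a - order * dx, b + order * dx, n_splines + order + 1)
    # Degree 0: indicator of the knot interval containing each x.
    B = ((x[:, None] >= knots[:-1]) & (x[:, None] < knots[1:])).astype(float)
    # Cox-de Boor recursion raises the degree by one per step.
    for k in range(1, order + 1):
        left = (x[:, None] - knots[:-(k + 1)]) / (knots[k:-1] - knots[:-(k + 1)])
        right = (knots[k + 1:] - x[:, None]) / (knots[k + 1:] - knots[1:-k])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B


# Synthetic stand-in data: two full periods of a sine plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(4 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

order = 3
candidates = np.arange(order + 1, 30)  # a cubic basis needs at least 4 functions
gcvs = []
for k in candidates:
    B = bspline_basis(x, k, order)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = np.sum((y - B @ coef) ** 2)
    # Unpenalized, full-rank fit: effective dof is just the number of columns.
    gcvs.append(len(y) * rss / (len(y) - k) ** 2)
gcvs = np.array(gcvs)
best_k = int(candidates[np.argmin(gcvs)])
print("n_splines chosen by GCV:", best_k)
```

Without a penalty, every extra basis function costs a full degree of freedom, so GCV trades goodness of fit against basis size; with a penalty, the exact basis size matters much less once it is large enough.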

Note: the function generate_X_grid does NOT do random sampling for prediction; it just makes a dense, linearly spaced grid over your X-values (time). The reason for this is to visualize how the learned model interpolates.
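For intuition, a rough numpy equivalent of such a grid, assuming a 1-D or 2-D array of inputs. The `dense_grid` helper and its `n=500` default are hypothetical and are not pygam's implementation; recent pygam releases expose a similar helper as a method on the fitted model, `generate_X_grid(term=...)`.

```python
import numpy as np


def dense_grid(X, n=500):
    """Evenly spaced points spanning the range of each column of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single feature column
    # linspace with array endpoints builds one evenly spaced column per feature.
    return np.linspace(X.min(axis=0), X.max(axis=0), n)


t = np.array([3.0, 0.5, 2.0, 1.0])  # hypothetical, unsorted time values
XX = dense_grid(t, n=6)
print(XX.ravel())
```

The grid covers the observed range in order, regardless of how the original time values are ordered, which is what makes it suitable for plotting the fitted curve.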

Answer 1: 4 votes, 2017-11-27 21:36:52
