
Obtaining predictions from an mgcv::gam fit that contains a matrix "by" variable to a smooth

I just discovered that mgcv::s() accepts a matrix for its by argument, which seemed to offer a way to smooth a continuous variable with a separate smooth for each of several variables (and their interactions, if desired). However, I'm having trouble getting sensible predictions from such models. For example:

library(mgcv) #for gam
library(ggplot2) #for plotting

#Generate some fake data
set.seed(1) #for replicability of this example
myData = expand.grid(
    var1 = c(-1,1)
    , var2 = c(-1,1)
    , z = -10:10
)
myData$y = rnorm(nrow(myData)) + (myData$z^2 + myData$z*4) * myData$var1 + 
                                 (3*myData$z^2 + myData$z) * myData$var2 
    #note additive effects of var1 and var2

#plot the data
ggplot(
    data = myData
    , mapping = aes(
        x = z
        , y = y
        , colour = factor(var1)
        , linetype = factor(var2)
    )
)+
geom_line(
    alpha = .5
)

#reformat to matrices
zMat = matrix(rep(myData$z,times=2),ncol=2)
xMat = matrix(c(myData$var1,myData$var2),ncol=2)

#get the fit
fit = gam(
    formula = myData$y ~ s(zMat,by=xMat,k=5)
)

#get the predictions and plot them
predicted = myData
predicted$value = predict(fit)
ggplot(
    data = predicted
    , mapping = aes(
        x = z
        , y = value
        , colour = factor(var1)
        , linetype = factor(var2)
    )
)+
geom_line(
    alpha = .5
)

This yields the following plot of the input data:

[Figure: plot of the input data]

And this obviously awry plot of the predicted values:

[Figure: plot of the predicted values from the matrix "by" fit]

Whereas replacing the gam fit above with:

fit = gam(
    formula = y ~ s(z,by=var1,k=5) + s(z,by=var2,k=5)
    , data = myData
)

but otherwise running the same code yields this reasonable plot of predicted values:

[Figure: plot of the predicted values from the fit with two separate smooths]

What am I doing wrong here?

The use of vector-valued inputs to mgcv smooths is taken up here. It seems to me that you are misunderstanding these model types.

Your first formula

myData$y ~ s(zMat,by=xMat,k=5)

fits the model

y ~ f(z)*x_1 + f(z)*x_2

That is, mgcv estimates a single smooth function f(). This function is evaluated at each column of the covariate matrix, and the results are weighted by the corresponding columns of the matrix supplied to the by argument and then summed.
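A minimal sketch of why the single shared function gives awry predictions for the question's data. Both columns of zMat hold the same z values, so the term reduces to f(z)*(x_1 + x_2), and every row where var1 + var2 is 0 receives nothing from the smooth. The names matData, fitMat, cancels and flatGap are made up for this illustration, and passing the matrices through a list in the data argument is a convenience assumed here rather than what the question did.

```r
library(mgcv)  # provides gam() and s()

# Same fake data as in the question
set.seed(1)
myData <- expand.grid(var1 = c(-1, 1), var2 = c(-1, 1), z = -10:10)
myData$y <- rnorm(nrow(myData)) +
  (myData$z^2 + myData$z * 4) * myData$var1 +
  (3 * myData$z^2 + myData$z) * myData$var2

# Matrix arguments, passed in a list so gam() can find them by name
matData <- list(
  y    = myData$y,
  zMat = cbind(myData$z, myData$z),
  xMat = cbind(myData$var1, myData$var2)
)
fitMat <- gam(y ~ s(zMat, by = xMat, k = 5), data = matData)

# Rows where the two "by" columns cancel: the smooth contributes
# f(z) * (-1) + f(z) * 1 = 0, leaving only the intercept
cancels <- (myData$var1 + myData$var2) == 0
flatGap <- max(abs(predict(fitMat)[cancels] - coef(fitMat)[["(Intercept)"]]))
flatGap  # largest deviation of those predictions from the intercept
```

Half of the rows in the question's design fall into this cancelling group, which is consistent with the flat lines in the awry plot.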

Your second formula

y ~ s(z,by=var1,k=5) + s(z,by=var2,k=5)

fits the model

y ~ f_1(z)*x_1 +f_2(z)*x_2

where f_1() and f_2() are two different smooth functions. Your data-generating model has essentially the form of the second formula, so it is not surprising that the second fit looks more sensible.
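As a rough sketch of obtaining predictions from the two-smooth fit, the code below refits the second formula on the question's fake data and predicts on a finer grid of z through the newdata argument of predict(). The names fitSep, newData and fitCor are invented for this example, and the grid spacing of 0.5 is an arbitrary choice.

```r
library(mgcv)  # provides gam() and s()

# Same fake data as in the question
set.seed(1)
myData <- expand.grid(var1 = c(-1, 1), var2 = c(-1, 1), z = -10:10)
myData$y <- rnorm(nrow(myData)) +
  (myData$z^2 + myData$z * 4) * myData$var1 +
  (3 * myData$z^2 + myData$z) * myData$var2

# Two separate smooths of z, each multiplied by its own numeric "by" variable
fitSep <- gam(y ~ s(z, by = var1, k = 5) + s(z, by = var2, k = 5),
              data = myData)

# Predict on a finer grid; newdata needs every variable named in the formula
newData <- expand.grid(var1 = c(-1, 1), var2 = c(-1, 1),
                       z = seq(-10, 10, by = 0.5))
newData$value <- as.numeric(predict(fitSep, newdata = newData))

# Crude check of how closely the fitted values track the observed y
fitCor <- cor(as.numeric(fitted(fitSep)), myData$y)
fitCor
```

Because the variables come from a data frame via the data argument, predict() can be handed any new combination of z, var1 and var2 without rebuilding matrices by hand.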

The first formula is useful when you want an additive model in which a single function is evaluated on each variable, with given weightings.
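To illustrate a case where the matrix form is the right tool, here is a hedged sketch with simulated data in which one function (sin, chosen arbitrarily) really is shared across two different covariate columns, each with its own weight. The names Z, W, truth, fitShared and sharedCor, the sample size, and the noise level are all assumptions made for this example.

```r
library(mgcv)  # provides gam() and s()

set.seed(42)
n <- 400

# Two different covariates per observation, each with its own weight
Z <- cbind(runif(n, -3, 3), runif(n, -3, 3))
W <- cbind(runif(n, 0.5, 1.5), runif(n, 0.5, 1.5))

# One shared function applied to both columns: f(z1)*w1 + f(z2)*w2
truth <- rowSums(sin(Z) * W)
y <- truth + rnorm(n, sd = 0.2)

# Matrix covariate plus matrix "by": a single smooth f, summed over columns
fitShared <- gam(y ~ s(Z, by = W, k = 10),
                 data = list(y = y, Z = Z, W = W))

# How closely the fitted values follow the noise-free signal
sharedCor <- cor(as.numeric(fitted(fitShared)), truth)
sharedCor
```

Here the columns of Z differ, so the weighted sum does not collapse the way it does in the question's data, and the single estimated function has something distinct to explain in each column.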
