Obtaining predictions from an mgcv::gam fit that contains a matrix "by" variable to a smooth
I just discovered that mgcv::s() permits one to supply a matrix to its by argument, which, as I understand it, lets one smooth a continuous variable with a separate smooth for each of a combination of variables (and their interactions, if so desired). However, I'm having trouble getting sensible predictions from such models, for example:
library(mgcv) #for gam
library(ggplot2) #for plotting

#Generate some fake data
set.seed(1) #for replicability of this example
myData = expand.grid(
    var1 = c(-1,1)
    , var2 = c(-1,1)
    , z = -10:10
)
myData$y = rnorm(nrow(myData)) + (myData$z^2 + myData$z*4) * myData$var1 +
    (3*myData$z^2 + myData$z) * myData$var2
#note additive effects of var1 and var2

#plot the data
ggplot(
    data = myData
    , mapping = aes(
        x = z
        , y = y
        , colour = factor(var1)
        , linetype = factor(var2)
    )
)+
    geom_line(
        alpha = .5
    )

#reformat to matrices
zMat = matrix(rep(myData$z,times=2),ncol=2)
xMat = matrix(c(myData$var1,myData$var2),ncol=2)

#get the fit
fit = gam(
    formula = myData$y ~ s(zMat,by=xMat,k=5)
)

#get the predictions and plot them
predicted = myData
predicted$value = predict(fit)
ggplot(
    data = predicted
    , mapping = aes(
        x = z
        , y = value
        , colour = factor(var1)
        , linetype = factor(var2)
    )
)+
    geom_line(
        alpha = .5
    )
This yields the following plot of the input data: [plot not shown]
And this obviously awry plot of the predicted values: [plot not shown]
Whereas replacing the gam fit above with:
fit = gam(
    formula = y ~ s(z,by=var1,k=5) + s(z,by=var2,k=5)
    , data = myData
)
but otherwise running the same code yields this reasonable plot of predicted values: [plot not shown]
What am I doing wrong here?
The use of vector-valued inputs to mgcv smooths is taken up here. It seems to me that you are misunderstanding these model types.
Your first formula
myData$y ~ s(zMat,by=xMat,k=5)
fits the model
y ~ f(z)*x_1 + f(z)*x_2
That is, mgcv estimates a single smooth function f(). This function is evaluated at each column of the covariate matrix, and the results are summed using the weights supplied to the by argument.
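A quick way to see this in the question's data is sketched below. It relies on mgcv's summation convention for matrix arguments (described under ?linear.functional.terms): because both columns of zMat hold the same z, the term collapses to f(z)*(var1 + var2), so every row with var1 = -var2 should get a flat prediction equal to the intercept. The names fit1 and cancels are introduced here purely for illustration.

```r
library(mgcv)

# Rebuild the question's data
set.seed(1)
myData <- expand.grid(var1 = c(-1, 1), var2 = c(-1, 1), z = -10:10)
myData$y <- rnorm(nrow(myData)) +
  (myData$z^2 + myData$z * 4) * myData$var1 +
  (3 * myData$z^2 + myData$z) * myData$var2

y    <- myData$y
zMat <- matrix(rep(myData$z, times = 2), ncol = 2)
xMat <- matrix(c(myData$var1, myData$var2), ncol = 2)

# Matrix arguments: ONE smooth f, summed over columns with weights xMat
fit1 <- gam(y ~ s(zMat, by = xMat, k = 5))

# Rows where the two weights cancel (var1 + var2 == 0)
cancels <- rowSums(xMat) == 0

# Spread of the fitted values on those rows, and their gap to the intercept
range(fitted(fit1)[cancels])
max(abs(fitted(fit1)[cancels] - coef(fit1)[1]))
```

If the gap printed on the last line is numerically zero, the flat lines in the "awry" plot are exactly what the single-function model implies, not a prediction bug.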
Your second formula
y ~ s(z,by=var1,k=5) + s(z,by=var2,k=5)
fits the model
y ~ f_1(z)*x_1 + f_2(z)*x_2
where f_1() and f_2() are two different smooth functions. Your data were generated from essentially the second model, so it is not surprising that it gives a more sensible-looking fit.
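To make the comparison concrete, here is a sketch that fits both formulas to the question's data, compares their residual deviance, and predicts from the second model on an explicit grid. Passing data = myData, rather than writing myData$y inside the formula, is what lets predict() take a newdata argument. The names fit1, fit2 and newGrid are illustrative and not part of the original post.

```r
library(mgcv)

# Rebuild the question's data
set.seed(1)
myData <- expand.grid(var1 = c(-1, 1), var2 = c(-1, 1), z = -10:10)
myData$y <- rnorm(nrow(myData)) +
  (myData$z^2 + myData$z * 4) * myData$var1 +
  (3 * myData$z^2 + myData$z) * myData$var2

y    <- myData$y
zMat <- matrix(rep(myData$z, times = 2), ncol = 2)
xMat <- matrix(c(myData$var1, myData$var2), ncol = 2)

# Single shared smooth (matrix form) versus two separate smooths
fit1 <- gam(y ~ s(zMat, by = xMat, k = 5))
fit2 <- gam(y ~ s(z, by = var1, k = 5) + s(z, by = var2, k = 5),
            data = myData)

# Residual deviance of each model
c(single_f = deviance(fit1), separate_f = deviance(fit2))

# Predict from the second model on a finer grid of z
newGrid <- expand.grid(var1 = c(-1, 1), var2 = c(-1, 1),
                       z = seq(-10, 10, by = 0.5))
newGrid$value <- as.numeric(predict(fit2, newdata = newGrid))
head(newGrid)
```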
The first formula is useful when you want an additive model in which a single function is evaluated on each variable and the results are combined with given weightings.
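As a sketch of a case where the matrix form is the right tool, the simulation below generates a response from one shared function applied to two covariates with known weights, then recovers that function. Everything here is an assumption made for illustration: the data are simulated, the true function is taken to be sin(), and the names Z, X, zGrid and fHat do not come from the original post.

```r
library(mgcv)

set.seed(2)
n <- 300

# Two covariates per observation (columns of Z) and their known weights (columns of X)
Z <- matrix(runif(2 * n, -3, 3), ncol = 2)
X <- matrix(runif(2 * n, 0.5, 2), ncol = 2)

# Assumed truth: the SAME function, sin(), applied to each column,
# then weighted by X and summed across columns
y <- rowSums(sin(Z) * X) + rnorm(n, sd = 0.2)

# Matrix arguments: mgcv estimates a single f and sums f(Z[, j]) * X[, j] over j
fit <- gam(y ~ s(Z, by = X, k = 10), data = list(y = y, Z = Z, X = X))

# Recover f on a grid: weight 1 on the first column and 0 on the second,
# so the summed term reduces to f(zGrid)
zGrid   <- seq(-3, 3, length.out = 50)
newdata <- list(Z = cbind(zGrid, 0), X = cbind(rep(1, 50), 0))
fHat    <- as.numeric(predict(fit, newdata = newdata)) - coef(fit)[1]

# Agreement between the estimated and the assumed true function
cor(fHat, sin(zGrid))
```

Because the variables are matrices, newdata is supplied as a list rather than a data frame.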