How to predict a gam model with a random effect in R?
I am trying to predict from a gam model with a random effect in order to produce a 3D surface plot with plot_ly.
Here is my code:
library(mgcv) # provides gam()
x <- runif(100)
y <- runif(100)
z <- x^2 + y + rnorm(100)
r <- rep(1,times=100) # random effect
r[51:100] <- 2 # set the second half to 2, making two groups
df <- data.frame(x, y, z, r)
gam_fit <- gam(z ~ s(x) + s(y) + s(r,bs="re"), data = df) # fit
#create matrix data for `add_surface` function in `plot_ly`
newx <- seq(0, 1, len=20)
newy <- seq(0, 1, len=30)
newxy <- expand.grid(x = newx, y = newy)
z <- matrix(predict(gam_fit, newdata = newxy), 20, 30) # predict data as matrix
However, the last line results in an error:
Error in model.frame.default(ff, data = newdata, na.action = na.act) :
variable lengths differ (found for 'r')
In addition: Warning message:
In predict.gam(gam_fit, newdata = newxy) :
not all required variables have been supplied in newdata!
Thanks to a previous answer, I am sure that the code above works without the random effect.
How can I predict from a gam model with a random effect?
Assuming you want the surface conditional upon the random effects (but not for a specific level of the random effect), there are two ways.
The first is to provide a level for the random effect but exclude that term from the predicted values using the exclude argument to predict.gam(). The second is to again use exclude, but this time not to provide any data for the random effect and instead stop predict.gam() from checking the newdata using the argument newdata.guaranteed = TRUE.
newxy1 <- with(df, expand.grid(x = newx, y = newy, r = 2))
z1 <- predict(gam_fit, newdata = newxy1, exclude = 's(r)')
z1 <- matrix(z1, 20, 30)
z2 <- predict(gam_fit, newdata = newxy, exclude = 's(r)',
newdata.guaranteed=TRUE)
z2 <- matrix(z2, 20, 30)
These produce the same result:
> all.equal(z1, z2)
[1] TRUE
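To see what exclude is doing, one can ask predict.gam() for the term-wise contributions and add them up by hand. The following is a minimal sketch that refits the model from the question so it runs on its own; the seed and the object names terms_no_r, manual and direct are only illustrative choices.

```r
library(mgcv)

set.seed(42)  # arbitrary seed, used here only for reproducibility
x <- runif(100)
y <- runif(100)
z <- x^2 + y + rnorm(100)
r <- rep(c(1, 2), each = 50)  # numeric grouping variable, as in the question
df <- data.frame(x, y, z, r)
gam_fit <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)

newxy1 <- expand.grid(x = seq(0, 1, length.out = 20),
                      y = seq(0, 1, length.out = 30),
                      r = 2)

# Term-wise contributions on the link scale; the excluded term is not returned
terms_no_r <- predict(gam_fit, newdata = newxy1, type = "terms",
                      exclude = "s(r)")
colnames(terms_no_r)

# Intercept plus the remaining smooths, summed by hand
manual <- as.numeric(attr(terms_no_r, "constant") + rowSums(terms_no_r))

# Prediction with the random effect term set to zero
direct <- as.numeric(predict(gam_fit, newdata = newxy1, exclude = "s(r)"))

all.equal(manual, direct)
```

If the two agree, the excluded prediction is just the intercept plus s(x) and s(y), which is why the value supplied for r does not matter.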
A couple of notes:
Which you use will depend on how complex the rest of your model is. I would generally use the first option, as it provides an extra check against me doing something stupid when creating the data. But in this instance, with a simple model and set of covariates, it seems safe enough to trust that newdata is OK.
Your example uses a random slope (was that intended?), not a random intercept, as r is not a factor. If your real example uses a factor random effect then you'll need to be a little more careful when creating the newdata, as you need to get the levels of the factor right. For example:
expand.grid(x = newx, y = newy, r = with(df, factor(2, levels = levels(r))))
should get the right set-up for a factor r.
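Putting the factor case together, here is a rough end-to-end sketch with r stored as a factor, so that s(r, bs = "re") acts as a random intercept. The group labels "a" and "b", the simulated group shift, the seed and the use of method = "REML" are all assumptions made for illustration and are not part of the original question. The plotting step is optional and only runs if plotly is installed; it assumes plotly reads the rows of z as the y axis, hence the transpose.

```r
library(mgcv)

set.seed(1)  # arbitrary seed for reproducibility
n <- 100
df <- data.frame(x = runif(n),
                 y = runif(n),
                 r = factor(rep(c("a", "b"), each = n / 2)))  # hypothetical labels
# Simulated response with a group-level shift (illustrative values)
df$z <- with(df, x^2 + y + ifelse(r == "b", 0.5, -0.5) + rnorm(n, sd = 0.3))

fit <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df, method = "REML")

newx <- seq(0, 1, length.out = 20)
newy <- seq(0, 1, length.out = 30)

# Supply a valid level for r, carrying over the levels used in the fit
nd <- expand.grid(x = newx, y = newy,
                  r = factor("b", levels = levels(df$r)))

# Surface with the random intercept zeroed out: 20 rows (x) by 30 columns (y)
zf <- matrix(predict(fit, newdata = nd, exclude = "s(r)"), 20, 30)

# The level supplied should not matter once s(r) is excluded
nd_a <- nd
nd_a$r <- factor(rep("a", nrow(nd)), levels = levels(df$r))
zf_a <- matrix(predict(fit, newdata = nd_a, exclude = "s(r)"), 20, 30)
all.equal(zf, zf_a)

# Optional: draw the surface; t(zf) puts y on the rows and x on the columns
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::add_surface(plotly::plot_ly(x = newx, y = newy, z = t(zf)))
}
```

Printing p in an interactive session displays the surface. Passing a level that was not present when the model was fitted is the kind of mistake that carrying over levels(df$r) is meant to guard against.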