How to predict gam model with random effect in R?

I am working on predicting from a gam model with a random effect to produce a 3D surface plot with plot_ly.

Here is my code:

library(mgcv)   # provides gam() and predict.gam()

x <- runif(100)
y <- runif(100)
z <- x^2 + y + rnorm(100)
r <- rep(1, times = 100) # random effect
r[51:100] <- 2           # set the second half to 2, making two groups
df <- data.frame(x, y, z, r)

gam_fit <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df) # fit

# create matrix data for `add_surface` function in `plot_ly`
newx <- seq(0, 1, len = 20)
newy <- seq(0, 1, len = 30)
newxy <- expand.grid(x = newx, y = newy)
z <- matrix(predict(gam_fit, newdata = newxy), 20, 30) # predict data as matrix

However, the last line results in an error:

Error in model.frame.default(ff, data = newdata, na.action = na.act) : 
   variable lengths differ (found for 'r')
In addition: Warning message:
In predict.gam(gam_fit, newdata = newxy) :
  not all required variables have been supplied in  newdata!

Thanks to a previous answer, I know that the code above works without the random effect.

How can I predict from gam models with a random effect?

Assuming you want the surface conditional upon the random effects (but not for a specific level of the random effect), there are two ways.

The first is to provide a level for the random effect but exclude that term from the predicted values using the exclude argument to predict.gam(). The second is again to use exclude, but this time to provide no data for the random effect and instead stop predict.gam() from checking newdata by using the argument newdata.guaranteed = TRUE.

Option 1:

newxy1 <- with(df, expand.grid(x = newx, y = newy, r = 2))
z1 <- predict(gam_fit, newdata = newxy1, exclude = 's(r)')
z1 <- matrix(z1, 20, 30)

Option 2:

z2 <- predict(gam_fit, newdata = newxy, exclude = 's(r)',
              newdata.guaranteed=TRUE)
z2 <- matrix(z2, 20, 30)

These produce the same result:

> all.equal(z1, z2)
[1] TRUE
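One way to see why the two options agree: exclude drops the s(r) contribution from the linear predictor, so whatever value is supplied for r cannot change the result. The sketch below is a self-contained rerun of the question's simulation (the seed is an arbitrary addition for reproducibility). It predicts with r = 1 and with r = 2, and also rebuilds the same values from predict(..., type = "terms"), whose "constant" attribute holds the intercept. It assumes mgcv's usual term labels such as "s(x)" for the columns of the term matrix.

```r
library(mgcv)

set.seed(1)  # arbitrary seed, only for reproducibility
x <- runif(100)
y <- runif(100)
z <- x^2 + y + rnorm(100)
r <- rep(c(1, 2), each = 50)  # same two groups as in the question
df <- data.frame(x, y, z, r)
gam_fit <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)

grid <- expand.grid(x = seq(0, 1, len = 20), y = seq(0, 1, len = 30))

# Excluding s(r): the value supplied for r should not matter
p1 <- predict(gam_fit, newdata = cbind(grid, r = 1), exclude = "s(r)")
p2 <- predict(gam_fit, newdata = cbind(grid, r = 2), exclude = "s(r)")

# Rebuild the same values from the individual term contributions:
# intercept + s(x) + s(y), leaving out the s(r) column
tt <- predict(gam_fit, newdata = cbind(grid, r = 2), type = "terms")
manual <- attr(tt, "constant") + tt[, "s(x)"] + tt[, "s(y)"]

# as.numeric() strips names/dims so only the values are compared
all.equal(as.numeric(p1), as.numeric(p2))
all.equal(as.numeric(p1), as.numeric(manual))
```

Both comparisons should report TRUE, which is the same reason Option 2 can skip supplying r altogether.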

A couple of notes:

  1. Which option you use will depend on how complex the rest of your model is. I would generally use the first option, as it provides an extra check against my doing something stupid when creating the data. But in this instance, with a simple model and set of covariates, it seems safe enough to trust that newdata is OK.

  2. Your example uses a random slope (was that intended?), not a random intercept, as r is not a factor. If your real example uses a factor random effect, then you'll need to be a little more careful when creating newdata, as you need to get the levels of the factor right. For example:

     expand.grid(x = newx, y = newy, r = with(df, factor(2, levels = levels(r))))

    should get the right set-up for a factor r.
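For a factor r, a minimal sketch of the random-intercept case might look like the following. The simulated group effects (-0.5 and 0.5), the seed and the object names are made up for illustration. With exclude = "s(r)" you get the population-level surface; without it you get the surface for the chosen level, which should differ from the first only by that level's estimated random intercept.

```r
library(mgcv)

set.seed(2)  # arbitrary seed
n <- 100
df <- data.frame(x = runif(n), y = runif(n),
                 r = factor(rep(1:2, each = n / 2)))  # levels "1" and "2"
# hypothetical group effects, only to simulate a random intercept
df$z <- df$x^2 + df$y + c(-0.5, 0.5)[as.integer(df$r)] + rnorm(n)

# With a factor, s(r, bs = "re") is a random intercept (one coefficient per level)
gam_fac <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)

newx <- seq(0, 1, len = 20)
newy <- seq(0, 1, len = 30)
# Supply r as a factor carrying the same levels as the fitted data
newxy_fac <- expand.grid(x = newx, y = newy,
                         r = factor(2, levels = levels(df$r)))

# Population-level surface: drop the random intercept
z_pop <- matrix(predict(gam_fac, newdata = newxy_fac, exclude = "s(r)"), 20, 30)
# Surface for group "2": keep the random intercept
z_2 <- matrix(predict(gam_fac, newdata = newxy_fac), 20, 30)

# The two surfaces should differ by a constant shift
range(z_2 - z_pop)
```

Because expand.grid() keeps the factor's full set of levels, predict.gam() should build the same random-effect columns it used in the fit.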
