
mgcv: error "Model has more coefficients than data" when using the by argument of s() in gam()

In the first case, the gam() code from the mgcv R package works fine.

library(mgcv)
dat <- gamSim(1,n=400,dist="normal",scale=2)

num_knots = nrow(dat)
fit <- gam(y~s(x0, bs = "cr", k = num_knots, m=2),data=dat)
summary(fit)

But after I added the by argument to the smooth term in gam(), it reported the error "Model has more coefficients than data".

fit <- gam(y~s(x0, bs = "cr", k = num_knots, m=2, by = x1),data=dat)

The error confuses me. I thought that adding the by argument, which creates an interaction between the smooth term and a parametric term, should not increase the number of unknown coefficients, but evidently it does. Where was I wrong?

When you pass a continuous variable to by, what you get is a varying coefficient model, where the effect of x1 varies as a smooth function of x0.

In the first case, identifiability constraints are applied to the basis expansion for x0, so although you requested num_knots basis functions you actually got num_knots - 1. Adding the intercept gives num_knots coefficients, which is OK to fit here because it is a penalised spline (though you probably want method = 'REML'). The identifiability constraint is applied because there is a basis function (or combination of basis functions) that is confounded with the model intercept, and you can't fit two constant terms in the model and have them be uniquely identified.

In the second case, the varying coefficient model, there isn't the same issue, so no constraint is applied and you keep all num_knots basis functions. With the intercept, you are trying to fit a model with 401 coefficients to 400 observations, which isn't going to work.
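You can check the coefficient counts yourself with a smaller basis, where both models fit without trouble. This is only a sketch: the value k = 20, the seed, and the names fit_a and fit_b are my own choices for illustration and are not from the question.

```r
library(mgcv)

set.seed(1)  # arbitrary seed, only so the simulated data are reproducible
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

k <- 20  # small illustrative basis dimension, so both models can be fitted

# Ordinary smooth: the identifiability constraint removes one basis function
fit_a <- gam(y ~ s(x0, bs = "cr", k = k), data = dat, method = "REML")

# Varying coefficient model: continuous by variable, no constraint applied
fit_b <- gam(y ~ s(x0, bs = "cr", k = k, by = x1), data = dat, method = "REML")

length(coef(fit_a))  # (k - 1) smooth coefficients + intercept = 20
length(coef(fit_b))  # k smooth coefficients + intercept = 21
```

With k = num_knots = 400 the same arithmetic gives 400 coefficients for the first model and 401 for the second, which is one more than the 400 rows in dat.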
