Custom link function works for GLM but not mgcv GAM
Apologies if the answer is obvious, but I've spent quite some time trying to use a custom link function in mgcv's gam.
In short, I want to use a modified probit link from the package psyphy (probit.2asym, which I call custom_link here). I can create a {stats} family object with this link and use it in the 'family' argument of glm.
m <- glm(y~x, family=binomial(link=custom_link), ... )
It does not work when used as an argument for {mgcv} gam:
m <- gam(y~s(x), family=binomial(link=custom_link), ... )
I get the error:
Error in fix.family.link.family(family) : link not recognised
I do not understand the reason for this error; both glm and gam work if I specify the standard link=probit.
So my question can be summarized as: what is missing in this custom link that makes it work for glm but not for gam?
Thanks in advance if you can give me a hint on what I should do.
Link function
probit.2asym <- function(g, lam) {
    if ((g < 0 ) || (g > 1))
        stop("g must in (0, 1)")
    if ((lam < 0) || (lam > 1))
        stop("lam outside (0, 1)")
    linkfun <- function(mu) {
        mu <- pmin(mu, 1 - (lam + .Machine$double.eps))
        mu <- pmax(mu, g + .Machine$double.eps)
        qnorm((mu - g)/(1 - g - lam))
    }
    linkinv <- function(eta) {
        g + (1 - g - lam) * pnorm(eta)
    }
    mu.eta <- function(eta) {
        (1 - g - lam) * dnorm(eta)
    }
    valideta <- function(eta) TRUE
    link <- paste("probit.2asym(", g, ", ", lam, ")", sep = "")
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, name = link),
              class = "link-glm")
}
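The pieces glm relies on are the functions stored in the link object, so a quick sanity check is that they agree with one another. The following is a minimal sketch: check_link is a hypothetical helper (not part of stats, psyphy or mgcv), shown here on the built-in probit link so that it runs on its own.

```r
# Hypothetical helper: checks that a "link-glm" object is internally consistent
check_link <- function(link, eta = seq(-2, 2, by = 0.5), h = 1e-6) {
  # linkfun should undo linkinv on the linear-predictor scale
  round_trip <- isTRUE(all.equal(link$linkfun(link$linkinv(eta)), eta,
                                 tolerance = 1e-6))
  # mu.eta should match a central finite difference of linkinv
  fd <- (link$linkinv(eta + h) - link$linkinv(eta - h)) / (2 * h)
  deriv_ok <- isTRUE(all.equal(link$mu.eta(eta), fd, tolerance = 1e-6))
  c(round_trip = round_trip, deriv_ok = deriv_ok)
}

res <- check_link(make.link("probit"))
res
```

The same call should work with check_link(probit.2asym(0.1, 0.1)) once the function above is defined. Passing this check says nothing about gam, which asks for more than these functions.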
As you may know, glm fits the model by iteratively reweighted least squares (IRLS). Early versions of gam extended this to penalized iteratively reweighted least squares, carried out by the gam.fit function. This is known as performance iteration in some contexts.
Since 2008 (or maybe slightly earlier), gam.fit3, based on what is called outer iteration, has replaced gam.fit as the gam default. This change requires some extra information from the family, which you can read about in ?fix.family.link.
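To see what that extra information is, here is a small sketch comparing a built-in link with a link whose name mgcv does not know. The name "my_probit" is made up for illustration; the link itself is just a renamed copy of the standard probit.

```r
library(mgcv)

# Built-in link: fix.family.link() attaches higher-order link derivatives
std <- fix.family.link(binomial(link = "probit"))
intersect(c("d2link", "d3link", "d4link"), names(std))

# A link with an unfamiliar name (a renamed probit; "my_probit" is hypothetical)
lk <- make.link("probit")
lk$name <- "my_probit"
msg <- tryCatch({
  fix.family.link(binomial(link = lk))
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

The link is looked up by name, so even a mathematically identical link is rejected when the name is not one mgcv recognises. That is consistent with glm accepting the custom link while gam stops.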
The major difference between the two is whether the iteration for the coefficients beta and the iteration for the smoothing parameters lambda are nested.
Performance iteration takes a nested approach: at each update of beta (each reweighted least squares step), lambda is re-estimated for the current working model. Outer iteration separates the two: lambda is optimized in an outer loop, and for each trial value of lambda the iteration for beta is carried through to convergence. Obviously outer iteration is more stable and less likely to suffer from convergence failure.
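The beta iteration mentioned above is a (penalized) reweighted least squares loop. Below is a minimal sketch of that loop for a fixed lambda. The helper pirls, the penalty matrix S and the toy data are all assumptions made for illustration; this is not how mgcv implements it internally.

```r
# Hypothetical helper: penalized IRLS for a fixed smoothing parameter lambda
pirls <- function(X, y, fam, lambda = 0, S = diag(0, ncol(X)),
                  tol = 1e-10, maxit = 50) {
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    mu <- fam$linkinv(eta)
    gprime <- 1 / fam$mu.eta(eta)            # d eta / d mu
    z <- eta + (y - mu) * gprime             # working response
    w <- 1 / (fam$variance(mu) * gprime^2)   # working weights
    # penalized weighted least squares: (X'WX + lambda S) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X) + lambda * S,
                           crossprod(X, w * z)))
    done <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (done) break
  }
  beta
}

set.seed(1)
x <- runif(200)
y <- rbinom(200, 1, plogis(-1 + 2 * x))
X <- cbind(1, x)

b_irls <- pirls(X, y, binomial())                 # lambda = 0: plain IRLS
b_glm  <- coef(glm(y ~ x, family = binomial()))   # reference fit
b_pen  <- pirls(X, y, binomial(), lambda = 10,    # ridge-type penalty on the slope
                S = diag(c(0, 1)))
rbind(irls = b_irls, glm = unname(b_glm), penalized = b_pen)
```

With lambda = 0 the loop should reproduce the glm coefficients, and a positive lambda shrinks the penalized slope. The two strategies differ only in where the choice of lambda sits relative to this loop.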
gam has an argument optimizer. By default it takes optimizer = c("outer", "newton"), that is, Newton's method for outer iteration; but if you set optimizer = "perf", it will use performance iteration.
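As a small illustration of the optimizer argument, the sketch below fits the same model with two outer-iteration optimizers on made-up data. Whether "perf" is still accepted depends on the installed mgcv version (worth checking in ?gam), so it is deliberately left out here.

```r
library(mgcv)

set.seed(1)
x <- runif(300)
y <- rbinom(300, 1, plogis(-1 + 2 * x))

# Default choice: outer iteration with Newton's method
m_newton <- gam(y ~ s(x), family = binomial(),
                optimizer = c("outer", "newton"))
# Outer iteration with a quasi-Newton (BFGS) optimizer instead
m_bfgs <- gam(y ~ s(x), family = binomial(),
              optimizer = c("outer", "bfgs"))

# The two fits would be expected to agree closely
max(abs(fitted(m_newton) - fitted(m_bfgs)))
```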
So, after the above overview, we have two options: keep outer iteration and extend the family with the extra information it needs (see ?fix.family.link); or use performance iteration, to stay consistent with glm. I am being lazy so I will demonstrate the second one (actually, I do not feel confident enough to take the first approach).
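For reference, the error arises because fix.family.link does not know the higher derivatives of the custom link. A rough sketch of supplying them by hand follows. Everything named here is an assumption: probit_2asym is a compact copy of the link in the question, add_link_derivs is a hypothetical helper, and the formulas come from differentiating eta = qnorm((mu - g)/(1 - g - lam)) using phi'(eta) = -eta * phi(eta). I have not verified a full gam fit with this family.

```r
library(mgcv)

# Compact copy of the link in the question (same formulas, argument checks dropped)
probit_2asym <- function(g, lam) {
  span <- 1 - g - lam
  structure(list(
    linkfun = function(mu) {
      mu <- pmax(pmin(mu, 1 - (lam + .Machine$double.eps)),
                 g + .Machine$double.eps)
      qnorm((mu - g) / span)
    },
    linkinv  = function(eta) g + span * pnorm(eta),
    mu.eta   = function(eta) span * dnorm(eta),
    valideta = function(eta) TRUE,
    name     = paste0("probit.2asym(", g, ", ", lam, ")")
  ), class = "link-glm")
}

# Hypothetical helper: attach d^k eta / d mu^k for k = 2, 3, 4
add_link_derivs <- function(fam) {
  fam$d2link <- function(mu) {
    eta <- fam$linkfun(mu); eta / fam$mu.eta(eta)^2
  }
  fam$d3link <- function(mu) {
    eta <- fam$linkfun(mu); (1 + 2 * eta^2) / fam$mu.eta(eta)^3
  }
  fam$d4link <- function(mu) {
    eta <- fam$linkfun(mu); (7 * eta + 6 * eta^3) / fam$mu.eta(eta)^4
  }
  fam
}

fam <- add_link_derivs(binomial(link = probit_2asym(0.1, 0.1)))
fam <- fix.family.link(fam)  # expected to return the family rather than stop

# Check each derivative against a central difference of the one below it
mu <- c(0.2, 0.35, 0.6, 0.85)
h <- 1e-5
d1 <- function(mu) 1 / fam$mu.eta(fam$linkfun(mu))  # d eta / d mu
fd2 <- (d1(mu + h) - d1(mu - h)) / (2 * h)
fd3 <- (fam$d2link(mu + h) - fam$d2link(mu - h)) / (2 * h)
fd4 <- (fam$d3link(mu + h) - fam$d3link(mu - h)) / (2 * h)
cbind(d2 = fam$d2link(mu), fd2, d3 = fam$d3link(mu), fd3,
      d4 = fam$d4link(mu), fd4)
```

A family extended this way should get past the "link not recognised" check with the default optimizer; whether the fit then converges is a separate matter that depends on the data.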
Reproducible Example
You did not provide a reproducible example, so I have prepared one below.
set.seed(0)
x <- sort(runif(500, 0, 1)) ## covariates (sorted to make plotting easier)
eta <- -4 + 3 * x * exp(x) - 2 * log(x) * sqrt(x) ## true linear predictor
p <- binomial(link = "logit")$linkinv(eta) ## true probability (response)
y <- rbinom(500, 1, p) ## binary observations
table(y) ## a quick check that data are not skewed
# 0 1
#271 229
I will take g = 0.1 and lam = 0.1 for the function probit.2asym you intend to use:
probit2 <- probit.2asym(0.1, 0.1)
par(mfrow = c(1,3))
## fit a glm with logit link
glm_logit <- glm(y ~ x, family = binomial(link = "logit"))
plot(x, eta, type = "l", main = "glm with logit link")
lines(x, glm_logit$linear.predictors, col = 2)
## glm with probit.2asym
glm_probit2 <- glm(y ~ x, family = binomial(link = probit2))
plot(x, eta, type = "l", main = "glm with probit2")
lines(x, glm_probit2$linear.predictors, col = 2)
## gam with probit.2asym
library(mgcv)
gam_probit2 <- gam(y ~ s(x, bs = 'cr', k = 3), family = binomial(link = probit2),
optimizer = "perf")
plot(x, eta, type = "l", main = "gam with probit2")
lines(x, gam_probit2$linear.predictors, col = 2)
I have used the natural cubic spline basis cr for s(x), since for a univariate smooth the default thin-plate spline is unnecessary. I have also set a small basis dimension k = 3 (it can't be smaller for a cubic spline), as my toy data are near linear and a large basis dimension is not needed. More importantly, this seems to prevent convergence failure of performance iteration for my toy dataset.
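To make the effect of the basis choice concrete, the sketch below counts coefficients for the two settings. It uses a standard logit link and made-up data so that it runs with the default optimizer.

```r
library(mgcv)

set.seed(0)
x <- runif(300)
y <- rbinom(300, 1, plogis(-1 + 2 * x))

# Cubic regression spline with the smallest basis: k = 3
m_cr <- gam(y ~ s(x, bs = "cr", k = 3), family = binomial())
# Default smooth: thin-plate regression spline with k = 10
m_tp <- gam(y ~ s(x), family = binomial())

# Intercept plus (k - 1) smooth coefficients after the identifiability constraint
c(cr = length(coef(m_cr)), tp = length(coef(m_tp)))
```

With fewer coefficients there is less for the smoothing parameter to control, which may be part of why the small basis behaves better in the toy example.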