GAM error using bam() with family = betar
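I can't resolve an error I get when running bam() from mgcv.
I noticed that a similar error was reported here 14 months ago. There seemed to be no agreement on a solution; the suggestion was to email Simon Wood.
My data are here. The dataset is too large to paste the output of dput().
If I run the following model on the whole dataset, I get the warnings below: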
library(mgcv)
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error,
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
Warning messages:
1: In estimate.theta(theta, family, G$y, linkinv(eta), scale = scale1, :
step failure in theta estimation
2: In wt * LS :
longer object length is not a multiple of shorter object length
3: In muth * (log(y) - log1p(-y)) :
longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
6: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
7: In prior.weights * y :
longer object length is not a multiple of shorter object length
8: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - :
longer object length is not a multiple of shorter object length
However, if I run the same model on the whole dataset but exclude the last row, the model appears to run fine:
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[1:20500, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
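This suggests to me that something is wrong with the last row of the dataset. However, I can't see anything in that row that I would expect to produce the warning messages above.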
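One way to check the last row more systematically is sketched below. It is only an illustration: `toy` is a made-up stand-in for `error` (only the column names are taken from the model formula), and `check_last_row()` is a hypothetical helper, not part of mgcv. It reports whether the final row has missing values, whether its response lies strictly inside (0, 1), and whether it introduces a factor level not seen in the other rows.

```r
# Toy data frame standing in for `error`; values are invented for illustration.
toy <- data.frame(
  pt10     = c(0.20, 0.55, 0.91, 0.40),
  org.type = factor(c("a", "b", "a", "b")),
  region   = factor(c("north", "south", "north", "south")),
  year     = c(2001, 2002, 2003, 2004)
)

# Hypothetical helper: compare the final row with the rest of the data.
check_last_row <- function(df, response) {
  last <- df[nrow(df), , drop = FALSE]
  rest <- df[-nrow(df), , drop = FALSE]
  is_fac <- vapply(df, is.factor, logical(1))
  list(
    # Any missing value in the last row?
    any_na = anyNA(last),
    # Is the response strictly between 0 and 1?
    response_in_unit_interval = last[[response]] > 0 && last[[response]] < 1,
    # For each factor: does the last row use a level absent from the other rows?
    new_factor_level = vapply(
      names(df)[is_fac],
      function(nm) !(as.character(last[[nm]]) %in% as.character(rest[[nm]])),
      logical(1)
    )
  )
}

res <- check_last_row(toy, "pt10")
print(res)
```

On the real data the call would be `check_last_row(error, "pt10")`. A clean result here would not rule out a problem elsewhere in the fit.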
If I run the same model again on a small subset of the data, this time including the last row, the model again runs fine.
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[20400:20501, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
But a larger subset of the data (again including the last row) produces warning messages similar to those above.
m3 <- bam(pt10 ~
            org.type +
            region +
            s(year) +
            s(year, by = org.type) +
            s(year, by = region),
          data = error[10000:20501, ],
          method = "fREML",
          family = betar(link = "logit", eps = 0.1),
          select = TRUE)
Warning messages:
1: In wt * LS :
longer object length is not a multiple of shorter object length
2: In muth * (log(y) - log1p(-y)) :
longer object length is not a multiple of shorter object length
3: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth * :
longer object length is not a multiple of shorter object length
6: In prior.weights * y :
longer object length is not a multiple of shorter object length
7: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - :
longer object length is not a multiple of shorter object length
8: In bgam.fit(G, mf, chunk.size, gp, scale, gamma, method = method, :
algorithm did not converge
Any suggestions are appreciated.
I suspect the problem is with your eps (which may well indicate a problem with your data).
The default is:
r$> .Machine$double.eps*100
[1] 2.220446e-14
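With eps = 0.1 you are therefore truncating all response values to the interval [eps, 1-eps] (i.e. any y < eps or y > 1-eps is reset to eps or 1-eps respectively). I imagine this causes problems for the fitting algorithm, which then runs into a situation that was not anticipated.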
范围之外的值的数量不少,那么您会将所有这些值堆积在范围的范围内,我怀疑这会导致数据发生细微变化的情况导致拟合算法中的数值问题。
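Truncating the data as much as you are suggests that this is not the right distribution for your data. Without any other information, I would look elsewhere for a more appropriate approach.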