Zero-inflated overdispersed count data: glmmTMB error in R
I am working with count data (available here) that are zero-inflated and overdispersed and have random effects. The package best suited to this sort of data is glmmTMB (details here and troubleshooting here).
Before working with the data, I inspected it for normality (it is zero-inflated), homogeneity of variance, correlations, and outliers. The data had two outliers, which I removed from the dataset linked above. There are 351 observations from 18 locations (prop_id).
The data looks like this:
euc0 ea_grass ep_grass np_grass np_other_grass month year precip season prop_id quad
3 5.7 0.0 16.7 4.0 7 2006 526 Winter Barlow 1
0 6.7 0.0 28.3 0.0 7 2006 525 Winter Barlow 2
0 2.3 0.0 3.3 0.0 7 2006 524 Winter Barlow 3
0 1.7 0.0 13.3 0.0 7 2006 845 Winter Blaber 4
0 5.7 0.0 45.0 0.0 7 2006 817 Winter Blaber 5
0 11.7 1.7 46.7 0.0 7 2006 607 Winter DClark 3
The response variable is euc0 and the random effects are prop_id and quad_id. The rest of the variables are fixed effects (all representing the percent cover of different plant species).
The model I want to run:
library(glmmTMB)
seed0 <- glmmTMB(euc0 ~ ea_grass + ep_grass + np_grass + np_other_grass + month +
                   year*precip + season*precip + (1|prop_id) + (1|quad),
                 data = euc, family = poisson(link = identity))
fit_zinbinom <- update(seed0, family = nbinom2)  # nbinom2: variance increases quadratically with the mean
The error I get after running the seed0 code is:
Error in optimHess(par.fixed, obj$fn, obj$gr): gradient in optim evaluated to length 1 not 15
In addition: There were 50 or more warnings (use warnings() to see the first 50)
warnings() gives:
1. In (function (start, objective, gradient = NULL, hessian = NULL, ... :
NA/NaN function evaluation
I also normally mean-center and standardize my numerical variables, but this only removes the first error and keeps the NA/NaN error. I tried adding a glmmTMBControl statement like this OP, but it just opened a whole new world of errors.
How can I fix this? What am I doing wrong?
A detailed explanation would be appreciated so that I can learn how to troubleshoot this better myself in the future. Alternatively, I am open to an MCMCglmm solution, as that function can also deal with this sort of data (despite taking longer to run).
An incomplete answer...
That said, let me run through some of the things I tried and where I ended up.
GGally::ggpairs(euc, columns=2:10) doesn't detect anything obviously terrible about the data (I did throw out the data point with euc0==78).
In order to try to make the identity-link model work I added some code in glmmTMB. You should be able to install via remotes::install_github("glmmTMB/glmmTMB/glmmTMB@clamp") (note you will need compilers etc. installed to install this). This version takes negative predicted values and forces them to be non-negative, while adding a corresponding penalty to the negative log-likelihood.
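To make the identity-link problem and the clamping idea concrete, here is a minimal base-R sketch. It is not the code in the glmmTMB clamp branch: the function name `clamped_nll`, the floor `eps`, and the quadratic penalty with weight `pen` are illustrative assumptions only.

```r
# With an identity link the linear predictor IS the Poisson mean,
# so nothing stops the optimizer from proposing a negative mean.
y  <- c(3, 0, 0, 1)
mu <- c(2.5, 0.4, -0.3, 1.1)   # made-up values; the third "mean" is negative

# dpois() returns NaN (with a warning) when lambda is negative,
# which is one way to end up with "NA/NaN function evaluation"
raw_ll <- suppressWarnings(dpois(y, lambda = mu, log = TRUE))

# Hypothetical clamp-and-penalize negative log-likelihood:
# negative means are raised to a small floor `eps`, and the amount of
# clamping is charged back as a quadratic penalty.
clamped_nll <- function(y, mu, eps = 1e-6, pen = 1e3) {
  mu_ok   <- pmax(mu, eps)              # force means to be positive
  penalty <- pen * sum((mu_ok - mu)^2)  # zero when nothing was clamped
  -sum(dpois(y, lambda = mu_ok, log = TRUE)) + penalty
}

nll <- clamped_nll(y, mu)   # finite, unlike -sum(raw_ll)
```

The penalty keeps the objective finite and pushes the optimizer back toward the region where all means are valid, but it does nothing about the underlying awkwardness of an unconstrained identity link.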
Using the new version of glmmTMB I don't get an error, but I do get these warnings:
Warning messages:
1: In fitTMB(TMBStruc): Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
2: In fitTMB(TMBStruc): Model convergence problem; false convergence (8). See vignette('troubleshooting')
The Hessian (second-derivative) matrix being non-positive-definite means there are some (still hard-to-troubleshoot) problems.
heatmap(vcov(f2)$cond,Rowv=NA,Colv=NA) lets me look at the covariance matrix. (I also like corrplot::corrplot.mixed(cov2cor(vcov(f2)$cond),"ellipse","number"), but that doesn't work when vcov(.)$cond is non-positive definite. In a pinch you can use sfsmisc::posdefify() to force it to be positive definite...)
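A quick numerical check for positive definiteness is to look at the eigenvalues. The sketch below uses a made-up 3x3 matrix as a stand-in for vcov(f2)$cond; the matrix values and the tolerance are assumptions for illustration.

```r
# Toy symmetric matrix standing in for vcov(f2)$cond (values are made up).
# The third variable is the sum of the first two, so the matrix is singular.
V <- matrix(c(2, 1, 3,
              1, 2, 3,
              3, 3, 6), nrow = 3, byrow = TRUE)

ev <- eigen(V, symmetric = TRUE, only.values = TRUE)$values

# Positive definite means every eigenvalue is strictly positive;
# compare against a tolerance relative to the largest eigenvalue
# rather than against exactly zero.
tol   <- 1e-8 * max(abs(ev))
is_pd <- all(ev > tol)
```

Eigenvalues at or below the tolerance point to directions in parameter space that the data cannot pin down, which is what the Hessian warning is complaining about.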
Tried scaling:
eucsc <- dplyr::mutate_at(euc1,dplyr::vars(c(ea_grass:precip)), ~c(scale(.)))
This will help some - right now we're still doing a few silly things like treating year as a numeric variable without centering it (so the 'intercept' of the model is at year 0 of the Gregorian calendar...)
But that still doesn't fix the problem.
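The point about the intercept can be seen in a toy regression. The data frame below is entirely made up and uses lm() rather than glmmTMB just to keep the sketch in base R.

```r
# Made-up data: two years, three observations each
d <- data.frame(year = rep(c(2006, 2007), each = 3),
                y    = c(1, 2, 3, 4, 5, 6))

# Shift year so that the first sampled year is 0
d$year_c <- d$year - min(d$year)

# Raw year: the intercept is an extrapolation back to year 0
b_raw <- unname(coef(lm(y ~ year, data = d))[1])

# Shifted year: the intercept is the expected value in the first year
b_ctr <- unname(coef(lm(y ~ year_c, data = d))[1])
```

Both fits have the same slope; only the meaning (and numerical size) of the intercept changes.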
Looking more closely at the ggpairs plot, it looks like season and year are confounded: with(eucsc,table(season,year)) shows that observations occur in Spring and Winter in one year and Autumn in the other year. season and month are also confounded: if we know the month, then we automatically know the season.
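Both kinds of confounding can be checked with a couple of base-R tabulations. The design below is invented to mimic the pattern just described; it is not the real euc data.

```r
# Made-up sampling design: Winter and Spring sampled in one year,
# Autumn in the other
d <- data.frame(
  month  = c(7, 7, 10, 10, 4, 4),
  year   = c(2006, 2006, 2006, 2006, 2007, 2007),
  season = c("Winter", "Winter", "Spring", "Spring", "Autumn", "Autumn")
)

tab <- with(d, table(season, year))

# If every season appears in only one year, each row of the table
# has exactly one non-zero cell
years_per_season <- rowSums(tab > 0)

# If month determines season, each month maps to exactly one season
seasons_per_month <- tapply(d$season, d$month,
                            function(s) length(unique(s)))
```

When `years_per_season` is all ones, the season and year effects cannot be separated from this design alone; when `seasons_per_month` is all ones, season adds no information beyond month.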
At this point I decided to give up on the identity link and see what happened.
update(<previous_model>, family=poisson) (i.e. using a Poisson with a standard log link) worked! So did using family=nbinom2, which was much better.
I looked at the results and discovered that the CIs for the precip X season coefficients were crazy, so dropped the interaction term (update(f2S_noyr_logNB, . ~. - precip:season)) at which point the results look sensible.
A few final notes:
[...] family=nbinom2) are probably sufficient.
[...] library(DHARMa); plot(simulateResiduals(f2S_noyr_logNB2))). I would spend some time plotting residuals and predicted values against various combinations of predictors to see if you can localize the problem.
PS A quicker way to see that there's something wrong with the fixed effects (multicollinearity):
X <- model.matrix(~ ea_grass + ep_grass +
np_grass + np_other_grass + month +
year*precip + season*precip,
data=euc)
ncol(X) ## 13
Matrix::rankMatrix(X) ## 11
lme4 has tests like this, and machinery for automatically dropping aliased columns, but they aren't implemented in glmmTMB at present.
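In the meantime, the rank check can be done by hand, and base R's qr() will also indicate which columns are aliased. The sketch below uses a small invented design (not the euc data) in which month and year together already determine season.

```r
# Made-up design with only three distinct (month, year, season) combinations
d <- data.frame(
  month  = c(7, 7, 10, 10, 4, 4, 7, 10),
  year   = c(2006, 2006, 2006, 2006, 2007, 2007, 2006, 2006),
  season = factor(c("Winter", "Winter", "Spring", "Spring",
                    "Autumn", "Autumn", "Winter", "Spring"))
)

X   <- model.matrix(~ month + year + season, data = d)
qrX <- qr(X)

n_col  <- ncol(X)     # number of fixed-effect columns requested
rank_X <- qrX$rank    # number that are linearly independent

# qr() moves columns it finds to be linearly dependent to the end of
# the pivot order, so the entries past the rank are the aliased columns
aliased <- colnames(X)[qrX$pivot[seq_len(n_col) > rank_X]]
```

If `rank_X` is smaller than `n_col`, dropping the terms behind the columns listed in `aliased` from the formula is one way to get back to a full-rank fixed-effects design before refitting.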