Weird output of tab_model() with glmmTMB

I am getting weird output when I use the tab_model() function from the sjPlot package on a model fitted with the glmmTMB() function from the glmmTMB package (a generalized linear mixed model with a beta-family response). The intercept and the marginal R² look very weird.

What is going on here?

df <- structure(list(date = structure(c(6L, 5L, 6L, 1L, 4L, 2L, 2L, 
2L, 2L, 4L, 6L, 1L, 6L, 6L, 2L, 2L, 4L, 4L, 5L, 1L), .Label = c("2021-03-17", 
"2021-04-07", "2021-04-13", "2021-04-27", "2021-05-11", "2021-05-27"
), class = "factor"), kettlehole = structure(c(4L, 6L, 6L, 4L, 
7L, 2L, 6L, 5L, 3L, 5L, 1L, 1L, 1L, 1L, 4L, 4L, 5L, 4L, 3L, 5L
), .Label = c("1189", "119", "1202", "149", "172", "2484", "552"
), class = "factor"), plot = structure(c(8L, 4L, 4L, 3L, 7L, 
8L, 1L, 3L, 6L, 4L, 4L, 3L, 6L, 1L, 2L, 7L, 5L, 8L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), treatment = structure(c(2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
1L, 2L, 1L), .Label = c("a", "b"), class = "factor"), distance = structure(c(2L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 
2L, 1L, 1L), .Label = c("2", "5"), class = "factor"), soil_moisture_content = c(0.2173, 
0.1028, 0.148, 0.3852, 0.1535, 0.2618, 0.2295, 0.222, 0.3145, 
0.1482, 0.2442, 0.3225, 0.1715, 0.1598, 0.2358, 0.274, 0.1543, 
0.144, 0.128, 0.361), yield = c(0.518, 0.434, 0.35, 0.599, 0.594, 
0.73, 0.568, 0.442, 0.695, 0.73, 0.667, 0.49, 0.744, 0.56, 0.485, 
0.532, 0.668, 0.511, 0.555, 0.718), weed_coverage = c(0, 0.045, 
0.03, 0.002, 0.11, 0.003, 0.01, 0, 0.02, 0.002, 0, 0.008, 0, 
0.002, 0, 0.006, 0, 0, 0.02, 0.002)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))
library(sjPlot)
library(glmmTMB)

## plot is nested within kettlehole; date is a separate random intercept
modop <- glmmTMB(yield ~ soil_moisture_content + weed_coverage + distance + treatment + (1|kettlehole/plot) + (1|date), family = "beta_family", data = df)

tab_model(modop)

[Image: tab_model() output]

EDIT

Here is a screenshot of the tab_model() result for my actual dataset with n=630. I think the problem is that the model is overfitted, as mentioned by Ben, and needs to be adjusted by eliminating unnecessary predictors.

[Image: tab_model() output for the full dataset, n=630]

tl;dr the weird intercept results seem to be a bug in sjPlot::tab_model, which should be reported to the maintainers at the sjPlot issues list: it seems that tab_model is mistakenly exponentiating the dispersion parameter when it shouldn't. However, there are other issues with your model (it's probably overfitted), and these are what's messing up your marginal R^2 value.

Here is a run with some sensible simulated data that shows the problem with tab_model():

set.seed(101)
## rbeta() function parameterized by mean and shape
my_rbeta <- function(n, mu, shape0) {
  rbeta(n, shape1 = mu*shape0, shape2 = (1-mu)*shape0)
}
n <- 100; ng <- 10
dd <- data.frame(x = rnorm(n),
                 f = factor(rep(1:(n/ng), ng)))
dd <- transform(dd,
                y = my_rbeta(n,
                             mu = plogis(-1 + 2*x + rnorm(ng)[f]),
                             shape0 = 5))

m1 <- glmmTMB(y ~ x + (1|f), family = "beta_family", dd)
tab_model(m1)
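
As a side check on the simulation, a beta distribution with shape1 = mu*shape0 and shape2 = (1-mu)*shape0 has mean mu and variance mu*(1-mu)/(1+shape0), so shape0 plays the role of the dispersion parameter. A minimal base-R sketch, with mu = 0.3 picked arbitrarily for illustration:

```r
## Check the mean/shape parameterization used in my_rbeta() by simulation.
## mu = 0.3 is an arbitrary illustrative value; shape0 = 5 matches the simulation above.
set.seed(1)
mu <- 0.3
shape0 <- 5
y <- rbeta(1e5, shape1 = mu * shape0, shape2 = (1 - mu) * shape0)

## simulated vs. theoretical mean
c(sim_mean = mean(y), theory_mean = mu)
## simulated vs. theoretical variance
c(sim_var = var(y), theory_var = mu * (1 - mu) / (1 + shape0))
```

The simulated and theoretical values should agree closely, which is why a fitted dispersion near 5 is the expected result for m1.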

The results of sigma(m1), print(m1), and summary(m1) all agree that the estimated dispersion parameter is 5.56 (close to its nominal value of 5), and they agree with confint(m1, "disp_"):

         2.5 %   97.5 % Estimate
sigma 4.068351 7.606602 5.562942

However, tab_model() reports:

[Image: tab_model output, showing an estimated dispersion of 260]

displaying two problems:

  • (major) the dispersion is reported as exp(5.563) = 260.6 instead of 5.563, and the confidence intervals are similarly (incorrectly) exponentiated
  • (minor) the dispersion parameter is labeled as (Intercept), which is confusing (it is technically the "intercept" of the dispersion model)
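
The arithmetic behind the major problem can be reproduced in base R from the confint() values shown above. This is only a sketch of what the table appears to be doing, not a look at sjPlot's internals:

```r
## Dispersion estimate and 95% CI as reported by confint(m1, "disp_")
disp <- c(lower = 4.068351, estimate = 5.562942, upper = 7.606602)

## Exponentiating the estimate gives the value shown in the tab_model() output
exp(disp["estimate"])   # approximately 260.6

## The same (incorrect) transformation applied to the interval bounds
exp(disp[c("lower", "upper")])
```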

However, the R^2 values look sensible; we'll come back to this.


What about the model itself?

A reasonable rule of thumb (see e.g. Harrell, Regression Modeling Strategies) says you should generally aim to have about 1 parameter for every 10-20 observations. Between fixed and random effects, you have 9 parameters (length(modop$fit$par), or nobs(modop) - df.residual(modop)) for 20 observations.
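
To make the mismatch concrete, here is the rule-of-thumb arithmetic in base R, using the counts quoted above (20 observations, 9 parameters):

```r
## Counts taken from the text above
n_obs <- 20
n_par <- 9

## Observations available per estimated parameter
obs_per_par <- n_obs / n_par
obs_per_par             # about 2.2, far below the 10-20 guideline

## Number of parameters the guideline would support for this sample size
max_par <- n_obs / c(20, 10)
max_par                 # 1 to 2 parameters
```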

Running diagnose(modop) (note that I am using a fixed/development version of diagnose(), so your results may differ slightly) gives:

diagnose(modop)
Unusually large coefficients (|x|>10):

theta_1|date.1 
     -11.77722 

Large negative coefficients in zi (log-odds of zero-inflation), dispersion, or random effects (log-standard deviations) suggest unnecessary components (converging to zero on the constrained scale) ...

(if you look at summary(modop) you'll see that the estimated standard deviation of the date random effect is 7e-6, about 4 orders of magnitude less than the next-largest random effect term ...)
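
Since the flagged theta value is a log-standard deviation, back-transforming it with exp() shows how small the date random effect is. A quick base-R sketch using the value from the diagnose() output above:

```r
## theta for the date random effect, as printed by diagnose(modop)
theta_date <- -11.77722

## Random-effect thetas are log-standard deviations, so exp() gives the SD
sd_date <- exp(theta_date)
sd_date   # on the order of 1e-5 or smaller, i.e. effectively zero
```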

modop2 <- update(modop, . ~ . - (1|date))

diagnose(modop2) says this model is OK.

However, tab_model(modop2) still gives a suspicious conditional R^2 (1.038, i.e. >1). Running performance::r2_nakagawa(modop2) directly (I believe this is the underlying machinery used by tab_model()) gives:

# R2 for Mixed Models
  Conditional R2: 1.038
     Marginal R2: 0.183

but with warnings:

1: mu of 1.5 is too close to zero, estimate of random effect variances may be unreliable.
2: Model's distribution-specific variance is negative. Results are not reliable.
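
The second warning explains how a conditional R^2 above 1 can arise. Nakagawa-style R^2 divides the fixed-effect variance (marginal) or the fixed plus random-effect variance (conditional) by the sum of the fixed, random, and distribution-specific variances; if the last term is negative, the denominator becomes smaller than the conditional numerator. The sketch below uses a hypothetical helper and made-up variance components, chosen only so that they reproduce the reported values; they are not the actual components of modop2:

```r
## Hypothetical helper: R^2 from variance components (not part of any package)
r2_from_components <- function(var_fixed, var_random, var_dist) {
  total <- var_fixed + var_random + var_dist
  list(marginal    = var_fixed / total,
       conditional = (var_fixed + var_random) / total)
}

## Illustrative values only; note the negative distribution-specific variance
r2 <- r2_from_components(var_fixed = 0.183, var_random = 0.855, var_dist = -0.038)
r2$marginal      # 0.183
r2$conditional   # 1.038, i.e. greater than 1
```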

I would basically conclude that this data set is just a little too small (or the model too big) to get useful R^2 values.

FWIW I'm slightly concerned that tab_model() reports N_plot = 8 for this model: it ought to be reporting N_{plot:kettlehole} = 18, as in summary(modop2).
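
Both counts can be checked directly from the grouping variables. The sketch below uses the integer level codes for kettlehole and plot copied from the dput() output in the question, so it runs without the full data frame:

```r
## Integer level codes copied from the dput() output in the question
kettlehole <- c(4, 6, 6, 4, 7, 2, 6, 5, 3, 5, 1, 1, 1, 1, 4, 4, 5, 4, 3, 5)
plot_id    <- c(8, 4, 4, 3, 7, 8, 1, 3, 6, 4, 4, 3, 6, 1, 2, 7, 5, 8, 1, 1)

## Number of distinct plot labels, ignoring nesting
n_plot <- length(unique(plot_id))
n_plot     # 8

## Number of distinct plot-within-kettlehole combinations (the nested grouping)
n_nested <- nrow(unique(data.frame(kettlehole, plot_id)))
n_nested   # 18
```

The nested count matches the 18 groups reported by summary(modop2), while the unnested count matches the N_plot = 8 shown by tab_model().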
