summary() returns NaN values when I include a linear variable in a GLMM

Question

I am trying to fit a model with glmmTMB. When I include avgt60, the model behaves strangely and I am not sure why. When I include it as a plain linear (non-poly) term, the standard errors, z values and p-values are all NaN. When I include it as a poly() term, it throws the entire model off. When I exclude it, the model seems fine. I am new to this type of work, so any advice is appreciated!

m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + avgt60 + (1|year) + (1|site),
              family = "nbinom2", data = weather1)

I get:

Family: nbinom2  ( log )
Formula:          dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60,      degree = 2) + avgt60 + (1 | year) + (1 | site)
Data: weather1

     AIC      BIC   logLik deviance df.resid 
  1647.9   1687.9   -813.0   1625.9      269 

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev. 
 year   (Intercept) 5.883e-24 2.426e-12
 site   (Intercept) 6.396e-07 7.997e-04
Number of obs: 280, groups:  year, 3; site, 6

Dispersion parameter for nbinom2 family (): 0.232 

Conditional model:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -7.8560        NaN     NaN      NaN
poly(rh60, degree = 2)1      47.9631        NaN     NaN      NaN
poly(rh60, degree = 2)2      -5.4370        NaN     NaN      NaN
poly(wndspd60, degree = 2)1  61.7092        NaN     NaN      NaN
poly(wndspd60, degree = 2)2 -74.9432        NaN     NaN      NaN
poly(raintt60, degree = 2)1  27.2669        NaN     NaN      NaN
poly(raintt60, degree = 2)2 -72.9072        NaN     NaN      NaN
avgt60                        0.4384        NaN     NaN      NaN

But without the avgt60 variable:

m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + (1|year) + (1|site),
              family = "nbinom2", data = weather1)


 Family: nbinom2  ( log )
Formula:          dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60,      degree = 2) + (1 | year) + (1 | site)
Data: weather1

     AIC      BIC   logLik deviance df.resid 
  1648.2   1684.6   -814.1   1628.2      270 

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev. 
 year   (Intercept) 2.052e-10 1.433e-05
 site   (Intercept) 4.007e-10 2.002e-05
Number of obs: 280, groups:  year, 3; site, 6

Dispersion parameter for nbinom2 family (): 0.23 

Conditional model:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   1.3677     0.3482   3.928 8.56e-05 ***
poly(rh60, degree = 2)1      23.8058     9.6832   2.458 0.013953 *  
poly(rh60, degree = 2)2      -0.3452     4.2197  -0.082 0.934810    
poly(wndspd60, degree = 2)1  34.4332    10.1328   3.398 0.000678 ***
poly(wndspd60, degree = 2)2 -61.2044     6.5179  -9.390  < 2e-16 ***
poly(raintt60, degree = 2)1  12.0109     6.4949   1.849 0.064417 .  
poly(raintt60, degree = 2)2 -57.2197     6.0502  -9.457  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If I leave avgt60 in as a poly() term, it throws the entire model off and nothing is significant. Any thoughts?

Here's a link to the dataset, with site names redacted: https://docs.google.com/spreadsheets/d/1mFDK_YEshvgGPHpvqu4o6TbFfwKRgHaVUVGOIZnsq7c/edit?usp=sharing

Answer 1

You have 280 rows in your data set, but only 10 unique combinations of values of the predictor variables:

nrow(unique(subset(weather1, select = -c(dsi))))

This determines how complex a model you can actually fit.

You are trying to estimate 8 fixed-effect parameters ( length(fixef(m1)$cond) or ncol(model.matrix(m1)) ), two random-effect parameters (the among-site and among-year variances), and one dispersion parameter (for the negative binomial distribution), for a total of 11 (or length(m1$fit$par) ). That is more parameters than you have unique predictor combinations!
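The counting argument above can be sketched on made-up data. This is only an illustration: the data frame toy and every value in it are hypothetical stand-ins for weather1 (simulated, not the real data), and the random-effect and dispersion counts are added by hand from the description above rather than extracted from a fitted model.

```r
set.seed(42)

# Hypothetical stand-in for weather1: 10 weather records, each repeated 28 times
combos <- data.frame(
  rh60     = runif(10, 60, 95),
  wndspd60 = runif(10, 1, 6),
  raintt60 = runif(10, 0, 50),
  avgt60   = runif(10, 15, 28)
)
toy <- combos[rep(1:10, each = 28), ]
toy$dsi <- rnbinom(nrow(toy), mu = 5, size = 0.25)  # simulated counts

# Unique predictor combinations (same idea as the unique()/subset() call above)
n_unique <- nrow(unique(subset(toy, select = -c(dsi))))

# Fixed-effect design matrix for the formula in the question (random effects omitted)
X <- model.matrix(~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) +
                    poly(raintt60, degree = 2) + avgt60, data = toy)
n_fixed <- ncol(X)

# Add 2 random-effect variances and 1 negative binomial dispersion parameter
n_params <- n_fixed + 2 + 1

c(rows = nrow(toy), unique_combos = n_unique,
  fixed = n_fixed, total_params = n_params)
```

Although the toy data has 280 rows, the predictors carry only 10 distinct rows of information, which is fewer than the 11 parameters being requested.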

Murtaugh (2007) makes the point that when you have a nested design (values of the predictor variables change only between groups, not within groups), you will get the same effects estimated if you aggregate the response variable for each group (or site/year combination in your case) to its mean. (If you have unbalanced groups, as in this case, you need to fit a model with weights, and this approach doesn't work well for non-Gaussian responses, but the principle is similar.)
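A simplified illustration of the aggregation idea, using ordinary least squares on simulated Gaussian data rather than a mixed model or the negative binomial model from the question. All names here (groups, full, agg, x, y) and the group sizes are hypothetical. With a predictor that is constant within each group, fitting to the group means weighted by group size should reproduce the coefficients from the full data:

```r
set.seed(1)

# Hypothetical nested design: 10 groups, predictor x is constant within each group
n_per_group <- c(40, 20, 35, 25, 30, 30, 15, 45, 20, 20)  # unbalanced, 280 rows in total
groups <- data.frame(g = factor(1:10), x = rnorm(10))
full <- groups[rep(1:10, times = n_per_group), ]
full$y <- 1 + 2 * full$x + rnorm(nrow(full))

# Fit on all 280 rows
fit_full <- lm(y ~ x, data = full)

# Aggregate to one row per group: mean response plus the group size
agg <- aggregate(y ~ g + x, data = full, FUN = mean)
agg$n <- as.vector(table(full$g)[as.character(agg$g)])

# Fit on the 10 group means, weighting each mean by its group size
fit_agg <- lm(y ~ x, data = agg, weights = n)

# Compare the coefficient estimates from the two fits
all.equal(coef(fit_full), coef(fit_agg))
```

The point estimates agree, but the aggregated fit has only 10 data points to work with, which is the honest measure of how much information there is about the predictor effects.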

If you leave out avgt60 you "only" have 10 parameters. I still don't trust this model very much; it's badly overparameterized (normally you're aiming for a ratio of (# observations)/(# parameters) of at least 10, preferably 20). To be honest, I'm not even sure why it's working - I think it's because the site and year variances are basically collapsing to zero and removing themselves from the model, so you "only" have 8 parameters to estimate?

Here's what the data look like:

dsi values for sites 5 and 6 are always zero (only measured in 2021)
dsi values are very high in 2019, when only two sites (1 and 3) were measured
there is no particular pattern, and certainly not one that's not confounded with site and year.

I would probably try to draw only qualitative conclusions, or very simple quantitative conclusions, from these data.

library(tidyverse)

# Reshape to long format: one row per observation per weather variable
w3 <- (weather1
    |> as_tibble()
    |> select(-date)
    |> pivot_longer(-c(site, year, dsi), names_to = "var")
    |> mutate(across(c(year, site), factor))
)

theme_set(theme_bw(base_size = 20) + theme(panel.spacing = grid::unit(0, "lines")))
# dsi against each predictor; point size shows overlapping counts, gray line joins the means
(ggplot(w3)
    + aes(x = value, y = dsi, colour = site, shape = year)
    + stat_sum(alpha = 0.6)
    + stat_summary(fun = mean)
    + stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "gray")
    + facet_wrap(~var, scales = "free_x")
    + scale_y_sqrt()
)

Murtaugh, Paul A. “Simplicity and Complexity in Ecological Data Analysis.” Ecology 88, no. 1 (2007): 56–62.
