summary() returns NaN values when I try to include a linear variable in a GLMM

Question

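I am trying to run a model with glmmTMB. When I include avgt60, it does something strange to the model, and I am not sure why. When I include it as a non-poly() term, it gives me NaN values. When I include it as a poly() term, it throws off the entire model. When I exclude it, everything seems fine... I am new to this kind of work, so any advice is appreciated!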

m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + avgt60 + (1|year) + (1|site),
              family = "nbinom2", data = weather1)

I get:

Family: nbinom2  ( log )
Formula:          dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60,      degree = 2) + avgt60 + (1 | year) + (1 | site)
Data: weather1

     AIC      BIC   logLik deviance df.resid 
  1647.9   1687.9   -813.0   1625.9      269 

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev. 
 year   (Intercept) 5.883e-24 2.426e-12
 site   (Intercept) 6.396e-07 7.997e-04
Number of obs: 280, groups:  year, 3; site, 6

Dispersion parameter for nbinom2 family (): 0.232 

Conditional model:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -7.8560        NaN     NaN      NaN
poly(rh60, degree = 2)1      47.9631        NaN     NaN      NaN
poly(rh60, degree = 2)2      -5.4370        NaN     NaN      NaN
poly(wndspd60, degree = 2)1  61.7092        NaN     NaN      NaN
poly(wndspd60, degree = 2)2 -74.9432        NaN     NaN      NaN
poly(raintt60, degree = 2)1  27.2669        NaN     NaN      NaN
poly(raintt60, degree = 2)2 -72.9072        NaN     NaN      NaN
avgt60                        0.4384        NaN     NaN      NaN

However, without the avgt60 variable...

m1 <- glmmTMB(dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60, degree = 2) + (1|year) + (1|site),
              family = "nbinom2", data = weather1)


 Family: nbinom2  ( log )
Formula:          dsi ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) + poly(raintt60,      degree = 2) + (1 | year) + (1 | site)
Data: weather1

     AIC      BIC   logLik deviance df.resid 
  1648.2   1684.6   -814.1   1628.2      270 

Random effects:

Conditional model:
 Groups Name        Variance  Std.Dev. 
 year   (Intercept) 2.052e-10 1.433e-05
 site   (Intercept) 4.007e-10 2.002e-05
Number of obs: 280, groups:  year, 3; site, 6

Dispersion parameter for nbinom2 family (): 0.23 

Conditional model:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   1.3677     0.3482   3.928 8.56e-05 ***
poly(rh60, degree = 2)1      23.8058     9.6832   2.458 0.013953 *  
poly(rh60, degree = 2)2      -0.3452     4.2197  -0.082 0.934810    
poly(wndspd60, degree = 2)1  34.4332    10.1328   3.398 0.000678 ***
poly(wndspd60, degree = 2)2 -61.2044     6.5179  -9.390  < 2e-16 ***
poly(raintt60, degree = 2)1  12.0109     6.4949   1.849 0.064417 .  
poly(raintt60, degree = 2)2 -57.2197     6.0502  -9.457  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If I keep avgt60 as a poly() term, it throws off the entire model and nothing is significant. Any ideas what is going on here?

Here is a link to the dataset, with site names redacted: https://docs.google.com/spreadsheets/d/1mFDK_YEshvgGPHpvqu4o6TbFfwKRgHaVUVGOIZnsq7c/edit?usp=sharing

Answer 1

Your dataset has 280 rows, but only 10 unique combinations of predictor values:

nrow(unique(subset(weather1, select = -c(dsi))))

This determines how complex a model you can realistically fit.

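You are trying to estimate 8 fixed-effect parameters (length(fixef(m1)$cond) or ncol(model.matrix(m1))), two random-effect parameters (the among-site and among-year variances), and one dispersion parameter (for the negative binomial), for a total of 11 (or length(m1$fit$par)). That is more parameters than you have unique combinations of predictors!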
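As a rough illustration of this bookkeeping, here is a minimal sketch that uses only base R and simulated data. The data frame `toy` and its values are made up for the example (only the column names mirror the question), and the fixed-effect count comes from `model.matrix()` on the formula rather than from a fitted glmmTMB object.

```r
set.seed(1)

# Hypothetical stand-in for weather1: 10 distinct site/year weather
# combinations, each repeated 28 times (280 rows in total)
combos <- data.frame(
  rh60     = runif(10),
  wndspd60 = runif(10),
  raintt60 = runif(10),
  avgt60   = runif(10)
)
toy <- combos[rep(seq_len(10), each = 28), ]
toy$dsi <- rnbinom(nrow(toy), mu = 5, size = 0.25)

# Number of unique predictor combinations (response column dropped)
n_unique <- nrow(unique(subset(toy, select = -c(dsi))))

# Fixed-effect columns implied by the formula in the question:
# intercept + 3 quadratic terms (2 columns each) + avgt60
X <- model.matrix(
  ~ poly(rh60, degree = 2) + poly(wndspd60, degree = 2) +
    poly(raintt60, degree = 2) + avgt60,
  data = toy
)
n_fixed <- ncol(X)

# Add 2 random-effect variances (year, site) and 1 dispersion parameter
n_total <- n_fixed + 2 + 1

c(unique_combinations = n_unique, parameters = n_total)
```

Comparing the two numbers before fitting is a cheap check on whether the data can support the model at all.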

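Murtaugh (2007) points out that when you have a nested design (the values of the predictor variables vary only between groups, not within groups), you get the same estimates of the effects if you aggregate the response variable for each group (or site/year combination, in your case) to its mean. (If you have unbalanced groups you need to fit the model with weights, and this approach does not carry over exactly to non-Gaussian responses, but the principle is similar.)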
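The balanced Gaussian case of this point can be checked with a small simulation. The sketch below is an assumption-laden toy, not the question's data: the variables `temp`, `rain`, and `y` are hypothetical, the groups are balanced, and ordinary `lm()` stands in for the mixed model. It only shows that the coefficient estimates agree; it says nothing about standard errors.

```r
set.seed(42)

# 10 hypothetical groups; predictors vary only between groups
groups <- data.frame(
  group = factor(1:10),
  temp  = rnorm(10, mean = 20, sd = 3),
  rain  = rnorm(10, mean = 50, sd = 10)
)

# Balanced design: 28 observations per group
full <- groups[rep(seq_len(10), each = 28), ]
full$y <- 1 + 0.3 * full$temp - 0.05 * full$rain + rnorm(nrow(full))

# Collapse the response to one mean per group
means <- aggregate(y ~ group + temp + rain, data = full, FUN = mean)

fit_full  <- lm(y ~ temp + rain, data = full)   # all 280 rows
fit_means <- lm(y ~ temp + rain, data = means)  # 10 group means

# The coefficient estimates agree up to numerical tolerance
all.equal(coef(fit_full), coef(fit_means))
```

In other words, for estimating these effects the 280 rows carry no more information than the 10 group means.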

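If you leave out avgt60, you "only" have 10 parameters. I still would not trust this model much: it is severely overparameterized (usually you should aim for (# observations)/(# parameters) of at least 10, preferably 20...). Honestly, I am not even sure why it works. I think it is because the among-site and among-year variances collapse to essentially zero and drop out of the model, so you "only" have 8 parameters to estimate?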

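Here is what the data look like:

dsi values are always zero for sites 5 and 6 (measured only in 2021)
dsi values were very high in 2019, when only two sites (1 and 3) were measured
There are no particular patterns, and certainly none that are not confounded with site and year.

I would probably try to draw only qualitative, or very simple quantitative, conclusions from these data...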

library(tidyverse)

# Reshape to long format: one row per observation and weather variable
w3 <- (weather1
    |> as_tibble()
    |> select(-date)
    |> pivot_longer(-c(site, year, dsi), names_to = "var")
    |> mutate(across(c(year, site), factor))
)

theme_set(theme_bw(base_size = 20) + theme(panel.spacing = grid::unit(0, "lines")))
# Plot dsi against each weather variable; point size shows overlapping counts
(ggplot(w3)
    + aes(x = value, y = dsi, colour = site, shape = year)
    + stat_sum(alpha = 0.6)
    + stat_summary(fun = mean)
    + stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "gray")
    + facet_wrap(~var, scales = "free_x")
    + scale_y_sqrt()
)

Murtaugh, Paul A. “Simplicity and Complexity in Ecological Data Analysis.” Ecology 88, no. 1 (2007): 56–62.

Answer 1: accepted (2022-06-03 20:19:58)
