GAM residuals in two distinct lines - R "mgcv"

I am trying to fit GAMs to binomial data (link = logit) in R with the mgcv package. The aim is to describe habitat use of bottlenose dolphins, with presence (1) and absence (0) as the response and a suite of environmental variables as the predictors.

The code appears to work, but when I plot the residuals I get two distinct lines. My understanding is that residuals should scatter evenly around the line, which is not the case here. Any guidance on what I should be looking for would be greatly appreciated.

Here is the output for an example with two variables:

m1<-gam(Presence~s(Dist_Ent_k,k=8)+s(Dist_wall_m,k=5), data=mydata, 
        family = binomial(link = "logit"), weights=resp.weight)

summary(m1)

Family: binomial 
Link function: logit 

Formula:
Presence ~ s(Dist_Ent_k, k = 8) + s(Dist_wall_m, k = 5)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept) -0.30155    0.09839  -3.065  0.00218 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq p-value   
s(Dist_Ent_k)  2.658  3.333 16.411  0.0015 **
s(Dist_wall_m) 1.389  1.680  0.273  0.7434   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.0359   Deviance explained = 3.42%
UBRE = -0.76828  Scale est. = 1         n = 2696

plot(m1, shade = TRUE, scale = 0, residuals = TRUE)

Thank you in advance!

What you are plotting are partial residuals, and the two distinct bands are simply the result of your data being binary (Bernoulli) observations.

You'll see this too if you plot the deviance residuals against the linear predictor, only more extreme; try

layout(matrix(1:4, ncol = 2, byrow = TRUE))
gam.check(m1)
layout(1)
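To see where the two bands come from, here is a minimal sketch on simulated data (the variable names and the true curve are made up for illustration; this is not the dolphin data set). With a 0/1 response, each deviance residual depends only on the fitted probability, and its sign is set by the observed value, so the points fall on one curve for the 1s and another for the 0s.

```r
library(mgcv)

set.seed(1)
n <- 500
x <- runif(n)
# Simulated presence/absence data; the true curve is an arbitrary choice
p <- plogis(-0.5 + 2 * sin(2 * pi * x))
y <- rbinom(n, size = 1, prob = p)

m <- gam(y ~ s(x, k = 8), family = binomial(link = "logit"), method = "REML")

eta   <- predict(m, type = "link")          # linear predictor
r_dev <- residuals(m, type = "deviance")    # deviance residuals

# For Bernoulli data the deviance residual reduces to a function of the
# fitted probability alone: positive for y = 1, negative for y = 0
mu <- fitted(m)
r_manual <- ifelse(y == 1, sqrt(-2 * log(mu)), -sqrt(-2 * log(1 - mu)))

# Colour by observed response: each colour traces one of the two bands
plot(eta, r_dev, col = ifelse(y == 1, "steelblue", "tomato"),
     xlab = "Linear predictor", ylab = "Deviance residual")
```

The two colours separate cleanly into an upper and a lower band, which is the same pattern seen in the partial residual plot.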

Model diagnostics for Bernoulli models (binomial with a single trial) are difficult because of the extreme nature of the data: the response is just a 0 or a 1. Diagnostics are easier if you aggregate the data in some way so that you no longer have m = 1 trials but m = M. For example, if your data are spatially arranged, you could lay a coarser grid over the region and aggregate the 0s and 1s for the points in each grid cell, retaining the number of points in each cell (to give the M for each aggregated binomial count).
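The grid aggregation could look roughly like the sketch below. Everything here is hypothetical: the coordinates `east` and `north`, the 1 x 1 cell size, and the simulated response are stand-ins, and in a real analysis the environmental covariates would also need to be summarised per cell (for example by their cell means).

```r
library(mgcv)

set.seed(2)
n <- 2000
# Hypothetical point-level data: coordinates plus a 0/1 response
pts <- data.frame(east = runif(n, 0, 10), north = runif(n, 0, 10))
pts$presence <- rbinom(n, 1, plogis(-1 + 0.3 * pts$east))

# Assign each point to a 1 x 1 grid cell, labelled by the cell centre
pts$cell_e <- floor(pts$east) + 0.5
pts$cell_n <- floor(pts$north) + 0.5

# Per cell: number of presences and number of points M
agg <- aggregate(presence ~ cell_e + cell_n, data = pts,
                 FUN = function(z) c(pres = sum(z), M = length(z)))
agg <- data.frame(agg[c("cell_e", "cell_n")], agg$presence)
agg$abs <- agg$M - agg$pres   # absences per cell

# Binomial GAM on the aggregated counts: cbind(successes, failures)
m_agg <- gam(cbind(pres, abs) ~ s(cell_e, k = 5),
             data = agg, family = binomial, method = "REML")
```

With M greater than 1 in each cell, the residuals are no longer restricted to two values per fitted probability, so the usual residual plots become easier to read.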

Otherwise I don't think there is much to be gained from plotting partial or deviance residuals for such models. The QQ plot in the set produced by gam.check(), especially if you add rep = 100 (or some such number), is more useful for checking distributional assumptions, because it allows a reference band to be created that has good properties for models like this. See ?qq.gam for the function and the information needed to create only the QQ plot.
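A minimal sketch of the QQ plot on its own, again on simulated data rather than the original `m1` (the model and variable names are assumptions for illustration):

```r
library(mgcv)

set.seed(3)
n <- 500
x <- runif(n)
# Simulated 0/1 response; the true curve is an arbitrary choice
y <- rbinom(n, 1, plogis(-0.5 + 2 * sin(2 * pi * x)))
m <- gam(y ~ s(x, k = 8), family = binomial, method = "REML")

# QQ plot only; rep = 100 simulates 100 replicate residual sets from the
# fitted model to draw the reference band, level sets the band's coverage
qq.gam(m, rep = 100, level = 0.9)
```

For the model in the question, the equivalent calls would be qq.gam(m1, rep = 100) or gam.check(m1, rep = 100).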
