GAM error using bam() with family = betar

I'm having trouble resolving an error I get when running bam() from mgcv.

I note that a similar error was reported here 14 months ago, and there seemed to be no agreed-upon solution; the suggestion was to email Simon Wood.

My data are here. The data set is too big to paste the output of dput().

If I run the model below on the entire data set, I get the following warnings:

library(mgcv)

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error, 
          method = "fREML", 
          family = betar(link="logit", eps = 0.1),
          select = TRUE)

Warning messages:
1: In estimate.theta(theta, family, G$y, linkinv(eta), scale = scale1,  :
  step failure in theta estimation
2: In wt * LS :
  longer object length is not a multiple of shorter object length
3: In muth * (log(y) - log1p(-y)) :
  longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
6: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
7: In prior.weights * y :
  longer object length is not a multiple of shorter object length
8: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) -  :
  longer object length is not a multiple of shorter object length

However, if I run the same model on the entire data set but exclude the last row, the model appears to run OK:

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error[1:20500,], 
          method = "fREML", 
          family = betar(link="logit", eps = 0.1),
          select = TRUE)

This suggested to me that something was wrong with the last row of the data set. However, I cannot see anything in that row that I would expect to produce the above warning messages.

If I again run the same model on a small subset of the data, but this time include the last row of data, the model again appears to run ok.

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error[20400:20501,], 
          method = "fREML", 
          family = betar(link="logit", eps = 0.1),
          select = TRUE)

But a larger subset of the data, again including the last row, produces similar warning messages to above.

m3 <- bam(pt10 ~ 
            org.type +
            region +
            s(year) + 
            s(year, by = org.type) +
            s(year, by = region), 
          data = error[10000:20501,], 
          method = "fREML", 
          family = betar(link="logit", eps = 0.1),
          select = TRUE)

Warning messages:
1: In wt * LS :
  longer object length is not a multiple of shorter object length
2: In muth * (log(y) - log1p(-y)) :
  longer object length is not a multiple of shorter object length
3: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
4: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
5: In -lgamma(theta) + lgamma(muth) + lgamma(theta - muth) - muth *  :
  longer object length is not a multiple of shorter object length
6: In prior.weights * y :
  longer object length is not a multiple of shorter object length
7: In 2 * wt * (-lgamma(theta) + lgamma(muth) + lgamma(theta - muth) -  :
  longer object length is not a multiple of shorter object length
8: In bgam.fit(G, mf, chunk.size, gp, scale, gamma, method = method,  :
  algorithm did not converge

Any advice appreciated.

I suspect the problem is with your eps (which probably does indicate that you have issues with the data).

The default is:

r$> .Machine$double.eps*100
[1] 2.220446e-14

so with eps = 0.1 you are truncating all your response values to the interval [eps, 1 - eps] = [0.1, 0.9] (i.e. any y < eps is reset to eps and any y > 1 - eps is reset to 1 - eps). I suspect that is causing problems with the fitting algorithm, which is encountering situations that were not anticipated. If a non-negligible number of values lie outside [eps, 1 - eps], you will be piling all those values up at the limits of the range, and I suspect that leads to situations where subtle changes in the data cause numerical problems in the fitting algorithm.
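
To gauge how much of the response would be affected, something along these lines may help. It is only a sketch in base R: the vector y is simulated as a hypothetical stand-in for error$pt10 (the real data are not reproduced here), and the clamping with pmin()/pmax() mimics the truncation described above rather than calling any mgcv internals.

```r
# Hypothetical stand-in for the response column (error$pt10 in the question).
# Simulated values, including an exact 0 and an exact 1.
set.seed(1)
y <- c(0, 1, rbeta(998, shape1 = 0.5, shape2 = 0.5))

eps <- 0.1  # value passed to betar() in the question

# Number and share of observations that truncation to [eps, 1 - eps] would move
below <- sum(y < eps)
above <- sum(y > 1 - eps)
prop_truncated <- (below + above) / length(y)

# Mimic the truncation: values below eps become eps, values above 1 - eps become 1 - eps
y_trunc <- pmin(pmax(y, eps), 1 - eps)

# Same share under the default eps, for comparison
eps_default <- .Machine$double.eps * 100
prop_default <- mean(y < eps_default | y > 1 - eps_default)

cat("share truncated with eps = 0.1:    ", round(prop_truncated, 3), "\n")
cat("share truncated with default eps:  ", round(prop_default, 3), "\n")
```

With the real data you would replace the simulated y with error$pt10. If the share moved by eps = 0.1 is large, that supports the suspicion that the truncation, rather than any single row, is behind the warnings.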

Truncating the data as much as you are doing suggests this is not the right distribution for your data. Absent any other information I'd look elsewhere for a more suitable method.
