简体   繁体   中英

Zero inflated rolling beta regression with zoo package?

I'm trying to evaluate changes in the relationship between two variables over time. I created an irregular time series object of 46 years using the zoo package. My data are zero-inflated proportions that take the values 0 and 1. Here is the data:

edf
   Year     World        Ego
1  1760 1.0000000 0.00000000
2  1761 0.3055556 0.00000000
3  1762 0.3950617 0.11814413
4  1764 0.8677686 0.26984127
5  1766 0.0000000 0.00000000
6  1767 0.8580606 0.15407986
7  1769 0.7500000 0.00000000
8  1771 0.7416174 0.37698413
9  1772 0.6611570 0.53587372
10 1777 0.4375000 0.20000000
11 1778 0.9629630 0.36111111
12 1779 0.7229630 0.05291005
13 1781 0.0000000 0.00000000
14 1782 0.0000000 0.00000000
15 1783 0.7500000 0.00000000
16 1784 0.7966605 0.21893984
17 1785 0.8518519 0.12500000
18 1786 0.0000000 0.00000000
19 1787 0.2279036 0.00000000
20 1788 0.7425926 0.08585859
21 1789 0.4648760 0.17942337
22 1790 0.8888889 0.00000000
23 1791 0.7958546 0.35023819
24 1792 0.0000000 0.00000000
25 1794 0.8021333 0.65529337
26 1795 0.0000000 0.00000000
27 1800 0.9900000 0.10825397
28 1802 0.7866667 0.07500000
29 1803 0.0000000 0.00000000
30 1804 0.0000000 0.00000000
31 1805 0.7416026 0.34158521
32 1806 0.9420000 0.47337963
33 1810 0.7500000 0.00000000
34 1812 0.8397279 0.53089503
35 1818 0.4863946 0.31103450
36 1819 0.8636475 0.20591162
37 1820 0.8888889 0.00000000
38 1821 0.7197232 0.60557261
39 1822 0.7308806 0.27126586
40 1823 0.6113805 0.26487719
41 1824 0.6400000 0.00000000
42 1826 0.9086405 0.13932918
43 1827 0.7447051 0.16207173
44 1828 0.9183673 0.40000000
45 1830 0.9843750 0.50000000
46 1831 0.7053061 0.55736111

I'm using beta regression but transforming the dependent variable values using the recommendation in the manual:

y.transf.betareg <- function(y){
  n.obs <- sum(!is.na(y))
  (y * (n.obs - 1) + 0.5) / n.obs
}

And then using rollapply to compute a moving regression. Here is my code:

library(zoo)
library(betareg)
brol<-as.zoo(edf)
index1 <- rollapply(data = brol,  
                          width = 5,  
                          function(brr)  coef(betareg(y.transf.betareg(brr[3])~brr[2],
                                            data=as.data.frame(brr),
                                            na.action = na.omit
                                    ),
                      by.column = F,
                      align="right")) 

But I get this error:

Error in optim(par = start, fn = loglikfun, gr = gradfun, method = method,  : 
  non-finite value supplied by optim

I get the same error when I try to use linear spline regression with betareg.

The code I wrote works with other models I've tried such as a binomial GLM with a logit link or a GAMLSS, but not with betareg.

From a bit of researching it seems like each piece of the data passed to the function may not be full rank, but I don't know how to deal with this. Can anyone advise? Many, many thanks in advance.

Disclaimer: I have several comments - more than an answer - but as I want to show the code and output and need some more space, I do it in form of an answer.

First, in your y.transf.betareg you want to use the transformation that is recommended in the betareg vignette. As the "number of observations" you use 46, the number of time points in your data. However, for the correction term one should use the number of observations from which the proportion was computed (if applicable). For example, in 1761, the variable World is 0.3055556 which might have come from approximately 11/36. If that were the case, then 36 should have been the number of observations.

Second, the beta regression model you are fitting has three parameters (intercept, slope, precision) so that using a rolling window size of four observations is very - let's say - optimistic. I cannot imagine that in your application this is anything more than random noise.

Therefore, I would propose to first find a model that you would expect holds approximately for your data. Given that there is left-censoring at zero, a natural candidate seems to be a simple tobit model. A scatter plot of the data along with the tobit regression line is:

具有 tobit 模型拟合的散点图

The replication code is included at the end. Overall, the model seems to fit reasonably well with significant slope estimate:

           Estimate Std. Error  z value  Pr(>|z|)    
(Intercept) -0.27869    0.12527  -2.2247 0.0261034 *  
World        0.61259    0.16398   3.7358 0.0001871 ***
Log(scale)  -1.40915    0.14001 -10.0645 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Then, you seem to be interested in assessing whether the parameters are stable over the sampling period. However, as the sample size is somewhat limited, re-estimating the model on each subsample will lead to a lot of random fluctuation. Instead, one can take rolling sums of the model scores (a type of residual if you will) as estimated on the entire sample. If there are systematic changes in any of the parameters, you will see them in systematic changes in the rolling sums. This type of test is also called a score-based MOSUM (moving sum) test in the structural change literature. Below I show the visualization of a MOSUM test with bandwith 15% (ie, 7 observations) along with its 5% critical values. The corresponding p-value is 49.6% and thus clearly non-significant. The plot shows no systematic departures from zero.

tobit模型参数不稳定性的MOSUM波动检验

So with this moderate size of the sample one cannot detect significant departures from a single model fit with parameters given as above. (MOSUM tests with increased or decreased bandwith lead to equivalent results.)

Replication code:

Data

library("zoo")
edf <- read.zoo(textConnection("   Year     World        Ego
1  1760 1.0000000 0.00000000
2  1761 0.3055556 0.00000000
3  1762 0.3950617 0.11814413
4  1764 0.8677686 0.26984127
5  1766 0.0000000 0.00000000
6  1767 0.8580606 0.15407986
7  1769 0.7500000 0.00000000
8  1771 0.7416174 0.37698413
9  1772 0.6611570 0.53587372
10 1777 0.4375000 0.20000000
11 1778 0.9629630 0.36111111
12 1779 0.7229630 0.05291005
13 1781 0.0000000 0.00000000
14 1782 0.0000000 0.00000000
15 1783 0.7500000 0.00000000
16 1784 0.7966605 0.21893984
17 1785 0.8518519 0.12500000
18 1786 0.0000000 0.00000000
19 1787 0.2279036 0.00000000
20 1788 0.7425926 0.08585859
21 1789 0.4648760 0.17942337
22 1790 0.8888889 0.00000000
23 1791 0.7958546 0.35023819
24 1792 0.0000000 0.00000000
25 1794 0.8021333 0.65529337
26 1795 0.0000000 0.00000000
27 1800 0.9900000 0.10825397
28 1802 0.7866667 0.07500000
29 1803 0.0000000 0.00000000
30 1804 0.0000000 0.00000000
31 1805 0.7416026 0.34158521
32 1806 0.9420000 0.47337963
33 1810 0.7500000 0.00000000
34 1812 0.8397279 0.53089503
35 1818 0.4863946 0.31103450
36 1819 0.8636475 0.20591162
37 1820 0.8888889 0.00000000
38 1821 0.7197232 0.60557261
39 1822 0.7308806 0.27126586
40 1823 0.6113805 0.26487719
41 1824 0.6400000 0.00000000
42 1826 0.9086405 0.13932918
43 1827 0.7447051 0.16207173
44 1828 0.9183673 0.40000000
45 1830 0.9843750 0.50000000
46 1831 0.7053061 0.55736111"), header = TRUE)

Full sample model

library("AER")
m <- tobit(Ego ~ World, data = edf)
coeftest(m)

Scatter plot

plot(jitter(Ego, 10) ~ jitter(World, 10), data = edf,
  xlab = "World (jittered)", ylab = "Ego (jittered)")
abline(m)
legend("topleft", "Tobit model", lwd = 1, bty = "n")

MOSUM test

library("strucchange")
sctest(m, order.by = time(edf), functional = maxMOSUM(0.15), 
  plot = TRUE, aggregate = FALSE, ylim = c(-1.5, 1.5))

EDIT: Solved by my friend. For the record, if anyone cares: it was an issue of window width, which I thought I had played around with - but not enough. The beta regression model was having trouble estimating coefficients for each iteration of the window, so we created a loop with the beta regression, tracked its progress, and saw when the error came:

brr.function <- function(brr) {
  coef(betareg(y.transf.betareg(brr[,3])~brr[,2],
              data=as.data.frame(brr),
              na.action = na.omit))
}

a <- NULL 
total.obs <- nrow(brol)
mw <- 5  # window length

for (i in 1:c(total.obs-mw)){
  a<-c(a,brr.function(brol[i:c(i+mw),]))
  cat("i=",i,"\n") # this code tracks the progress of the loop
}

We saw that it stopped at 12, so we checked out that piece of the data:

i <- 12
brol[i:c(i+mw),]

Year    World        Ego
12 1779 0.722963 0.05291005
13 1781 0.000000 0.00000000
14 1782 0.000000 0.00000000
15 1783 0.750000 0.00000000

Then we set window width to 4, and the code runs.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM