Non-invertibility in the mhurdle package in R

For a project I'm analyzing data with a corner solution at 0. At my disposal I have a response y and about 20 independent variables. To model this type of data I would like to use censored regression models, including: Tobit, Truncated Normal Hurdle/Cragg and Tobit Type II. All of these are easily implemented with the 'mhurdle' package in R.

However, when implementing the Truncated Normal Hurdle/Cragg model I've noticed something strange. Specifically, as the specifications for the good-selection and lack-of-resources mechanisms become more similar, I start running into this error:

 system is computationally singular: reciprocal condition number = 1.13973e-18

So, for instance, the following specification runs fine

  model_good <- mhurdle(y ~ x1 + x2 + ... + x20 | x1 + x2 | 0, data = X, dist = "n")

While a specification that includes more of the predictors from the first part (desired consumption) in the second part (good selection) runs into trouble:

  model_error <- mhurdle(y ~ x1 + x2 + ... + x20 | x1 + x2 + ... + x15 | 0, data = X, dist = "n")

I've checked invertibility of my data and that doesn't seem to be the issue. None of the pair-wise correlations among the 20 features I use exceeds 0.15, and the matrix (X'X) has full rank.
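Checks of this kind can be sketched in base R roughly as follows. The simulated data frame and its three columns are assumptions for illustration only (they stand in for the real `X` and `x1`, ..., `x20`). The sketch also computes the reciprocal condition number of X'X, which is the quantity the error message reports.

```r
# Illustrative data only: three simulated predictors standing in for x1..x20.
set.seed(1)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))

# Design matrix with an intercept column, as a regression would use it.
M <- model.matrix(~ x1 + x2 + x3, data = X)

# Largest absolute pairwise correlation among the predictors.
C <- cor(X)
max_abs_cor <- max(abs(C[upper.tri(C)]))

# Numerical rank of the design matrix (full rank means rank_M == ncol(M)).
rank_M <- qr(M)$rank

# Reciprocal condition number of X'X; a value near machine epsilon is
# what solve() reports as "computationally singular".
rc <- rcond(crossprod(M))
```

Pairwise correlations do not change when a column is rescaled, whereas the condition number of X'X does, so the two diagnostics can disagree.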

Now I'm wondering: is the fact that the model throws an error when the specifications for the two parts become similar inherent to the way the model works, or is it a package error?

EDIT:

I'm also running into problems when estimating the examples provided in the documentation ( http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.204.8204&rep=rep1&type=pdf ), for instance:

  model12i <- mhurdle(durable ~ age + quant | age + quant | age + quant, tobin, dist = "n", method = "bfgs")

This also gives a singularity error:

  Lapack routine dgesv: system is exactly singular: U[1,1] = 0

The problem I experienced seemed to be due to the way my data are structured. Specifically, some covariates are ratios that take values in the interval (-1, 1), while others are accounting variables that range over (-1e+10, 1e+10). Somehow, the package is unable to deal with this large disparity in scale. I therefore took two steps:

  • Divided all accounting variables by 10,000
  • Removed outliers

After taking these steps the library performed as expected.
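A minimal sketch of these two steps in base R, under stated assumptions: the data frame, the column names `ratio` and `assets`, and the 1st/99th percentile outlier cut-off are all hypothetical, since the original data and the exact outlier rule are not given.

```r
# Illustrative data only: one ratio-like column and one accounting-like column.
set.seed(1)
X <- data.frame(ratio  = runif(200, -1, 1),
                assets = rnorm(200, sd = 1e9))

# Conditioning of X'X before any rescaling.
rc_before <- rcond(crossprod(model.matrix(~ ratio + assets, data = X)))

# Step 1: divide the accounting variable by 10,000.
X$assets <- X$assets / 1e4

# Step 2: drop outliers. The percentile cut-off is an assumption,
# not the rule used in the original analysis.
lim <- quantile(X$assets, c(0.01, 0.99))
X_clean <- X[X$assets >= lim[1] & X$assets <= lim[2], ]

# Conditioning of X'X after rescaling and trimming.
rc_after <- rcond(crossprod(model.matrix(~ ratio + assets, data = X_clean)))
```

Comparing `rc_before` with `rc_after` shows how far the rescaling moves the cross-product matrix away from numerical singularity. Standardising the columns with `scale()` would be another way to put them on comparable scales, though that is a substitute and not what was done here.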
