
Non-invertibility in the mhurdle package in R

For a project I'm analyzing data with a corner solution at 0. At my disposal I have a response y and about 20 independent variables. To model this type of data I would like to use censored regression models, including Tobit, Truncated Normal Hurdle/Cragg and Tobit Type II. All of these are easily implemented with the 'mhurdle' package in R.

However, when implementing the Truncated Normal Hurdle/Cragg model I've noticed something strange. Specifically, as the specifications for the good selection and lack of resources mechanisms become more similar, I start running into the error:

 system is computationally singular: reciprocal condition number = 1.13973e-18

So, for instance, the following specification runs fine:

  model_good <- mhurdle(y ~ x1 + x2 + ... + x20 | x1 + x2 | 0, data = X, dist = "n")

A specification that includes more of the predictors from the first part (desired consumption) in the second part (good selection), however, runs into trouble:

  model_error <- mhurdle(y ~ x1 + x2 + ... + x20 | x1 + x2 + ... + x15 | 0, data = X, dist = "n")

I've checked the invertibility of my data and that doesn't seem to be the issue. None of the pairwise correlations among the 20 features I use exceeds 0.15, and the matrix (X'X) has full rank.
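Low pairwise correlations and full rank do not by themselves rule out numerical ill-conditioning, so the condition number is worth checking too. The following is a minimal sketch on simulated data, not the data from this question: the column names `ratio` and `acct` and their ranges are assumptions chosen only for illustration. It uses base R only.

```r
# Simulated stand-in for the real design matrix (hypothetical columns).
set.seed(42)
n <- 500
X <- cbind(
  ratio = runif(n, -1, 1),        # small-scale covariate
  acct  = runif(n, -1e10, 1e10)   # large-scale covariate
)

# The checks described in the question: pairwise correlation and rank.
max_cor <- abs(cor(X))[1, 2]
rank_X  <- qr(X)$rank

# Condition number of X before and after standardizing the columns.
# A very large value signals that inverting X'X is numerically fragile.
cond_raw <- kappa(X, exact = TRUE)
cond_std <- kappa(scale(X), exact = TRUE)

# solve() stops with an error when the reciprocal condition number of X'X
# falls below machine precision; capture the message instead of stopping.
inv <- tryCatch(solve(crossprod(X)), error = function(e) conditionMessage(e))

print(c(max_cor = max_cor, rank = rank_X, cond_raw = cond_raw, cond_std = cond_std))
print(inv)
```

If `cond_raw` is many orders of magnitude larger than `cond_std`, the scale of the columns rather than collinearity is the likely cause of the singularity error.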

Now I'm wondering: is the fact that the model throws an error when the specifications for the two parts become similar inherent to the way the model works, or is it a package error?

EDIT:

I'm also running into problems when estimating the examples provided in the documentation ( http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.204.8204&rep=rep1&type=pdf ), for instance:

  model12i <- mhurdle(durable ~ age + quant | age + quant | age + quant, tobin, dist = "n", method = "bfgs")

This also gives a singularity error:

  Lapack routine dgesv: system is exactly singular: U[1,1] = 0

The problem I experienced seemed to be due to the way my data is structured. Specifically, some covariates are ratios that take values in the interval (-1, 1), while others are accounting variables whose values range over (-1e+10, 1e+10). Somehow, the package is unable to deal with the large disparity in scales. Therefore I took two steps:

  • Divided all accounting variables by 10,000
  • Removed outliers

After taking these steps the library performed as expected.
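As a rough sketch of what rescaling and outlier removal could look like in base R: the data below is simulated, the column names `ratio` and `acct` are hypothetical, and the 1.5 * IQR rule is an assumption, since the outlier criterion actually used is not stated here.

```r
# Simulated stand-in data (hypothetical columns, not the original data set).
set.seed(1)
n <- 500
X <- data.frame(
  ratio = runif(n, -1, 1),
  acct  = c(rnorm(n - 2, sd = 1e8), 9e9, -9e9)  # two extreme values appended
)

# Step 1: bring the accounting variable closer to the scale of the ratios.
X$acct <- X$acct / 1e4

# Step 2: drop outliers. The 1.5 * IQR fence is an assumed rule for illustration.
q     <- unname(quantile(X$acct, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
keep  <- X$acct >= q[1] - fence & X$acct <= q[2] + fence
X_clean <- X[keep, ]

# Compare the range of the accounting variable before and after cleaning.
print(range(X$acct))
print(range(X_clean$acct))
```

Standardizing with `scale()` would be an alternative to dividing by a fixed constant; either way the aim is to put the covariates on comparable scales before passing the data to `mhurdle()`.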
