[英]Computing p.values in spatial econometric models: why are there inconsistencies between summary() and texreg()?
I am estimating some spatial econometric models that contain both a spatial autoregressive term rho and a spatial error term lambda. 我正在估计一些空间计量经济学模型,其中包含空间自回归项rho和空间误差项lambda。 In attempting to communicate my results I was using the
texreg
package, which accepts the sacsarlm
models I am working with. 在尝试传达我的结果时,我使用的是
texreg
包,它接受我正在使用的sacsarlm
模型。 I noticed, however, that texreg
is printing p-values for the rho and lambda parameters that are identical. 然而,我注意到,
texreg
正在打印相同的rho和lambda参数的p值。 Texreg
appears to be returning the p-value found in the model@LR1$p.value
slot of the model object. Texreg
似乎返回模型对象的model@LR1$p.value
槽中的p值。
The parameters rho and lambda are different in magnitude and have different standard errors so they should not have equivalent p-values. 参数rho和lambda的大小不同,并且具有不同的标准误差,因此它们不应具有等效的p值。 If I call summary on the model object I get unique p-values, but cannot figure out where those values are stored within the model object despite going through each element in a
str(model)
call. 如果我在模型对象上调用summary,我会获得唯一的p值,但无法确定这些值在模型对象中的存储位置,尽管它遍历
str(model)
调用中的每个元素。
My question is twofold: 我的问题是双重的:
Below is a minimal example that shows the problem: 以下是显示问题的最小示例:
library(spdep)
library(texreg)
set.seed(42)
W.ran <- matrix(rbinom(100*100, 1, .3),nrow=100)
X <- rnorm(100)
Y <- .2 * X + rnorm(100) + .9*(W.ran %*% X)
W.test <- mat2listw(W.ran)
model <- sacsarlm(Y~X, type = "sacmixed",
listw=W.test, zero.policy=TRUE)
summary(model)
Call:sacsarlm(formula = Y ~ X, listw = W.test, type = "sacmixed",
zero.policy = TRUE)
Residuals:
Min 1Q Median 3Q Max
-2.379283 -0.750922 0.036044 0.675951 2.577148
Type: sacmixed
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.91037455 0.65700059 1.3857 0.1659
X -0.00076362 0.10330510 -0.0074 0.9941
lag.(Intercept) -0.03193863 0.02310075 -1.3826 0.1668
lag.X 0.89764491 0.02231353 40.2287 <2e-16
Rho: -0.0028541
Asymptotic standard error: 0.0059647
z-value: -0.47849, p-value: 0.6323
Lambda: -0.020578
Asymptotic standard error: 0.020057
z-value: -1.026, p-value: 0.3049
LR test value: 288.74, p-value: < 2.22e-16
Log likelihood: -145.4423 for sacmixed model
ML residual variance (sigma squared): 1.0851, (sigma: 1.0417)
Number of observations: 100
Number of parameters estimated: 7
AIC: 304.88, (AIC for lm: 585.63)
screenreg(model)
=================================
Model 1
---------------------------------
(Intercept) 0.91
(0.66)
X -0.00
(0.10)
lag.(Intercept) -0.03
(0.02)
lag.X 0.90 ***
(0.02)
---------------------------------
Num. obs. 100
Parameters 7
AIC (Linear model) 585.63
AIC (Spatial model) 304.88
Log Likelihood -145.44
Wald test: statistic 1.05
Wald test: p-value 0.90
Lambda: statistic -0.02
Lambda: p-value 0.00
Rho: statistic -0.00
Rho: p-value 0.00
=================================
*** p < 0.001, ** p < 0.01, * p < 0.05
Obviously, in the example, Rho and Lambda have different p-values neither of which are zero and thus there is a problem with the texreg output. 显然,在该示例中,Rho和Lambda具有不同的p值,两者都不为零,因此texreg输出存在问题。 Any help with why this is occurring or where to obtain the correct p-values much appreciated!
任何帮助,为什么这种情况发生或在哪里获得正确的p值非常感谢!
texreg
author here. texreg
作者在这里。 Thanks for catching this. 谢谢你抓住这个。 As described in my reply here ,
texreg
uses extract
methods to retrieve the relevant information from any of the (currently more than 70 supported) model object types. 正如我在这里的回复中所描述的,
texreg
使用extract
方法从任何(当前超过70个支持的)模型对象类型中检索相关信息。 It looks like there is a glitch in the GOF part of the method for sarlm
objects. 看起来在
sarlm
对象的方法的GOF部分中存在一个小故障。
Here is what the method currently looks like (as of texreg
1.36.13): 这是目前的方法(从
texreg
1.36.13开始):
# extension for sarlm objects (spdep package)
extract.sarlm <- function(model, include.nobs = TRUE, include.aic = TRUE,
include.loglik = TRUE, include.wald = TRUE, include.lambda = TRUE,
include.rho = TRUE, ...) {
s <- summary(model, ...)
names <- rownames(s$Coef)
cf <- s$Coef[, 1]
se <- s$Coef[, 2]
p <- s$Coef[, ncol(s$Coef)]
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.nobs == TRUE) {
n <- length(s$fitted.values)
param <- s$parameters
gof <- c(gof, n, param)
gof.names <- c(gof.names, "Num.\ obs.", "Parameters")
gof.decimal <- c(gof.decimal, FALSE, FALSE)
}
if (include.aic == TRUE) {
aic <- AIC(model)
aiclm <- s$AIC_lm.model
gof <- c(gof, aiclm, aic)
gof.names <- c(gof.names, "AIC (Linear model)", "AIC (Spatial model)")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.loglik == TRUE) {
ll <- s$LL
gof <- c(gof, ll)
gof.names <- c(gof.names, "Log Likelihood")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.wald == TRUE) {
waldstat <- s$Wald1$statistic
waldp <- s$Wald1$p.value
gof <- c(gof, waldstat, waldp)
gof.names <- c(gof.names, "Wald test: statistic", "Wald test: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.lambda == TRUE && !is.null(s$lambda)) {
lambda <- s$lambda
LRpval <- s$LR1$p.value[1]
gof <- c(gof, lambda, LRpval)
gof.names <- c(gof.names, "Lambda: statistic", "Lambda: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.rho == TRUE && !is.null(s$rho)) {
rho <- s$rho
LRpval <- s$LR1$p.value[1]
gof <- c(gof, rho, LRpval)
gof.names <- c(gof.names, "Rho: statistic", "Rho: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
tr <- createTexreg(
coef.names = names,
coef = cf,
se = se,
pvalues = p,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
setMethod("extract", signature = className("sarlm", "spdep"),
definition = extract.sarlm)
I think you are right that the lambda and rho parts need some updating. 我认为你是正确的lambda和rho部分需要一些更新。 The
sacsarlm
function does not store the results printed by its summary
method in any object, so you are rightly pointing out that any attempts using str
do not appear to show the true p-values etc. sacsarlm
函数不会将其summary
方法打印的结果存储在任何对象中,因此您正确地指出使用str
任何尝试都不会显示真正的p值等。
Therefore it makes sense to take a look at what the print.summary.sarlm
function in the spdep
package actually does when it prints the summary. 因此,有必要看一下
spdep
包中print.summary.sarlm
函数在打印摘要时实际执行的操作。 I found the code for this function in the file R/summary.spsarlm.R
in the source code of the package on CRAN . 我在CRAN上的软件包的源代码中的文件
R/summary.spsarlm.R
中找到了此函数的代码 。 It looks like this: 它看起来像这样:
print.summary.sarlm <- function(x, digits = max(5, .Options$digits - 3),
signif.stars = FALSE, ...)
{
cat("\nCall:", deparse(x$call), sep = "", fill=TRUE)
if (x$type == "error") if (isTRUE(all.equal(x$lambda, x$interval[1])) ||
isTRUE(all.equal(x$lambda, x$interval[2])))
warning("lambda on interval bound - results should not be used")
if (x$type == "lag" || x$type == "mixed")
if (isTRUE(all.equal(x$rho, x$interval[1])) ||
isTRUE(all.equal(x$rho, x$interval[2])))
warning("rho on interval bound - results should not be used")
cat("\nResiduals:\n")
resid <- residuals(x)
nam <- c("Min", "1Q", "Median", "3Q", "Max")
rq <- if (length(dim(resid)) == 2L)
structure(apply(t(resid), 1, quantile), dimnames = list(nam,
dimnames(resid)[[2]]))
else structure(quantile(resid), names = nam)
print(rq, digits = digits, ...)
cat("\nType:", x$type, "\n")
if (x$zero.policy) {
zero.regs <- attr(x, "zero.regs")
if (!is.null(zero.regs))
cat("Regions with no neighbours included:\n",
zero.regs, "\n")
}
if (!is.null(x$coeftitle)) {
cat("Coefficients:", x$coeftitle, "\n")
coefs <- x$Coef
if (!is.null(aliased <- x$aliased) && any(x$aliased)){
cat(" (", table(aliased)["TRUE"],
" not defined because of singularities)\n", sep = "")
cn <- names(aliased)
coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn,
colnames(x$Coef)))
coefs[!aliased, ] <- x$Coef
}
printCoefmat(coefs, signif.stars=signif.stars, digits=digits,
na.print="NA")
}
# res <- LR.sarlm(x, x$lm.model)
res <- x$LR1
pref <- ifelse(x$ase, "Asymptotic", "Approximate (numerical Hessian)")
if (x$type == "error") {
cat("\nLambda: ", format(signif(x$lambda, digits)),
", LR test value: ", format(signif(res$statistic,
digits)), ", p-value: ", format.pval(res$p.value,
digits), "\n", sep="")
if (!is.null(x$lambda.se)) {
if (!is.null(x$adj.se)) {
x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$lambda.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "), format(signif((x$lambda/
x$lambda.se), digits)),
", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
x$lambda.se))), digits), "\n", sep="")
cat("Wald statistic: ", format(signif(x$Wald1$statistic,
digits)), ", p-value: ", format.pval(x$Wald1$p.value,
digits), "\n", sep="")
}
} else if (x$type == "sac" || x$type == "sacmixed") {
cat("\nRho: ", format(signif(x$rho, digits)), "\n",
sep="")
if (!is.null(x$rho.se)) {
if (!is.null(x$adj.se)) {
x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$rho.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "),
format(signif((x$rho/x$rho.se), digits)),
", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
x$rho.se))), digits), "\n", sep="")
}
cat("Lambda: ", format(signif(x$lambda, digits)), "\n", sep="")
if (!is.null(x$lambda.se)) {
pref <- ifelse(x$ase, "Asymptotic",
"Approximate (numerical Hessian)")
if (!is.null(x$adj.se)) {
x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$lambda.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "), format(signif((x$lambda/
x$lambda.se), digits)),
", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
x$lambda.se))), digits), "\n", sep="")
}
cat("\nLR test value: ", format(signif(res$statistic, digits)),
", p-value: ", format.pval(res$p.value, digits), "\n",
sep="")
} else {
cat("\nRho: ", format(signif(x$rho, digits)),
", LR test value: ", format(signif(res$statistic, digits)),
", p-value: ", format.pval(res$p.value, digits), "\n",
sep="")
if (!is.null(x$rho.se)) {
if (!is.null(x$adj.se)) {
x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)
}
cat(pref, " standard error: ",
format(signif(x$rho.se, digits)),
ifelse(is.null(x$adj.se), "\n z-value: ",
"\n t-value: "),
format(signif((x$rho/x$rho.se), digits)),
", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
x$rho.se))), digits), "\n", sep="")
}
if (!is.null(x$Wald1)) {
cat("Wald statistic: ", format(signif(x$Wald1$statistic,
digits)), ", p-value: ", format.pval(x$Wald1$p.value,
digits), "\n", sep="")
}
}
cat("\nLog likelihood:", logLik(x), "for", x$type, "model\n")
cat("ML residual variance (sigma squared): ",
format(signif(x$s2, digits)), ", (sigma: ",
format(signif(sqrt(x$s2), digits)), ")\n", sep="")
if (!is.null(x$NK)) cat("Nagelkerke pseudo-R-squared:",
format(signif(x$NK, digits)), "\n")
cat("Number of observations:", length(x$residuals), "\n")
cat("Number of parameters estimated:", x$parameters, "\n")
cat("AIC: ", format(signif(AIC(x), digits)), ", (AIC for lm: ",
format(signif(x$AIC_lm.model, digits)), ")\n", sep="")
if (x$type == "error") {
if (!is.null(x$Haus)) {
cat("Hausman test: ", format(signif(x$Haus$statistic,
digits)), ", df: ", format(x$Haus$parameter),
", p-value: ", format.pval(x$Haus$p.value, digits),
"\n", sep="")
}
}
if ((x$type == "lag" || x$type == "mixed") && x$ase) {
cat("LM test for residual autocorrelation\n")
cat("test value: ", format(signif(x$LMtest, digits)),
", p-value: ", format.pval((1 - pchisq(x$LMtest, 1)),
digits), "\n", sep="")
}
if (x$type != "error" && !is.null(x$LLCoef)) {
cat("\nCoefficients: (log likelihood/likelihood ratio)\n")
printCoefmat(x$LLCoef, signif.stars=signif.stars,
digits=digits, na.print="NA")
}
correl <- x$correlation
if (!is.null(correl)) {
p <- NCOL(correl)
if (p > 1) {
cat("\n", x$correltext, "\n")
correl <- format(round(correl, 2), nsmall = 2,
digits = digits)
correl[!lower.tri(correl)] <- ""
print(correl[-1, -p, drop = FALSE], quote = FALSE)
}
}
cat("\n")
invisible(x)
}
You can see there that the function first distinguishes between different sub-models ( error
vs. sac
/ sacmixed
vs. else), then decides which standard errors to use, and then computes the p-values on the fly without saving them anywhere. 你可以看到函数首先区分不同的子模型(
error
与sac
/ sacmixed
与else),然后决定使用哪些标准错误,然后动态计算p值而不将它们保存在任何地方。
So this is what we need to do as well in our extract
method in order to get the same result as the summary
method in the spdep
package. 所以这也是我们在
extract
方法中需要做的,以便获得与spdep
包中的summary
方法相同的结果。 We also need to move this from the GOF block up to the coefficient block of the table (see comments section below for a discussion). 我们还需要将其从GOF块移动到表格的系数块(请参阅下面的评论部分进行讨论)。 Here is my attempt at adopting their approach in the
extract
method: 以下是我尝试在
extract
方法中采用他们的方法:
# extension for sarlm objects (spdep package)
extract.sarlm <- function(model, include.nobs = TRUE, include.loglik = TRUE,
include.aic = TRUE, include.lr = TRUE, include.wald = TRUE, ...) {
s <- summary(model, ...)
names <- rownames(s$Coef)
cf <- s$Coef[, 1]
se <- s$Coef[, 2]
p <- s$Coef[, ncol(s$Coef)]
if (model$type != "error") { # include coefficient for autocorrelation term
rho <- model$rho
cf <- c(cf, rho)
names <- c(names, "$\\rho$")
if (!is.null(model$rho.se)) {
if (!is.null(model$adj.se)) {
rho.se <- sqrt((model$rho.se^2) * model$adj.se)
} else {
rho.se <- model$rho.se
}
rho.pval <- 2 * (1 - pnorm(abs(rho / rho.se)))
se <- c(se, rho.se)
p <- c(p, rho.pval)
} else {
se <- c(se, NA)
p <- c(p, NA)
}
}
if (!is.null(model$lambda)) {
cf <-c(cf, model$lambda)
names <- c(names, "$\\lambda$")
if (!is.null(model$lambda.se)) {
if (!is.null(model$adj.se)) {
lambda.se <- sqrt((model$lambda.se^2) * model$adj.se)
} else {
lambda.se <- model$lambda.se
}
lambda.pval <- 2 * (1 - pnorm(abs(model$lambda / lambda.se)))
se <- c(se, lambda.se)
p <- c(p, lambda.pval)
} else {
se <- c(se, NA)
p <- c(p, NA)
}
}
gof <- numeric()
gof.names <- character()
gof.decimal <- logical()
if (include.nobs == TRUE) {
n <- length(s$fitted.values)
param <- s$parameters
gof <- c(gof, n, param)
gof.names <- c(gof.names, "Num.\ obs.", "Parameters")
gof.decimal <- c(gof.decimal, FALSE, FALSE)
}
if (include.loglik == TRUE) {
ll <- s$LL
gof <- c(gof, ll)
gof.names <- c(gof.names, "Log Likelihood")
gof.decimal <- c(gof.decimal, TRUE)
}
if (include.aic == TRUE) {
aic <- AIC(model)
aiclm <- s$AIC_lm.model
gof <- c(gof, aiclm, aic)
gof.names <- c(gof.names, "AIC (Linear model)", "AIC (Spatial model)")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.lr == TRUE && !is.null(s$LR1)) {
gof <- c(gof, s$LR1$statistic[[1]], s$LR1$p.value[[1]])
gof.names <- c(gof.names, "LR test: statistic", "LR test: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
if (include.wald == TRUE && !is.null(model$Wald1)) {
waldstat <- model$Wald1$statistic
waldp <- model$Wald1$p.value
gof <- c(gof, waldstat, waldp)
gof.names <- c(gof.names, "Wald test: statistic", "Wald test: p-value")
gof.decimal <- c(gof.decimal, TRUE, TRUE)
}
tr <- createTexreg(
coef.names = names,
coef = cf,
se = se,
pvalues = p,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal
)
return(tr)
}
setMethod("extract", signature = className("sarlm", "spdep"),
definition = extract.sarlm)
You can just execute this code at run-time to update the way texreg
handles these objects. 您可以在运行时执行此代码以更新
texreg
处理这些对象的方式。 Please do let me know if you still think there are any glitches I haven't spotted. 如果您仍然认为我没有发现任何故障,请告诉我。 If none are reported in the comments, I will include this updated
extract
method in the next texreg
release. 如果评论中没有报告,我将在下一个
texreg
版本中包含此更新的extract
方法。
With these changes, calling screenreg(model, single.row = TRUE)
yields the following output: 通过这些更改,调用
screenreg(model, single.row = TRUE)
产生以下输出:
=======================================
Model 1
---------------------------------------
(Intercept) 0.91 (0.66)
X -0.00 (0.10)
lag.(Intercept) -0.03 (0.02)
lag.X 0.90 (0.02) ***
rho -0.00 (0.01)
lambda -0.02 (0.02)
---------------------------------------
Num. obs. 100
Parameters 7
Log Likelihood -145.44
AIC (Linear model) 585.63
AIC (Spatial model) 304.88
LR test: statistic 288.74
LR test: p-value 0.00
=======================================
*** p < 0.001, ** p < 0.01, * p < 0.05
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.