简体   繁体   English

在回归摘要中隐藏一些系数,同时仍返回调用,r平方和其他摘要输出

[英]Hide some coefficients in regression summary while still returning call, r-squared and other summary output

My question is similar to this one however I am interested in returning all the other outputs, not just coefficients. 我的问题是,类似这样的一个 ,但是我很感兴趣,返回所有的其它输出,而不仅仅是系数。 Here is sample code to make my question clearer. 这是使我的问题更清楚的示例代码。

data=as.data.frame(matrix(rnorm(50*50),50,50))
summary(lm(data[,1]~.-data[,1],data=data))

I want to only output say the first 5 coefficients. 我只想输出说前5个系数。 I know I can just do that with summary(lm(data[,1]~.-data[,1],data=data))$coeff[1:5,] , but that would get rid of all the other outputs I want. 我知道我可以使用summary(lm(data[,1]~.-data[,1],data=data))$coeff[1:5,]做到这一点,但这会摆脱所有其他输出我想要。 I also know I can get each output individually, I just want to know if there is a succicnt way of writing a one liner and cutting out variables that I do not want reported. 我也知道我可以单独获取每个输出,我只想知道是否有一种简洁的方式来编写一个线性衬里并剪切出我不想报告的变量。

You can select the coefficients you want with a slight modification to the print.summary.lm function, which is the internal function that R uses to output the summary results for a summary.lm object. 您可以对print.summary.lm函数稍加修改即可选择所需的系数,该函数是R用来为summary.lm对象输出汇总结果的内部函数。

First, get the code for the function as follows: 首先,获取该函数的代码,如下所示:

getAnywhere(print.summary.lm)

Then, we need to figure out where the coefficient table is extracted and subset it to the rows we want. 然后,我们需要找出提取系数表的位置并将其子集化为所需的行。 We'll add a new my.rows argument to the function and then subset to those rows when we extract the coefficient table. 我们将在函数中添加新的my.rows参数,然后在提取系数表时将其子集添加到这些行。 The code for the modified function is at the end of this answer. 修改后的函数的代码在此答案的结尾。

Now, compare the standard summary with our new summary. 现在,将标准摘要与我们的新摘要进行比较。 First I'll create a model with real data (The model you provided isn't specified properly. Looks like what you intended was lm(V1 ~ ., data=data) , but even then there are no residual degrees of freedom, so I thought I'd demonstrate with a real data set.): 首先,我将使用真实数据创建一个模型(未正确指定您提供的模型。看起来您想要的是lm(V1 ~ ., data=data) ,但是即使那样也没有剩余的自由度,因此我以为我会用真实的数据集进行演示。):

m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)

Standard summary: 标准摘要:

summary(m1)
 Call: lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars) Residuals: Min 1Q Median 3Q Max -3.6729 -1.6583 -0.4297 1.3307 5.4688 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 33.24160 5.48527 6.060 2.11e-06 *** wt -2.54332 0.93506 -2.720 0.0115 * hp -0.02589 0.01387 -1.866 0.0733 . cyl -0.40179 0.79364 -0.506 0.6169 vs 1.17067 1.81283 0.646 0.5241 am 1.97575 1.64825 1.199 0.2415 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 2.537 on 26 degrees of freedom Multiple R-squared: 0.8514, Adjusted R-squared: 0.8228 F-statistic: 29.8 on 5 and 26 DF, p-value: 5.571e-10 

New summary with only the coefficients we choose: 新汇总,仅包含我们选择的系数:

Note that we need to call summary on the model first, since my.summary.lm is expecting the summary object, not the model object itself. 请注意,由于my.summary.lm期望使用摘要对象,而不是模型对象本身,因此我们需要首先在模型上调用summary

my.summary.lm(summary(m1), my.rows=2:4)

Rather than selecting coefficients by index, you might prefer to select by name: 与其按索引选择系数,不如按名称选择:

my.summary.lm(summary(m1), my.rows=grep("wt|hp|cyl", names(coef(m1))))
 Call: lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars) Residuals: Min 1Q Median 3Q Max -3.6729 -1.6583 -0.4297 1.3307 5.4688 Coefficients: Estimate Std. Error t value Pr(>|t|) wt -2.54332 0.93506 -2.720 0.0115 * hp -0.02589 0.01387 -1.866 0.0733 . cyl -0.40179 0.79364 -0.506 0.6169 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 2.537 on 26 degrees of freedom Multiple R-squared: 0.8514, Adjusted R-squared: 0.8228 F-statistic: 29.8 on 5 and 26 DF, p-value: 5.571e-10 

And here's the function. 这是功能。 I've made only two changes to the original function, both of which are marked with inline comments. 我仅对原始函数进行了两项更改,所有更改均带有内联注释。 The first change is the additional my.rows argument. 第一个更改是附加的my.rows参数。 The second is in the line that begins coefs <- : 第二个是在开始coefs <-的行中:

my.summary.lm = function (x, digits = max(3L, getOption("digits") - 3L), 
                       symbolic.cor = x$symbolic.cor, 
                       signif.stars = getOption("show.signif.stars"), 
                       my.rows, ...)                     # NOTE NEW my.rows ARGUMENT
{
  cat("\nCall:\n", paste(deparse(x$call), sep = "\n", collapse = "\n"), 
      "\n\n", sep = "")
  resid <- x$residuals
  df <- x$df
  rdf <- df[2L]
  cat(if (!is.null(x$weights) && diff(range(x$weights))) 
    "Weighted ", "Residuals:\n", sep = "")
  if (rdf > 5L) {
    nam <- c("Min", "1Q", "Median", "3Q", "Max")
    rq <- if (length(dim(resid)) == 2L) 
      structure(apply(t(resid), 1L, quantile), dimnames = list(nam, 
                                                               dimnames(resid)[[2L]]))
    else {
      zz <- zapsmall(quantile(resid), digits + 1L)
      structure(zz, names = nam)
    }
    print(rq, digits = digits, ...)
  }
  else if (rdf > 0L) {
    print(resid, digits = digits, ...)
  }
  else {
    cat("ALL", df[1L], "residuals are 0: no residual degrees of freedom!")
    cat("\n")
  }
  if (length(x$aliased) == 0L) {
    cat("\nNo Coefficients\n")
  }
  else {
    if (nsingular <- df[3L] - df[1L]) 
      cat("\nCoefficients: (", nsingular, " not defined because of singularities)\n", 
          sep = "")
    else cat("\nCoefficients:\n")
    coefs <- x$coefficients[my.rows,]                      # SUBSET my.rows
    if (!is.null(aliased <- x$aliased) && any(aliased)) {
      cn <- names(aliased)
      coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn, 
                                                              colnames(coefs)))
      coefs[!aliased, ] <- x$coefficients
    }
    printCoefmat(coefs, digits = digits, signif.stars = signif.stars, 
                 na.print = "NA", ...)
  }
  cat("\nResidual standard error:", format(signif(x$sigma, 
                                                  digits)), "on", rdf, "degrees of freedom")
  cat("\n")
  if (nzchar(mess <- naprint(x$na.action))) 
    cat("  (", mess, ")\n", sep = "")
  if (!is.null(x$fstatistic)) {
    cat("Multiple R-squared: ", formatC(x$r.squared, digits = digits))
    cat(",\tAdjusted R-squared: ", formatC(x$adj.r.squared, 
                                           digits = digits), "\nF-statistic:", formatC(x$fstatistic[1L], 
                                                                                       digits = digits), "on", x$fstatistic[2L], "and", 
        x$fstatistic[3L], "DF,  p-value:", format.pval(pf(x$fstatistic[1L], 
                                                          x$fstatistic[2L], x$fstatistic[3L], lower.tail = FALSE), 
                                                       digits = digits))
    cat("\n")
  }
  correl <- x$correlation
  if (!is.null(correl)) {
    p <- NCOL(correl)
    if (p > 1L) {
      cat("\nCorrelation of Coefficients:\n")
      if (is.logical(symbolic.cor) && symbolic.cor) {
        print(symnum(correl, abbr.colnames = NULL))
      }
      else {
        correl <- format(round(correl, 2), nsmall = 2, 
                         digits = digits)
        correl[!lower.tri(correl)] <- ""
        print(correl[-1, -p, drop = FALSE], quote = FALSE)
      }
    }
  }
  cat("\n")
  invisible(x)
}

UPDATE: As I mentioned in my comment, you can roll your own summary function and set it up to work with whatever type of model summary objects you regularly use. 更新:正如我在评论中提到的那样,您可以滚动自己的摘要函数并将其设置为与您经常使用的任何类型的模型摘要对象一起使用。 In this case, we'll include summary.lm and summary.plm objects, which are the types of objects created when you run summary on lm and plm model objects, respectively. 在这种情况下,我们将包括summary.lmsummary.plm对象,它们是分别在lmplm模型对象上运行summary时创建的对象的类型。

First we need lm and plm model objects to work with: 首先,我们需要使用lm和plm模型对象:

# lm object
m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)

# plm object
library(plm)

# Example from plm help
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state","year"))

Now we need a function to output the summary data we want. 现在,我们需要一个函数来输出所需的摘要数据。 To create the code below, I just looked at what was in the list objects created by summary(m1) and summary(zz) (do str(summary(m1)) and str(summary(zz)) to see these) so I'd know where to get various elements of the summary objects (such as the call and the residuals ). 要创建下面的代码,我只是查看了由summary(m1)summary(zz)创建的列表对象中的内容(执行str(summary(m1))str(summary(zz))来查看这些内容),所以我知道从何处获取摘要对象的各种元素(例如callresiduals )。 I also in some cases just directly copied code sections from the print.summary.lm and print.summary.plm functions. 在某些情况下,我也只是直接从print.summary.lmprint.summary.plm函数复制代码段。

The function below doesn't print everything that's included in the native summary functions, but should be enough to show you how to go about adding whatever elements you'd like in the output. 下面的函数不会显示本机摘要函数中包含的所有内容,但是应该足以向您展示如何在输出中添加所需的任何元素。

# Summary function that allows selection of which coefficients to include 
# in the coefficient table
# Works with summary.lm and summary.plm objects
my.summary = function(x, rows, digits=3) {

  # Print a few summary elements that are common to both lm and plm model summary objects
  cat("Call\n")
  print(x$call)
  cat("\nResiduals\n")
  print(summary(x$residuals))
  cat("\n")
  print(coef(x)[rows,])

  # Print elements unique to lm model summary objects
  if("summary.lm" %in% class(x)) {
    cat("\nResidual standard error:", round(x$sigma,3), "on", x$df[2], "degrees of freedom")
    cat(paste(c("\nF-statistic:", " on"," and"), round(x$fstatistic,2), collapse=""),
        "DF, p-value:",
        format.pval(pf(x$fstatistic[1L], x$fstatistic[2L], x$fstatistic[3L], 
                       lower.tail = FALSE), digits=digits))

  # Print elements unique to plm model summary objects  
  } else if ("summary.plm" %in% class(x)) {
    cat(paste("\nResidual Sum of Squares: ", signif(deviance(x), 
                                                  digits), "\n", sep = ""))
    fstat <- x$fstatistic
    if (names(fstat$statistic) == "F") {
      cat(paste("F-statistic: ", signif(fstat$statistic), " on ", 
                fstat$parameter["df1"], " and ", fstat$parameter["df2"], 
                " DF, p-value: ", format.pval(fstat$p.value, digits = digits), 
                "\n", sep = ""))
    }
    else {
      cat(paste("Chisq: ", signif(fstat$statistic), " on ", 
                fstat$parameter, " DF, p-value: ", format.pval(fstat$p.value, 
                                                               digits = digits), "\n", sep = ""))
    }
  }
}

Now run the function on an lm model and a plm model: 现在在lm模型和plm模型上运行该函数:

my.summary(summary(m1), 2:4)
 Call lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars) Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -3.6730 -1.6580 -0.4297 0.0000 1.3310 5.4690 Estimate Std. Error t value Pr(>|t|) wt -2.54331718 0.93506164 -2.7199460 0.01148231 hp -0.02588661 0.01387176 -1.8661377 0.07334148 cyl -0.40178727 0.79364098 -0.5062582 0.61694148 Residual standard error: 2.537 on 26 degrees of freedom F-statistic: 29.8 on 5 and 26 DF, p-value: 5.57e-10 
my.summary(summary(zz), 2:3)
 Call plm(formula = log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state", "year")) Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -0.120500 -0.023740 -0.002041 0.000000 0.018140 0.174700 Estimate Std. Error t-value Pr(>|t|) log(pc) 0.2920069 0.02511967 11.62463 7.075069e-29 log(emp) 0.7681595 0.03009174 25.52725 2.021455e-104 Residual Sum of Squares: 1.11 F-statistic: 3064.81 on 4 and 764 DF, p-value: <2e-16 

I suppose if you really want to go all the way, you could take advantage of object orientation and write your own generic function with methods for each type of model you want to include. 我想如果您真的想一路走下去,您可以利用面向对象的优势,并为要包括的每种模型类型的方法编写自己的泛型函数

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM