How to select variables in a linear regression based on values in a matrix with R

Question

I am working with R.

I have a matrix called comb, in which each row gives the column indices of one pair of regressors:

comb <- matrix(c(1, 2, 1, 3, 2, 3), nrow = 3, ncol = 2, byrow = TRUE)
n_comb <- 3

I have a one-column data frame called y with the values of my y variable.

I have a three-column data frame called reg with three regressors.

I want to write a loop that regresses y on every possible combination of two variables from reg. Ideally, I would store the results of each regression somewhere so that I can easily access them afterwards. For instance, I would like to store the R-squared of each regression together with the x variables that produced it.

So far I have tried:

for (i in 1:n_comb) {
  # reg_simple <- (select only the variables I need)
  all <- cbind(y, reg_simple)
  colnames(all)[1] <- "y"
  regression <- lm(y ~ ., all)
  summary(regression)
  # (store the R-squared and the regressors somewhere)
}

Answer 1

To use the predictors given by each row of comb, loop over the rows of the matrix (either with apply and MARGIN = 1, or by splitting it into rows with asplit and MARGIN = 1 and looping with sapply), build the formula with reformulate, fit it with lm, and extract the r.squared value from the summary. Note that response = 'y' refers to a column named y, so the column of the y data frame needs to have that name.

rsquare_out <- sapply(asplit(comb, 1), 
     function(i) summary(lm(reformulate(names(reg)[i], response = 'y'),
       data = cbind(reg, y)))$r.squared)
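
To keep each R-squared next to the regressors that produced it, as the question asks, the same idea can return a small data frame instead of a bare vector. This is only a sketch on simulated data: the column names x1 to x3, the helper fit_pair and the object results are assumptions made for illustration, not names from the original post.

```r
# Simulated stand-ins for the question's objects (assumed shapes and names)
set.seed(1)
n   <- 50
reg <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y   <- data.frame(y = reg$x1 + 2 * reg$x2 + rnorm(n))

# byrow = TRUE so that each row is one pair: (1, 2), (1, 3), (2, 3)
comb <- matrix(c(1, 2, 1, 3, 2, 3), ncol = 2, byrow = TRUE)

dat <- cbind(y, reg)

# Hypothetical helper: fit one model for a pair of column indices and
# return the regressors used together with the R-squared
fit_pair <- function(idx) {
  vars <- names(reg)[idx]
  fit  <- lm(reformulate(vars, response = "y"), data = dat)
  data.frame(regressors = paste(vars, collapse = " + "),
             r_squared  = summary(fit)$r.squared)
}

# One row per combination
results <- do.call(
  rbind,
  lapply(seq_len(nrow(comb)), function(i) fit_pair(comb[i, ]))
)
results
```

Each row of results then holds one combination, so results[which.max(results$r_squared), ] picks out the best-fitting pair.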

Answer 2

Using loops:

Dummy data:

n = 100
y = rnorm(n)
x = data.frame(x1=1*y+rnorm(n),
               x2=2*y+rnorm(n),
               x3=3*y+rnorm(n))

comb = gtools::combinations(3, 2)
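
If gtools is not installed, base R's combn() can produce the same index pairs. It returns one combination per column, so transposing it gives one pair per row. A small sketch (the name comb_base is hypothetical):

```r
# combn() returns one combination per column; transpose to get one per row
comb_base <- t(combn(3, 2))
comb_base
# rows: (1, 2), (1, 3), (2, 3)
```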

Code:

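regs = list()
for(i in seq_len(nrow(comb))){
  # regress y on the pair of columns given by row i of comb
  mod = summary(lm(y ~ ., x[,comb[i,]]))
  # keep the right-hand side of the formula, the coefficient table and R-squared
  regs[[i]] = list(call=mod$terms[[3]],
                   coefs=mod$coefficients,
                   RS=mod$r.squared)
}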

You can store anything else you need from the summary in the list(). Output:

> regs
[[1]]
[[1]]$call
x1 + x2

[[1]]$coefs
              Estimate Std. Error    t value     Pr(>|t|)
(Intercept) 0.03686327 0.04218032  0.8739449 3.843069e-01
x1          0.13359822 0.04037758  3.3087228 1.316050e-03
x2          0.36019362 0.02384050 15.1084733 3.143002e-27

[[1]]$RS
[1] 0.8384476


[[2]]
[[2]]$call
x1 + x3

[[2]]$coefs
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 0.03390277 0.02660885  1.274116 2.056664e-01
x1          0.04295226 0.02654442  1.618128 1.088823e-01
x3          0.28556167 0.01064096 26.836090 1.110231e-46

[[2]]$RS
[1] 0.9356962


[[3]]
[[3]]$call
x2 + x3

[[3]]$coefs
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 0.0291651 0.02391629  1.219466 2.256244e-01
x2          0.1116096 0.02205835  5.059746 1.989407e-06
x3          0.2304448 0.01497633 15.387271 8.944792e-28

[[3]]$RS
[1] 0.9477506

Alternatively, you can name each list element after the regressors it used:

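regs = list()
for(i in seq_len(nrow(comb))){
  # column names for this pair (called vars to avoid masking base::names)
  vars = colnames(x)[comb[i,]]
  mod = summary(lm(y ~ ., x[,vars]))
  # use "x1 + x2" etc. as the name of the list element
  regs[[paste(vars, collapse=" + ")]] = list(coefs=mod$coefficients,
                                             RS=mod$r.squared)
}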

Output:

> regs
$`x1 + x2`
$`x1 + x2`$coefs
              Estimate Std. Error    t value     Pr(>|t|)
(Intercept) 0.03686327 0.04218032  0.8739449 3.843069e-01
x1          0.13359822 0.04037758  3.3087228 1.316050e-03
x2          0.36019362 0.02384050 15.1084733 3.143002e-27

$`x1 + x2`$RS
[1] 0.8384476


$`x1 + x3`
$`x1 + x3`$coefs
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 0.03390277 0.02660885  1.274116 2.056664e-01
x1          0.04295226 0.02654442  1.618128 1.088823e-01
x3          0.28556167 0.01064096 26.836090 1.110231e-46

$`x1 + x3`$RS
[1] 0.9356962


$`x2 + x3`
$`x2 + x3`$coefs
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 0.0291651 0.02391629  1.219466 2.256244e-01
x2          0.1116096 0.02205835  5.059746 1.989407e-06
x3          0.2304448 0.01497633 15.387271 8.944792e-28

$`x2 + x3`$RS
[1] 0.9477506
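
Once the results sit in a named list like this, the R-squared values can be pulled into a single table and ranked. The sketch below rebuilds the list on freshly simulated data so that it runs on its own. The seed, the object rs_table and the use of combn() in place of gtools::combinations() are assumptions for illustration, so the numbers will not match the output above.

```r
# Simulated data in the same shape as the dummy data (seed is arbitrary)
set.seed(42)
n <- 100
y <- rnorm(n)
x <- data.frame(x1 = 1 * y + rnorm(n),
                x2 = 2 * y + rnorm(n),
                x3 = 3 * y + rnorm(n))

# Base-R stand-in for gtools::combinations(3, 2): one pair per row
comb <- t(combn(ncol(x), 2))

# Build the named list of results, one element per pair of regressors
regs <- list()
for (i in seq_len(nrow(comb))) {
  vars <- colnames(x)[comb[i, ]]
  mod  <- summary(lm(y ~ ., data = x[, vars]))
  regs[[paste(vars, collapse = " + ")]] <- list(coefs = mod$coefficients,
                                                 RS = mod$r.squared)
}

# Extract R-squared from every element and sort from best to worst fit
rs_table <- data.frame(model = names(regs),
                       RS    = vapply(regs, function(r) r$RS, numeric(1)),
                       row.names = NULL)
rs_table <- rs_table[order(rs_table$RS, decreasing = TRUE), ]
rs_table
```

Using vapply() with numeric(1) rather than sapply() makes the extraction fail loudly if any element is missing its RS entry.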
