简体   繁体   中英

Running multiple, simple linear regressions from dataframe in R

I have a dataset (data frame) with 5 columns all containing numeric values.

I'm looking to run a simple linear regression for each pair in the dataset.

For example, If the columns were named A, B, C, D, E , I want to run lm(A~B), lm(A~C), lm(A~D), ...., lm(D~E) ,... and, then I want to plot the data for each pair along with the regression line.

I'm pretty new to R so I'm sort of spinning my wheels on how to actually accomplish this. Should I use ddply ? or lapply ? I'm not really sure how to tackle this.

Here's one solution using combn

 combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)

Example:

set.seed(1)
DF <- data.frame(A=rnorm(50, 100, 3),
                 B=rnorm(50, 100, 3),
                 C=rnorm(50, 100, 3),
                 D=rnorm(50, 100, 3),
                 E=rnorm(50, 100, 3))

Updated: adding @Henrik suggestion (see comments)

# only the coefficients
> results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
> vars <- combn(names(DF), 2)
> names(results) <- vars[1 , ] # adding names to identify variables in the reggression
> results
$A
 (Intercept)            B 
103.66739418  -0.03354243 

$A
(Intercept)           C 
97.88341555  0.02429041 

$A
(Intercept)           D 
122.7606103  -0.2240759 

$A
(Intercept)           E 
99.26387487  0.01038445 

$B
 (Intercept)            C 
99.971253525  0.003824755 

$B
 (Intercept)            D 
102.65399702  -0.02296721 

$B
(Intercept)           E 
96.83042199  0.03524868 

$C
(Intercept)           D 
 80.1872211   0.1931079 

$C
(Intercept)           E 
 89.0503893   0.1050202 

$D
 (Intercept)            E 
107.84384655  -0.07620397 

I would recommend to also look at the correlation matrix ( cor(DF) ), which is usually the best way to discover linear relationships between variables. The correlation is tightly linked to the covariance and the slopes of a simple linear regression. The computation below exemplifies this link.

Sample data:

set.seed(1)
DF <- data.frame(
  A=rnorm(50, 100, 3),
  B=rnorm(50, 100, 3),
  C=rnorm(50, 100, 3),
  D=rnorm(50, 100, 3),
  E=rnorm(50, 100, 3)
)

The regression slope is cov(x, y) / var(x)

beta = cov(DF) * (1/diag(var(DF)))

            A            B           C           D           E
A  1.00000000 -0.045548503 0.028448192 -0.32982367  0.01800795
B -0.03354243  1.000000000 0.003298708 -0.02489518  0.04501362
C  0.02429041  0.003824755 1.000000000  0.24269838  0.15550116
D -0.22407592 -0.022967212 0.193107904  1.00000000 -0.08977834
E  0.01038445  0.035248685 0.105020194 -0.07620397  1.00000000

The intercept is mean(y) - beta * mean(x)

colMeans(DF) - beta * colMeans(DF)

             A         B         C         D         E
A 1.421085e-14 104.86992  97.44795 133.38310  98.49512
B 1.037180e+02   0.00000 100.02095 102.85026  95.83477
C 9.712461e+01  99.16182   0.00000  75.38373  84.06356
D 1.226899e+02 102.53263  80.87529   0.00000 109.22915
E 9.886859e+01  96.38451  89.41391 107.51930   0.00000

Using combn for all combination of names of column (in the following example I assumed you want combination of two columns only) and Map for running over loops.

Example using mtcars data from R:

colc<-names(mtcars)
colcc<-combn(colc,2)
colcc<-data.frame(colcc)
kk<-Map(function(x)lm(as.formula(paste(colcc[1,x],"~",paste(colcc[2,x],collapse="+"))),data=mtcars), as.list(1:nrow(colcc)))

 head(kk)
[[1]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)          cyl  
     37.885       -2.876  


[[2]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)         disp  
   29.59985     -0.04122  


[[3]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)           hp  
   30.09886     -0.06823  


[[4]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)         drat  
     -7.525        7.678  


[[5]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)           wt  
     37.285       -5.344  


[[6]]

Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2, 
    x], collapse = "+"))), data = mtcars)

Coefficients:
(Intercept)         qsec  
     -5.114        1.412  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM