I have a dataset (data frame) with 5 columns all containing numeric values.
I'm looking to run a simple linear regression for each pair in the dataset.
For example, if the columns were named A, B, C, D, E, I want to run lm(A~B), lm(A~C), lm(A~D), ..., lm(D~E), and then I want to plot the data for each pair along with the regression line.
I'm pretty new to R, so I'm sort of spinning my wheels on how to actually accomplish this. Should I use ddply? Or lapply? I'm not really sure how to tackle this.
Here's one solution using combn (lm accepts a two-column data frame and treats the first column as the response):
combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)
Example:
set.seed(1)
DF <- data.frame(A=rnorm(50, 100, 3),
B=rnorm(50, 100, 3),
C=rnorm(50, 100, 3),
D=rnorm(50, 100, 3),
E=rnorm(50, 100, 3))
Update: adding @Henrik's suggestion (see comments)
# only the coefficients
> results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
> vars <- combn(names(DF), 2)
> names(results) <- vars[1 , ] # adding names to identify the response variable in each regression
> results
$A
(Intercept) B
103.66739418 -0.03354243
$A
(Intercept) C
97.88341555 0.02429041
$A
(Intercept) D
122.7606103 -0.2240759
$A
(Intercept) E
99.26387487 0.01038445
$B
(Intercept) C
99.971253525 0.003824755
$B
(Intercept) D
102.65399702 -0.02296721
$B
(Intercept) E
96.83042199 0.03524868
$C
(Intercept) D
80.1872211 0.1931079
$C
(Intercept) E
89.0503893 0.1050202
$D
(Intercept) E
107.84384655 -0.07620397
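The question also asks for a plot of each pair with its regression line, which the output above does not cover. A minimal sketch of one way to do it, assuming base graphics and the same sample DF; the helper fit_pair and the "A ~ B" style names are illustrative choices, not part of the original answer:

```r
# Sample data, generated the same way as in the example above
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# Hypothetical helper: fit first ~ second for one pair of column names
fit_pair <- function(p) lm(reformulate(p[2], response = p[1]), data = DF)

# One model per pair; simplify = FALSE keeps the lm objects in a list
fits <- combn(names(DF), 2, fit_pair, simplify = FALSE)
names(fits) <- combn(names(DF), 2, paste, collapse = " ~ ")

# Scatter plot plus fitted line for each pair, 10 panels on one page
op <- par(mfrow = c(2, 5))
for (nm in names(fits)) {
  fit <- fits[[nm]]
  plot(formula(fit), data = DF, main = nm)  # response on y, predictor on x
  abline(fit, col = "red")                  # regression line from the fit
}
par(op)
```

Naming each model by its full formula avoids the repeated $A, $B labels, so fits[["A ~ B"]] picks out a single model.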
I would also recommend looking at the correlation matrix, cor(DF), which is usually a quick way to spot linear relationships between variables. The correlation is tightly linked to the covariance and to the slopes of a simple linear regression. The computation below illustrates this link.
Sample data:
set.seed(1)
DF <- data.frame(
A=rnorm(50, 100, 3),
B=rnorm(50, 100, 3),
C=rnorm(50, 100, 3),
D=rnorm(50, 100, 3),
E=rnorm(50, 100, 3)
)
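The regression slope is cov(x, y) / var(x). In the matrix below, the row is the predictor x and the column is the response y, so entry [B, A] is the slope of lm(A ~ B).
beta <- cov(DF) * (1/diag(var(DF)))
beta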
A B C D E
A 1.00000000 -0.045548503 0.028448192 -0.32982367 0.01800795
B -0.03354243 1.000000000 0.003298708 -0.02489518 0.04501362
C 0.02429041 0.003824755 1.000000000 0.24269838 0.15550116
D -0.22407592 -0.022967212 0.193107904 1.00000000 -0.08977834
E 0.01038445 0.035248685 0.105020194 -0.07620397 1.00000000
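The intercept is mean(y) - beta * mean(x). Because rows index x and columns index y, mean(x) has to vary by row and mean(y) by column, so the column means are added column-wise with sweep:
sweep(-beta * colMeans(DF), 2, colMeans(DF), "+")
Entry [x, y] of the result is the intercept of lm(y ~ x), and the diagonal is zero. For example, entry [B, A] is 103.66739, the intercept that lm(A ~ B) reports. Note that the shorter colMeans(DF) - beta * colMeans(DF) recycles the means down the rows and therefore computes mean(x) - beta * mean(x), which is not the intercept.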
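The slope and intercept formulas above can be checked against lm for a single pair. A small sketch, assuming the same sample DF and using A as the response and B as the predictor; the variable names x, y, slope and intercept are only for illustration:

```r
# Sample data, generated the same way as above
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3), B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3), D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

x <- DF$B   # predictor
y <- DF$A   # response

slope     <- cov(x, y) / var(x)          # regression slope of y on x
intercept <- mean(y) - slope * mean(x)   # line passes through (mean(x), mean(y))

# Coefficients from lm, in the order (Intercept), B
fit_coef <- unname(coef(lm(A ~ B, data = DF)))

# Both routes agree up to floating point tolerance
stopifnot(isTRUE(all.equal(fit_coef, c(intercept, slope))))

# Link to the correlation: slope = cor(x, y) * sd(y) / sd(x)
stopifnot(isTRUE(all.equal(slope, cor(x, y) * sd(y) / sd(x))))
```

The last check is why the correlation matrix and the slope matrix carry the same sign pattern: the two differ only by the positive factor sd(y) / sd(x).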
Use combn to get all combinations of column names (in the following example I assumed you want combinations of two columns only) and Map to loop over them.
Example using mtcars data from R:
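colc <- names(mtcars)
colcc <- combn(colc, 2)      # 2 x 55 matrix: one column per pair
colcc <- data.frame(colcc)
# loop over the columns (one per pair), not over the two rows
kk <- Map(function(x) lm(as.formula(paste(colcc[1, x], "~", paste(colcc[2, x], collapse = "+"))), data = mtcars), seq_len(ncol(colcc)))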
head(kk)
[[1]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) cyl
37.885 -2.876
[[2]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) disp
29.59985 -0.04122
[[3]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) hp
30.09886 -0.06823
[[4]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) drat
-7.525 7.678
[[5]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) wt
37.285 -5.344
[[6]]
Call:
lm(formula = as.formula(paste(colcc[1, x], "~", paste(colcc[2,
x], collapse = "+"))), data = mtcars)
Coefficients:
(Intercept) qsec
-5.114 1.412
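Since the printed calls above all look alike, it can be easier to collect the coefficients into one table. A possible sketch, assuming the built-in mtcars data; the names pairs_mat and coef_table and the column labels are made up for this example:

```r
# All pairs of mtcars column names: 2 x 55 character matrix
pairs_mat <- combn(names(mtcars), 2)

# Fit first ~ second for each pair and keep one row of coefficients per model
coef_table <- do.call(rbind, lapply(seq_len(ncol(pairs_mat)), function(i) {
  response  <- pairs_mat[1, i]
  predictor <- pairs_mat[2, i]
  fit <- lm(reformulate(predictor, response = response), data = mtcars)
  data.frame(response  = response,
             predictor = predictor,
             intercept = unname(coef(fit)[1]),
             slope     = unname(coef(fit)[2]))
}))

head(coef_table)
```

reformulate builds the formula from the two names directly, so each model's call shows the actual variables instead of the paste expression.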