
How to use the * operator in lm() in R when the independent variable is a matrix

I'm fitting several multivariable linear models using lm().

Basically, matrix1 holds the dependent variables (y) and matrix2 the independent ones (x):

model.1<-lm(matrix1[, 1] ~ matrix2)

where matrix2 has a variable number of columns depending on the specific combination of parameters I want in the regression; matrix2 contains no all-zero columns.

This works fine for a linear model with no interaction between the independent variables (IVs), i.e. a model of the form a0 + a1*x1 + a2*x2 + ..., but to introduce interactions between the IVs, the manual says to use the * operator between the variables (model.1 <- lm(matrix1[, 1] ~ x1 * x2 * x3)). How can I apply this when the IVs are columns of a matrix?
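To see why the matrix form cannot take * directly, here is a small check (a sketch using the built-in anscombe data as a stand-in for the asker's matrices): a matrix on the right-hand side of a formula is treated as a single term, so lm() fits only the additive model, and the coefficients are named by pasting the matrix name onto its column names.

```r
# anscombe stands in for the question's matrices (an assumption for illustration)
matrix1 <- as.matrix(anscombe[5:8])  # responses y1..y4
matrix2 <- as.matrix(anscombe[1:4])  # predictors x1..x4

# The whole matrix is ONE formula term, so this is a0 + a1*x1 + ... + a4*x4
fit_additive <- lm(matrix1[, 1] ~ matrix2)

# Only one term label exists, so there are no separate columns to combine with *
term_labels <- attr(terms(fit_additive), "term.labels")
print(term_labels)
print(names(coef(fit_additive)))
```

Because the individual columns never become separate formula terms, the usual fix is to move them into a data frame and build the formula from their names, which is what both answers do.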

1) Stack Overflow questions are supposed to include reproducible test data, but here we have supplied it using the built-in data frame anscombe. After defining the test data, we build a data frame containing the columns we want, construct the appropriate formula, and finally call lm:

# test data
matrix1 <- as.matrix(anscombe[5:8])
matrix2 <- as.matrix(anscombe[1:4])

DF <- data.frame(matrix1[, 1, drop = FALSE], matrix2) # cols are y1, x1, x2, x3, x4
fo <- sprintf("%s ~ (.)^%d", colnames(matrix1)[1], ncol(matrix2))  # "y1 ~ (.)^4"

lm(fo, DF)

giving:

Call:
lm(formula = fo, data = DF)

Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2  
    12.8199      -2.6037           NA           NA      -0.1626       0.3628  
      x1:x3        x1:x4        x2:x3        x2:x4        x3:x4     x1:x2:x3  
         NA           NA           NA           NA           NA      -0.0134  
   x1:x2:x4     x1:x3:x4     x2:x3:x4  x1:x2:x3:x4  
         NA           NA           NA           NA  

2) The following variation gives a slightly nicer Call: part in the lm output. It reuses DF from above. do.call passes the contents of the fo variable rather than its name, so the formula itself appears in the Call: part of the output; conversely, quote(DF) forces the name DF to be displayed rather than the contents of the data frame.

lhs <- colnames(matrix1)[1]
rhs <- paste(colnames(matrix2), collapse = "*")
fo <- paste(lhs, rhs, sep = "~")  # "y1~x1*x2*x3*x4"
do.call("lm", list(fo, quote(DF)))

giving:

Call:
lm(formula = "y1~x1*x2*x3*x4", data = DF)

Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2  
    12.8199      -2.6037           NA           NA      -0.1626       0.3628  
      x1:x3        x2:x3        x1:x4        x2:x4        x3:x4     x1:x2:x3  
         NA           NA           NA           NA           NA      -0.0134  
   x1:x2:x4     x1:x3:x4     x2:x3:x4  x1:x2:x3:x4  
         NA           NA           NA           NA  
