multivariate logistic regression in R

I want to run a simple multivariate logistic regression. I made a small example with binary data below to talk through the problem.

multivariate regression = trying to predict 2+ outcome variables

> y = matrix(c(0,0,0,1,1,1,1,1,1,0,0,0), nrow=6,ncol=2)

> x = matrix(c(1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,1,1,1,1,0,0,1,1,1,1,1,0,1,1,1,1,1,1), nrow=6,ncol=6)
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    1    1    1
[2,]    0    1    1    1    1    1
[3,]    0    0    1    1    1    1
[4,]    0    0    0    1    1    1
[5,]    0    0    0    0    1    1
[6,]    0    0    0    0    0    1
> y
     [,1] [,2]
[1,]    0    1
[2,]    0    1
[3,]    0    1
[4,]    1    0
[5,]    1    0
[6,]    1    0

So, variable "x" has 6 samples, each with 6 attributes, and variable "y" has 2 outcomes to predict for each of the 6 samples. I specifically want to work with binary data.

> fit = glm(y~x-1, family = binomial(logit))

I do "-1" to eliminate the intercept coefficient. Everything else is standard logistic regression in a multivariate situation.

> fit

Call:  glm(formula = y ~ x - 1, family = binomial(logit))

Coefficients:
 data1   data2   data3   data4   data5   data6  
  0.00    0.00  -49.13    0.00    0.00   24.57  

Degrees of Freedom: 6 Total (i.e. Null);  0 Residual
Null Deviance:      8.318 
Residual Deviance: 2.572e-10    AIC: 12

At this point things are starting to look off. I am not sure why the coefficients for data3 and data6 are what they are.

> val <- predict(fit, data.frame(c(1,1,1,1,1,1)), type = "response")

> val
       1            2            3            4            5            6 
2.143345e-11 2.143345e-11 2.143345e-11 1.000000e+00 1.000000e+00 1.000000e+00 

Clearly I am doing something wrong. I am expecting a 1x2 matrix, not 1x6: a matrix that tells me the probability of the new data vector being a "1" (true) for y1 and for y2.

Any help would be appreciated.

Note: I updated the ending of my question based on the reply from Mario.

Unlike lm, glm does not work with a multivariate response. When you give binomial() a two-column matrix, it is interpreted as counts of successes and failures for a single response, not as two separate outcomes, which is why your fit above looks odd. As a workaround, you can fit one GLM per column of y:

fit1 <- glm(y[,1] ~ x-1, family=binomial(logit))
fit2 <- glm(y[,2] ~ x-1, family=binomial(logit))
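
Each of these is an ordinary logistic regression, so you can get the 1x2 result you describe by applying the inverse logit (plogis) to each fit's linear predictor for the new sample. A minimal sketch, reusing the x and y from your question (newx is just the all-ones sample from your predict call):

newx <- c(1, 1, 1, 1, 1, 1)           # the new sample with 6 attributes
p1 <- plogis(sum(coef(fit1) * newx))  # P(y[,1] == 1) for newx
p2 <- plogis(sum(coef(fit2) * newx))  # P(y[,2] == 1) for newx
matrix(c(p1, p2), nrow = 1)           # the 1x2 matrix of probabilities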

Or you can use glmer from the lme4 package, which is meant for mixed models; you can simply omit the "random effects", and AFAIK glmer supports multivariate responses.

The newdata argument needs to be a data.frame whose columns match the predictors in the model. Since the model was fitted with a 6-column matrix called x, the new sample has to go in as a matrix column named x (wrap it in I() so that data.frame keeps it as a single matrix column):

aux <- data.frame(x = I(matrix(c(1, 1, 1, 1, 1, 1), nrow = 1)))
val <- predict(fit, aux, type = "response")
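
With fit1 and fit2 from above, the same aux then gives you the two probabilities you were expecting. A quick sketch along those lines (assuming the objects defined earlier in this thread):

# probability of a "1" for each response, for the new sample in aux
cbind(y1 = predict(fit1, aux, type = "response"),
      y2 = predict(fit2, aux, type = "response"))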
