
How to perform logistic regression on a non-binary variable?

I was searching for an answer to this and I'm really surprised that I haven't found one. I just want to perform a three-level logistic regression in R.

Let's define some artificial data:

set.seed(42)
y <- sample(0:2, 100, replace = TRUE)
x <- rnorm(100)

My variable y contains three values: 0, 1 and 2. So I thought that the simplest way would be just to use:

glm(y ~ x, family = binomial("logit"))

However, I got an error saying that the y values should be in the interval [0, 1]. Do you know how I can perform this regression?

Please note: I know that logistic regression with more than two levels is not so straightforward, and that there are several techniques for doing it, e.g. one-vs-all. But when I looked for an implementation, I wasn't able to find any.

set.seed(42)
y <- sample(0:2, 100, replace = TRUE)
x <- rnorm(100)

multinomial regression

If you don't want to treat your responses as ordered (i.e., they are nominal or categorical values):

library(nnet) ## 'recommended' package, i.e. installed by default
multinom(y~x)

Results

# weights:  9 (4 variable)
initial  value 109.861229 
final  value 104.977336 
converged
Call:
multinom(formula = y ~ x)

Coefficients:
   (Intercept)           x
1 -0.001529465  0.29386524
2 -0.649236723 -0.01933747

Residual Deviance: 209.9547 
AIC: 217.9547 
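To go from the fitted coefficients to something easier to read, the per-class predicted probabilities can be extracted. The following is a minimal sketch on the same artificial data, not part of the original answer; the object names fit_mn and probs_mn are made up for illustration, and the exact numbers depend on your R version's random number generator.

```r
library(nnet)  # provides multinom()

set.seed(42)
y <- sample(0:2, 100, replace = TRUE)
x <- rnorm(100)

# Fit the multinomial model; trace = FALSE silences the iteration log.
# factor(y) makes the nominal (unordered) treatment explicit.
fit_mn <- multinom(factor(y) ~ x, trace = FALSE)

# One row per observation, one column per class (0, 1, 2).
probs_mn <- predict(fit_mn, type = "probs")
head(probs_mn)

# Most likely class for each observation.
head(predict(fit_mn, type = "class"))
```

Each row of probs_mn sums to 1, since the model assigns every observation a probability for each of the three classes.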

Or, if your responses are ordered:

ordinal regression

MASS::polr() does proportional-odds logistic regression. (You may also be interested in the ordinal package, which has more features; it can also do multinomial models.)

library(MASS) ## also 'recommended'
polr(ordered(y)~x)

Results

Call:
polr(formula = ordered(y) ~ x)

Coefficients:
         x 
0.06411137 

Intercepts:
       0|1        1|2 
-0.4102819  1.3218487 

Residual Deviance: 212.165 
AIC: 218.165 
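As with the multinomial fit, predicted class probabilities are often more informative than the raw coefficients and cut points. A small sketch under the same artificial data follows; fit_po and probs_po are illustrative names, and Hess = TRUE is only added so that summary() can report standard errors.

```r
library(MASS)  # provides polr()

set.seed(42)
y <- sample(0:2, 100, replace = TRUE)
x <- rnorm(100)

# Proportional-odds model; the response must be an ordered factor.
fit_po <- polr(ordered(y) ~ x, Hess = TRUE)

# Coefficient table with standard errors (uses the stored Hessian).
summary(fit_po)

# One row per observation, one column per ordered level (0, 1, 2).
probs_po <- predict(fit_po, type = "probs")
head(probs_po)
```

Note that polr() fits a single slope for x shared across both cut points (0|1 and 1|2), which is what the proportional-odds assumption means in practice.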

Logistic regression as implemented by glm only works for 2 levels of output, not 3.

The message is a little vague because you can specify the y-variable in logistic regression either as 0s and 1s, or as a proportion (between 0 and 1) together with a weights argument giving the number of subjects each proportion is based on.
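To illustrate the proportion-plus-weights form, here is a rough sketch with made-up grouped data (the variables n, successes and prop are hypothetical and unrelated to the question's y). It fits the same binomial model two ways, once with proportions and weights and once with a two-column success/failure response, and checks that the coefficients agree.

```r
set.seed(42)
n <- rep(10, 20)                       # assumed: 10 subjects per row
x <- rnorm(20)
successes <- rbinom(20, size = n, prob = plogis(0.5 * x))
prop <- successes / n                  # proportion in [0, 1]

# Proportion response, with weights giving the number of subjects.
fit_prop <- glm(prop ~ x, family = binomial("logit"), weights = n)

# Equivalent two-column (successes, failures) response.
fit_cbind <- glm(cbind(successes, n - successes) ~ x,
                 family = binomial("logit"))

all.equal(coef(fit_prop), coef(fit_cbind))
```

Either way the response is still a count of successes out of a number of trials, which is why a three-category outcome coded 0, 1, 2 does not fit this family.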

With 3 or more ordered levels in the response you need to use a generalization. One common generalization is proportional odds logistic regression (which also goes by other names). The polr function in the MASS package and the lrm function in the rms package (and probably other functions in other packages) fit these types of models, but glm does not.

If you read the error message, it offers a hint that you might get success with:

y <- sample(seq(0, 1, length.out = 3), 100, replace = TRUE)

And in fact, you do. Now your challenge might be to interpret that in the context of the actual situation (which you have not described). You do get a warning, but R warnings are not errors.
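For completeness, a sketch of what that looks like end to end. The names y01, fit01 and warned are made up here, and the warning is captured only to show that the fit completes despite it; this is not a recommendation that the rescaled model is the right one for your data.

```r
set.seed(42)
# Response rescaled to 0, 0.5, 1 so it lies inside [0, 1].
y01 <- sample(seq(0, 1, length.out = 3), 100, replace = TRUE)
x <- rnorm(100)

warned <- FALSE
fit01 <- withCallingHandlers(
  glm(y01 ~ x, family = binomial("logit")),
  warning = function(w) {
    warned <<- TRUE                    # record that glm() warned
    message("glm warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")     # continue fitting
  }
)

coef(fit01)
```

The warning arises because 0.5 is not a whole number of successes; the model treats the middle category as "half a success", which may or may not make sense for the real outcome.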

You might also look up the topic of polychotomous logistic regression, which is implemented in several variants that might be useful in particular situations. Frank Harrell's book Regression Modeling Strategies has material on such techniques. You may also post further questions on CrossValidated.com if you need help choosing which route to go.
