
Logistic regression for non-linear data

I have data with a continuous independent variable and a binary dependent variable, so I tried to analyse it with logistic regression. However, unlike the classical case with a single S-shaped transition, my data has two transitions (from 0 to 1, then back to 0). Here is an example of what I mean:

library(visreg)  # visreg() plots the fitted curve of a model

classic.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                          y = c(rep(0, times = 14), 1, 0, rep(1, times = 14)))

model.classic = glm(formula = y ~ x,
                    data = classic.data,
                    family = "binomial")

summary(model.classic)

visreg(model.classic,
       partial = FALSE,
       scale = "response",
       alpha = 0)

[Figure: classic data]
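As a quick check of the "classic" S-shape, here is a minimal base-R sketch (no visreg needed) that refits the same model. The slope on x comes out positive, and because the log-odds are linear in x, the fitted probabilities can never decrease as x grows. The variable names `slope` and `p` are only for illustration.

```r
# Recreate the question's "classic" data: 0s, one overlapping pair, then 1s
classic.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                          y = c(rep(0, times = 14), 1, 0, rep(1, times = 14)))

model.classic = glm(y ~ x, data = classic.data, family = binomial)

# Log-odds are linear in x, so a positive slope gives a monotone S-curve
slope = coef(model.classic)[["x"]]
p = fitted(model.classic)   # fitted P(y = 1), in increasing order of x

print(slope)
print(round(p, 3))
```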

my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                     y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))

model.my = glm(formula = y ~ x,
               data = my.data,
               family = "binomial")

summary(model.my)

visreg(model.my,
       partial = FALSE,
       scale = "response",
       alpha = 0)

[Figure: my data]

The blue line in each plot is the glm fit, while the red line is what I want to get. Is there any way to apply logistic regression to data like this, or should I use some other type of regression analysis?
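The flat blue line in the second plot can be checked numerically. In this example the block of 1s sits symmetrically in the middle of the x range, so the maximum-likelihood slope is zero and every fitted probability equals the overall share of 1s, 10/30 = 1/3. A minimal base-R sketch using the question's `my.data`:

```r
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                     y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))

model.my = glm(y ~ x, data = my.data, family = binomial)

# Coefficients on the log-odds scale: the slope on x is essentially zero
print(coef(model.my))

# Fitted P(y = 1) is the same for every x: the proportion of 1s in the data
p = fitted(model.my)
print(range(p))
print(mean(my.data$y))
```

The intercept ends up at log(1/2), the log-odds of a probability of 1/3.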

In your second model, the log-odds of y are not a linear function of x. When you write y ~ x with a binomial family, you assume that as x increases, the probability that y = 1 steadily increases or decreases, depending on whether the coefficient is positive or negative. That is not the case here: the probability increases and then decreases, so the average effect of x is zero (hence the straight line). You therefore need a non-linear function of x. One option is a GAM from the mgcv package, where the effect of x is modelled as a smooth function:

library(mgcv)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                     y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))

m = gam(y ~ s(x), data = my.data, family = binomial)  # s(x): penalized smooth of x
plot(m)  # estimated smooth s(x), on the log-odds scale

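`plot(m)` draws the estimated smooth on the log-odds (link) scale. The sketch below reads off the fitted log-odds at both ends and at the centre of the x range. The evaluation points in `new.x` are an arbitrary choice for illustration. The expected pattern is negative, positive, negative: exactly the rise-and-fall that a single linear term cannot produce.

```r
library(mgcv)

my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                     y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
m = gam(y ~ s(x), data = my.data, family = binomial)

# Illustrative evaluation points: both ends and the centre of x
new.x = data.frame(x = c(0, 7.25, 14.5))

# type = "link" returns log-odds (intercept plus smooth), not probabilities
eta = predict(m, newdata = new.x, type = "link")
print(round(eta, 2))
```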

That gives the following fit on the original (probability) scale:

my.data$prediction = predict(m, type = "response")  # fitted probabilities P(y = 1)
plot(my.data$x, my.data$y)
lines(my.data$x, my.data$prediction, col = "red")

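One way to quantify the improvement is the Brier score. This is an assumption on our part and not part of the original answer. The Brier score is the mean squared difference between the observed y and the predicted probability. The sketch below compares the straight glm fit with the gam fit on the same data. The helper `brier` is hypothetical and written only for this comparison.

```r
library(mgcv)

my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
                     y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))

model.my = glm(y ~ x, data = my.data, family = binomial)   # linear log-odds
m = gam(y ~ s(x), data = my.data, family = binomial)        # smooth log-odds

# Hypothetical helper: Brier score of predicted probabilities (lower is better)
brier = function(model, y) mean((y - predict(model, type = "response"))^2)

brier.glm = brier(model.my, my.data$y)
brier.gam = brier(m, my.data$y)
print(c(glm = brier.glm, gam = brier.gam))
```

Because the glm predicts 1/3 everywhere, its score works out to (10 * (2/3)^2 + 20 * (1/3)^2) / 30 = 2/9, and the smooth fit should come in well below that.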
