简体   繁体   中英

Piecewise regression with R: plotting the segments

I have 54 points. They represent offer and demand for products. I would like to show there is a break point in the offer.

First, I sort the x-axis (offer) and remove the values that appears twice. I have 47 values, but I remove the first and last ones (doesn't make sense to consider them as break points). Break is of length 45:

Break<-(sort(unique(offer))[2:46])

Then, for each of these potential break points, I estimate a model and I keep in "d" the residual standard error (sixth element in model summary object).

d<-numeric(45)
for (i in 1:45) {
model<-lm(demand~(offer<Break[i])*offer + (offer>=Break[i])*offer)
d[i]<-summary(model)[[6]] }

Plotting d, I notice that my smaller residual standard error is 34, that corresponds to "Break[34]": 22.4. So I write my model with my final break point:

model<-lm(demand~(offer<22.4)*offer + (offer>=22.4)*offer)

Finally, I'm happy with my new model. It's significantly better than the simple linear one. And I want to draw it:

plot(demand~offer)
i <- order(offer)
lines(offer[i], predict(model,list(offer))[i])

But I have a warning message:

Warning message:
In predict.lm(model, list(offer)) :
  prediction from a rank-deficient fit may be misleading

And more important, the lines are really strange on my plot.

我的情节与所谓的两个部分,但没有加入

Here are my data:

demand <- c(1155, 362, 357, 111, 703, 494, 410, 63, 616, 468, 973, 235,
            180, 69, 305, 106, 155, 422, 44, 1008, 225, 321, 1001, 531, 143,
            251, 216, 57, 146, 226, 169, 32, 75, 102, 4, 68, 102, 462, 295,
            196, 50, 739, 287, 226, 706, 127, 85, 234, 153, 4, 373, 54, 81,
            18)
offer <- c(39.3, 23.5, 22.4, 6.1, 35.9, 35.5, 23.2, 9.1, 27.5, 28.6, 41.3,
           16.9, 18.2, 9, 28.6, 12.7, 11.8, 27.9, 21.6, 45.9, 11.4, 16.6,
           40.7, 22.4, 17.4, 14.3, 14.6, 6.6, 10.6, 14.3, 3.4, 5.1, 4.1,
           4.1, 1.7, 7.5, 7.8, 22.6, 8.6, 7.7, 7.8, 34.7, 15.6, 18.5, 35,
           16.5, 11.3, 7.7, 14.8, 2, 12.4, 9.2, 11.8, 3.9)

Here is an easier approach using ggplot2 .

require(ggplot2)
qplot(offer, demand, group = offer > 22.4, geom = c('point', 'smooth'), 
   method = 'lm', se = F, data = dat)

EDIT. I would also recommend taking a look at this package segmented which supports automatic detection and estimation of segmented regression models.

在此输入图像描述

UPDATE:

Here is an example that makes use of the R package segmented to automatically detect the breaks

library(segmented)
set.seed(12)
xx <- 1:100
zz <- runif(100)
yy <- 2 + 1.5*pmax(xx - 35, 0) - 1.5*pmax(xx - 70, 0) + 15*pmax(zz - .5, 0) + 
  rnorm(100,0,2)
dati <- data.frame(x = xx, y = yy, z = zz)
out.lm <- lm(y ~ x, data = dati)
o <- segmented(out.lm, seg.Z = ~x, psi = list(x = c(30,60)),
  control = seg.control(display = FALSE)
)
dat2 = data.frame(x = xx, y = broken.line(o)$fit)

library(ggplot2)
ggplot(dati, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = dat2, color = 'blue')

分段

Vincent has you on the right track. The only thing "weird" about the lines in your resulting plot is that lines draws a line between each successive point, which means that "jump" you see if it simply connecting the two ends of each line.

If you don't want that connector, you have to split the lines call into two separate pieces.

Also, I feel like you can simplify your regression a bit. Here's what I did:

#After reading your data into dat
Break <- 22.4
dat$grp <- dat$offer < Break

#Note the addition of the grp variable makes this a bit easier to read
m <- lm(demand~offer*grp,data = dat)
dat$pred <- predict(m)

plot(dat$offer,dat$demand)
dat <- dat[order(dat$offer),]
with(subset(dat,offer < Break),lines(offer,pred))
with(subset(dat,offer >= Break),lines(offer,pred))

which produces this plot:

在此输入图像描述

The weird lines are simply due to the order in which the points are plotted. The following should look better:

i <- order(offer)
lines(offer[i], predict(model,list(offer))[i])

The warning comes from the fact that the * character is interpreted by lm .

> lm(demand~(offer<22.4)*offer + (offer>=22.4)*offer)
Call:
lm(formula = demand ~ (offer < 22.4) * offer + (offer >= 22.4) * offer)
Coefficients:
            (Intercept)         offer < 22.4TRUE                    offer  
                -309.46                   356.08                    29.86  
      offer >= 22.4TRUE   offer < 22.4TRUE:offer  offer:offer >= 22.4TRUE  
                     NA                   -20.79                       NA  

In addition, (offer<22.4)*offer is a discontinuous function: this is where the discontinuity comes from.

The following should be closer to what you want.

model <- lm(
  demand ~ ifelse(offer<22.4,offer-22.4,0) 
           + ifelse(offer>=22.4,offer-22.4,0) )

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM