
Comparing a linear regression with a log-linear regression in R

I have a model in R where I have regressed the price of a Honda Civic on its mileage:

civic <- read.csv("civic.csv")
c <- civic

plot (c$Mileage, c$Price,
      xlab = "Mileage",
      ylab = "Price")

regrPM1 <- lm(Price~Mileage, data = c)

abline (regrPM1, col="red",lwd=3)

This gives me the following:

[Plot1]

So far so good. Now I have another model:

regrPM2 <- lm(log(c$Price)~c$Mileage)

I want to add the corresponding regression line to Plot1 above. When I use the abline command:

abline(regrPM2, col="green", lwd=3)

It results in the following plot:

[Plot2]

This can't be used to compare the two models. I am looking for a way to compare them without using a log scale. I think I could use the curve() command to get better results, but I haven't gotten that to work yet.

Thanks for any help!

The log-linear fit is not a straight line on the original price scale, so abline() cannot draw it. You could do something along the lines of the following to show the non-linear prediction on the original scale:

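regrPM2 <- lm(log(Price) ~ Mileage, data = c)        # refit with data = so predict() uses newdata
DF <- data.frame(Mileage = seq(0, 150000, by = 1000))
pred <- predict(regrPM2, newdata = DF)                # predictions on the log-price scale
lines(DF$Mileage, exp(pred), col = "green", lwd = 3)  # back-transform to prices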

Run this after creating the initial plot with plot(), because lines() only adds to an existing plot.
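Since the question mentions trying curve(), the same back-transformed fit can also be drawn with it by writing the model out as a function of x. This is a minimal sketch on hypothetical simulated data (the data frame d and the coefficient vector b are made up for illustration, not taken from civic.csv):

```r
# Hypothetical simulated data standing in for civic.csv (an assumption, not the asker's data)
set.seed(1)
d <- data.frame(Mileage = runif(500, 0, 150000))
d$Price <- exp(10 - 1e-5 * d$Mileage + rnorm(500, sd = 0.2))

fit_log <- lm(log(Price) ~ Mileage, data = d)
b <- unname(coef(fit_log))  # b[1] = intercept, b[2] = slope, both on the log-price scale

plot(d$Mileage, d$Price, xlab = "Mileage", ylab = "Price")
abline(lm(Price ~ Mileage, data = d), col = "red", lwd = 3)

# curve() evaluates the expression over a grid of x values; exp() maps the
# log-scale straight line back to prices. It invisibly returns the points drawn.
cv <- curve(exp(b[1] + b[2] * x), from = 0, to = 150000, n = 201,
            add = TRUE, col = "green", lwd = 3)
```

Giving from and to explicitly avoids depending on the current plot's x-limits, and the returned cv$x and cv$y match what predict() gives for the same mileages.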

It's difficult to demonstrate what's wrong here without data, so I'll try to create some that's roughly similar to yours:

set.seed(69)

m <- rgamma(5000, 2, 2) * 30000
p <- 3e4 * log((rnorm(5e3, 1e4, 1e3) + m)/(m + rnorm(5e3, 5e3, 5e2)) + rgamma(5000, 2, 2)/8)

c <- data.frame(Mileage = m, Price = p)

plot (c$Mileage, c$Price,
      xlab = "Mileage",
      ylab = "Price")


This is close enough for demonstration purposes.

Now we can add the linear regression line using your code:

regrPM1 <- lm(Price~Mileage, data = c)

abline (regrPM1, col="red",lwd=3)


Now, if we regress the log of the price on the mileage and just plot the result with abline(), we get the same flat green line that you did:

regrPM2 <- lm(log(c$Price)~c$Mileage)
abline(regrPM2, col="green", lwd=3)


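That's because abline() draws fitted values of log(Price), which are tiny compared with the raw prices on the y-axis, so the line sits almost flat at the bottom of the (non-logged) plot. What we want is to take the anti-log (exp()) of the regression's predictions and plot that.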
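Written out, the log-linear model is a straight line only on the log scale; back-transforming gives an exponential curve in Mileage, which is why it cannot be drawn as a straight line on the original axes:

$$
\log(\text{Price}_i) = \beta_0 + \beta_1\,\text{Mileage}_i + \varepsilon_i
\quad\Longrightarrow\quad
\widehat{\text{Price}} = \exp\!\big(\hat\beta_0 + \hat\beta_1\,\text{Mileage}\big) = e^{\hat\beta_0}\,e^{\hat\beta_1\,\text{Mileage}}
$$

Here $\hat\beta_0$ and $\hat\beta_1$ are what coef() returns for the log model, so abline() on that model draws $\hat\beta_0 + \hat\beta_1\,\text{Mileage}$, a log-price, on a price axis. If the log-scale errors are roughly symmetric, the back-transformed curve tracks the conditional median price rather than the mean.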

Note that it's better to use the data argument in the lm() call, because predict() can then evaluate the model at new Mileage values supplied through newdata. So let's refit:

regrPM3 <- lm(log(Price) ~ Mileage, data = c)
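To see concretely why the data argument matters, here is a small sketch on hypothetical simulated data (the names d, new_miles, fit_dollar and fit_data are made up for illustration). When the formula refers to d$Mileage directly, predict() cannot substitute the new mileages and returns the original fitted values instead:

```r
# Hypothetical simulated data (an assumption, standing in for civic.csv)
set.seed(2)
d <- data.frame(Mileage = runif(300, 0, 150000))
d$Price <- exp(10 - 1e-5 * d$Mileage + rnorm(300, sd = 0.2))

new_miles <- data.frame(Mileage = c(0, 50000, 100000))

# Formula written with d$...: the model variable is literally "d$Mileage", so
# predict() finds d$Mileage in the calling environment and ignores newdata.
fit_dollar <- lm(log(d$Price) ~ d$Mileage)
warn_msg <- NULL
pred_dollar <- withCallingHandlers(
  predict(fit_dollar, newdata = new_miles),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)  # R warns that the row counts do not match
    invokeRestart("muffleWarning")
  }
)

# Formula written with data = d: predict() looks up Mileage inside newdata.
fit_data <- lm(log(Price) ~ Mileage, data = d)
pred_data <- predict(fit_data, newdata = new_miles)

length(pred_dollar)  # one value per original row, not per new mileage
length(pred_data)    # one value per row of new_miles
```

This is the same reason the regrPM2 fitted with c$Price and c$Mileage cannot be used directly with predict(..., newdata = ...).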

Now, instead of trying to plot regrPM3 as a straight line, let's take the anti-log of its predictions at fixed mileage intervals and plot them:

lines(seq(0, 2e5, 1e3), 
      exp(predict(regrPM3, newdata = list(Mileage = seq(0, 2e5, 1e3)))),
      col = "blue", lty = 2, lwd = 4)


So the blue dashed line (lty = 2) is what the log-linear regression looks like on the original price scale.
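To put a number on the comparison as well as a picture, one option (not part of the answer above) is to compute each model's error on the original price scale. R-squared is not comparable between the two fits, because one model explains Price and the other log(Price). A minimal sketch on hypothetical simulated data, with the helper rmse() made up for illustration:

```r
# Hypothetical simulated data (an assumption, standing in for civic.csv)
set.seed(3)
d <- data.frame(Mileage = runif(500, 0, 150000))
d$Price <- exp(10 - 1e-5 * d$Mileage + rnorm(500, sd = 0.2))

fit_lin <- lm(Price ~ Mileage, data = d)
fit_log <- lm(log(Price) ~ Mileage, data = d)

# Put both models on the price scale before comparing them.
pred_lin <- fitted(fit_lin)
pred_log <- exp(fitted(fit_log))  # back-transformed; median-type prediction

# Hypothetical helper: root-mean-squared error in price units
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

scores <- c(linear = rmse(d$Price, pred_lin),
            log_linear = rmse(d$Price, pred_log))
print(scores)
```

These are in-sample errors; a held-out set or cross-validation would give a fairer comparison.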

