I fitted a linear regression in RStudio to predict the price of a car from its year of manufacture. The data set is called "audi", and the model looks like this:
library(tidyverse)
library(modelr)
...
model_price_Year <- lm(data = audi, price ~ year)
summary(model_price_Year)
The result of the summary is this:
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.437e+06  8.503e+04  -75.71   <2e-16
year         3.203e+03  4.215e+01   75.98   <2e-16
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9437 on 10666 degrees of freedom
Multiple R-squared: 0.3512, Adjusted R-squared: 0.3511
F-statistic: 5772 on 1 and 10666 DF, p-value: < 2.2e-16
Then I made a grid and added predictions for 100 values of year. It looks like this:
grid_year <- audi %>%
  data_grid(year = seq_range(year, 100)) %>%
  add_predictions(model_price_Year, "price")
When I print the results, they look like this:
    year   price
   <dbl>   <dbl>
 1 1997  -41481.
 2 1997. -40737.
 3 1997. -39993.
 4 1998. -39249.
 5 1998. -38505.
 6 1998. -37761.
 7 1998. -37017.
 8 1999. -36273.
 9 1999. -35529.
10 1999. -34785.
They are all negative, and because we are talking about a price, that doesn't really make sense. Why are they negative? How do I interpret this?
Look at your data!
If you plot price against year, you will see that there is no reason to believe a straight line describes that relationship. I say "straight line" because a regression of log(price) on year is still a linear model: it is linear in the coefficients, even though the fitted curve is not a straight line on the original price scale.
suppressPackageStartupMessages({
library(tidyverse)
library(modelr)
})
model_price_Year <- lm(price ~ year, data = audi)
grid_year <- audi %>%
  data_grid(
    year = seq_range(year, 100),
    .model = model_price_Year
  ) %>%
  add_predictions(model_price_Year, "price")
plot(price ~ year, data = audi)
lines(price ~ year, data = grid_year, col = "red", lwd = 2)
Created on 2022-05-09 by the reprex package (v2.0.1)
The red line in the plot takes negative values within the observed range of years, which is where the negative predictions come from.
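As a rough check on why the fitted line goes negative, the coefficients printed in the question's summary can be plugged in by hand. This is only a sketch: the variable names `intercept`, `slope`, `zero_year` and `pred_1997` are made up for illustration, and because the printed coefficients are rounded to four significant figures, the result only approximately matches the -41481 shown in the question's grid.

```r
# Coefficients as printed (rounded) in the question's summary() output
intercept <- -6.437e6
slope     <- 3.203e3

# The fitted line is price = intercept + slope * year.
# It crosses zero where intercept + slope * year = 0.
zero_year <- -intercept / slope
zero_year

# Hand-computed prediction for the earliest year in the grid
pred_1997 <- intercept + slope * 1997
pred_1997
```

With these rounded values the line crosses zero at roughly 2009.7, so any year before that gets a negative predicted price. A straight line has no way of knowing that prices cannot fall below zero.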
The solution seems to be to regress log(price) ~ year.
After fitting this model I will plot the fitted line twice: once against the log of price, and once back-transformed to the original price scale.
model_price_Year_2 <- lm(log(price) ~ year, data = audi)
grid_year_2 <- audi %>%
  data_grid(
    year = seq_range(year, 100),
    .model = model_price_Year_2
  ) %>%
  add_predictions(model_price_Year_2, "log_price")
plot(log(price) ~ year, data = audi)
lines(log_price ~ year, data = grid_year_2, col = "red", lwd = 2)
plot(price ~ year, data = audi)
lines(exp(log_price) ~ year, data = grid_year_2, col = "red", lwd = 2)
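For readers without the audi data, here is a minimal self-contained sketch of the same comparison in base R. Everything in it is an assumption for illustration: the data frame `toy` is simulated (it is not the audi data), and the names `fit_lin`, `fit_log`, `pred_lin` and `pred_log` are invented here. The point it tries to show is that back-transformed predictions from the log model, exp(predicted log price), can never be negative, whereas a straight-line fit to curved data will typically dip below zero for the oldest cars.

```r
set.seed(1)

# Made-up stand-in for the audi data: price grows exponentially with year
toy <- data.frame(year = sample(1997:2020, 500, replace = TRUE))
toy$price <- exp(9 + 0.12 * (toy$year - 2010) + rnorm(500, sd = 0.3))

# Straight-line model and log-price model, as in the answer
fit_lin <- lm(price ~ year, data = toy)
fit_log <- lm(log(price) ~ year, data = toy)

# Predict over the same range of years with both models
new_years <- data.frame(year = 1997:2020)
pred_lin  <- predict(fit_lin, new_years)
pred_log  <- exp(predict(fit_log, new_years))  # back to the price scale

min(pred_lin)  # may be negative
min(pred_log)  # always positive, since exp() of any number is > 0

# In the log model the slope acts multiplicatively on price:
# each extra year multiplies the predicted price by this factor
exp(coef(fit_log)["year"])
```

This also suggests one way to read the log model: instead of "each year adds a fixed amount to the price", the slope says "each year multiplies the price by a fixed factor".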