I have two datasets that I fit and plot using R's lm command. The first plot below (left) is not centered around the red line, but the second one (right) is centered around it.
My questions are:
The code I use to plot the data is simply:
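# header = TRUE assumes the first row of myfile.txt holds the column
# names x1, y1, x2, y2; without it the columns are named V1, V2, ...
data <- read.table("myfile.txt", header = TRUE)
dat1x <- data$x1
dat1y <- data$y1
# plot left figure: regress dat1x on dat1y and draw the fitted line
dat1_lm <- lm(dat1x ~ dat1y)
plot(dat1x ~ dat1y)
abline(coef(dat1_lm), col = "red")
dat1_lm.r2 <- summary(dat1_lm)$adj.r.squared
# repeat the same for the right figure
dat2x <- data$x2
dat2y <- data$y2
dat2_lm <- lm(dat2x ~ dat2y)
plot(dat2x ~ dat2y)
abline(coef(dat2_lm), col = "red")
dat2_lm.r2 <- summary(dat2_lm)$adj.r.squared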
Update (plot with RMSE scores):
I am looking for a score that shows the right figure is better than the left one, based on how closely the data are centered around the prediction line.
R-squared gives the goodness of fit of the line, i.e. the proportion of the variation in the dataset that is explained by the linear model. Another way of explaining R-squared is how much better the model performs than the mean model (always predicting the mean). The p-values give the significance of the fit, i.e. whether the coefficients of the linear model are significantly different from zero.
To extract these values:
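# toy data: two independent uniform variables
dat = data.frame(a = runif(100), b = runif(100))
lm_obj = lm(a~b, dat)
# R-squared of the fit
rsq = summary(lm_obj)[["r.squared"]]
# p-value of the coefficient of b, from the coefficient table
p_value = summary(lm_obj)[["coefficients"]]["b","Pr(>|t|)"]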
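To make the two interpretations above concrete, here is a minimal sketch that recomputes both values by hand (the seed is an arbitrary addition, only for reproducibility): R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean, i.e. relative to the mean model, and the slope's p-value from its t statistic. Both should agree with what summary() reports.

```r
# Arbitrary seed, only so the sketch is reproducible
set.seed(1)
dat <- data.frame(a = runif(100), b = runif(100))
lm_obj <- lm(a ~ b, dat)

# R-squared: share of the variation around mean(a) explained by the model,
# i.e. how much better it does than always predicting the mean
ss_res <- sum(residuals(lm_obj)^2)
ss_tot <- sum((dat$a - mean(dat$a))^2)
rsq_manual <- 1 - ss_res / ss_tot

# p-value of the slope: two-sided t-test of H0: coefficient of b = 0
coefs <- summary(lm_obj)$coefficients
t_b <- coefs["b", "Estimate"] / coefs["b", "Std. Error"]
p_manual <- 2 * pt(-abs(t_b), df = df.residual(lm_obj))

# both should agree with the values reported by summary()
all.equal(rsq_manual, summary(lm_obj)$r.squared)
all.equal(p_manual, coefs["b", "Pr(>|t|)"])
```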
Alternatively, you could calculate the RMSE between the observations and the predictions of the linear model:
rmse = sqrt(mean((dat$a - predict(lm_obj))^2))
Note that this is the RMSE between a and the linear model's predictions. If you want the RMSE between a and b themselves:
rmse = sqrt(mean((dat$a - dat$b)^2))
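As a sketch on the same kind of toy data (arbitrary seed), the RMSE against the model is simply the root mean square of the residuals. Note that the residual standard error reported by summary(), sigma, divides the residual sum of squares by n - 2 (one predictor plus intercept) instead of n, so it is slightly larger than this RMSE:

```r
# Arbitrary seed, only so the sketch is reproducible
set.seed(1)
dat <- data.frame(a = runif(100), b = runif(100))
lm_obj <- lm(a ~ b, dat)

# RMSE between observations and fitted values equals the RMSE of the residuals
rmse_model <- sqrt(mean((dat$a - predict(lm_obj))^2))
rmse_resid <- sqrt(mean(residuals(lm_obj)^2))

# residual standard error from summary(): sqrt(RSS / (n - 2)) here
sigma_hat <- summary(lm_obj)$sigma

# RMSE between the two columns directly (no model involved)
rmse_ab <- sqrt(mean((dat$a - dat$b)^2))

c(rmse_model = rmse_model, rmse_resid = rmse_resid,
  sigma = sigma_hat, rmse_ab = rmse_ab)
```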
What you might be looking for is MAPE (mean absolute percentage error). Its advantages over other measures of accuracy (MSE, MPE, RMSE, MAE, etc.) are that MAPE does not depend on the level (scale) of the data, it measures absolute errors, and it has a clear interpretation. You could use the forecast package to get some of these measures:
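library(forecast)
data <- data.frame(y = rnorm(100), x = rnorm(100))
model <- lm(y ~ x, data)
# training-set accuracy measures of the fitted model; no seed is set,
# so the numbers below (from the original run) will vary
accuracy(model)
#           ME         RMSE          MAE          MPE         MAPE
# 5.455773e-18 1.019446e+00 7.957585e-01 1.198441e+02 1.205495e+02
# recent forecast versions return a one-row matrix, so select the column by name
accuracy(model)[, "MAPE"]
# 120.5495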
or
# f = fitted values, x = observed values; errors are taken relative to x
mape <- function(f, x) mean(abs(1 - f / x) * 100)
mape(fitted(model), data$y)
# [1] 120.5495
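For reference, the five measures in the accuracy() output can be computed by hand from the residuals e = y - fitted(model), using their usual definitions (this sketch does not call forecast, and the seed is an arbitrary addition). It also illustrates two points about the output above: ME is essentially zero for any least-squares fit with an intercept, because the residuals sum to zero (hence the 5e-18), and MPE and MAPE become very large when observed values are close to zero, as often happens with rnorm() data.

```r
# Arbitrary seed, only so the sketch is reproducible
set.seed(1)
data <- data.frame(y = rnorm(100), x = rnorm(100))
model <- lm(y ~ x, data)

# residuals: observed minus fitted
e <- data$y - fitted(model)

measures <- c(
  ME   = mean(e),                          # ~0 for OLS with an intercept
  RMSE = sqrt(mean(e^2)),
  MAE  = mean(abs(e)),
  MPE  = mean(100 * e / data$y),           # signed, relative to observed y
  MAPE = mean(100 * abs(e) / abs(data$y))  # same as mape() above
)
round(measures, 4)
```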
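On the other hand, it might seem that MPE (mean percentage error) is better for showing how well the data are centered around the prediction line, because positive and negative errors can cancel out. For example, with the prediction p <- rep(2, 20) and the data y <- rep(c(3, 1), 10), the raw errors y - p are +1 and -1 and average to 0, while MAPE is about 66.7%. Note, however, that MPE itself comes out at about -33.3% rather than 0 here, because each percentage error is divided by a different observed value.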
So you should decide what you really want to show: MAPE is better as a measure of accuracy, but for checking centering, a signed measure such as MPE might be a better choice.
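A quick check of this toy example in R, reusing the mpe(), mape() and me() definitions from the update below (f is the prediction, x the observed data), shows which of these measures actually cancel to zero:

```r
# error measures: f = prediction, x = observed values
mpe  <- function(f, x) mean((1 - f / x) * 100)
mape <- function(f, x) mean(abs(1 - f / x) * 100)
me   <- function(f, x) mean(x - f)

p <- rep(2, 20)          # constant prediction
y <- rep(c(3, 1), 10)    # data alternating above and below it

me(p, y)    # [1] 0          raw errors +1 and -1 cancel
mpe(p, y)   # [1] -33.33333  +33.3% and -100% do not cancel
mape(p, y)  # [1] 66.66667
```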
Update: if centering really is what you want to check, you should look at measures that average the raw (signed) errors, without squares, absolute values, etc. That is, you might also want to look at ME (mean error), which is a bit simpler than MPE but has a different interpretation: it is expressed in the units of the data rather than as a percentage. Here is an example somewhat similar to your first one:
library(ggplot2)
# error measures: f = prediction, x = observed values
mpe <- function(f, x) mean((1 - f / x) * 100)
mape <- function(f, x) mean(abs(1 - f / x) * 100)
me <- function(f, x) mean(x - f)
set.seed(20130130)
# two noisy series whose spread grows with x; y1 has mean (1:1000) / 30,
# i.e. it is centered on the prediction line pr, while y2 lies above it
y1 <- rnorm(1000, mean = 10, sd = 1.5) * (1:1000) / 300
y2 <- rnorm(1000, mean = 10, sd = 1.7) * (1:1000) / 250
pr <- (1:1000) / 30
# long format so that each series gets its own facet (id)
data <- data.frame(y = c(y1, y2),
                   x = 1:1000,
                   prediction = rep(pr, 2),
                   id = rep(1:2, each = 1000))
# one row of scores per facet
results <- data.frame(MAPE = c(mape(pr, y1), mape(pr, y2)),
                      MPE = c(mpe(pr, y1), mpe(pr, y2)),
                      ME = c(me(pr, y1), me(pr, y2)),
                      id = 1:2)
results <- round(results, 2)
# data in black, prediction in red, scores printed inside each panel
ggplot(data, aes(x, y)) + geom_line() + theme_bw() +
  facet_wrap(~ id) + geom_line(aes(y = prediction), colour = "red") +
  theme(strip.background = element_blank()) + labs(y = NULL, x = NULL) +
  geom_text(data = results, x = 150, y = 50, aes(label = paste("MAPE:", MAPE))) +
  geom_text(data = results, x = 150, y = 45, aes(label = paste("MPE:", MPE))) +
  geom_text(data = results, x = 150, y = 40, aes(label = paste("ME:", ME)))