
RMSE (root mean square deviation) calculation in R

I have numeric feature observations V1 through V12 taken for a target variable, Wavelength. I would like to calculate the RMSE between the Vx columns. The data format is below.

Each variable "Vx" is measured at a 5-minute interval. I would like to calculate the RMSE between the observations of all Vx variables; how do I do that?

I have different observations of the wavelength variable, and each variable Vx is measured at 5-minute intervals.

This is a link I found, but I'm not sure how I can get y_pred: https://www.kaggle.com/wiki/RootMeanSquaredError

For this link, I don't think I have the predicted values: http://heuristically.wordpress.com/2013/07/12/calculate-rmse-and-mae-in-r-and-sas/

The function below will give you the RMSE:

RMSE <- function(m, o) {
  sqrt(mean((m - o)^2))
}

m is for model (fitted) values, o is for observed (true) values.
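Applied to the question, you can use that function on any pair of Vx columns. A minimal sketch with made-up data (columns V1 to V3 here stand in for your V1 to V12):

```r
# Sketch: pairwise RMSE between all Vx columns of a data frame.
# The data are made up; substitute your own V1..V12 columns.
RMSE <- function(m, o) sqrt(mean((m - o)^2))

df <- data.frame(V1 = c(1.0, 2.0, 3.0),
                 V2 = c(1.1, 2.2, 2.9),
                 V3 = c(0.9, 1.8, 3.3))

cols <- names(df)
# Symmetric matrix of RMSE values for every pair of columns
rmse_mat <- outer(cols, cols,
                  Vectorize(function(i, j) RMSE(df[[i]], df[[j]])))
dimnames(rmse_mat) <- list(cols, cols)
rmse_mat  # symmetric; the diagonal is all zeros
```

Each cell `rmse_mat[i, j]` is the RMSE between columns i and j, so you get all pairwise comparisons at once.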

To help, I just wrote these functions:

#Fit a model
fit <- lm(Fertility ~ . , data = swiss)

# Function for Root Mean Squared Error
RMSE <- function(error) { sqrt(mean(error^2)) }
RMSE(fit$residuals)

# If you want, say, MAE, you can do the following:

# Function for Mean Absolute Error
mae <- function(error) { mean(abs(error)) }
mae(fit$residuals)

I hope it helps.

How to compute RMSE in R.

See my other 97+ upvoted canonical answer for doing RMSE in Python: https://stackoverflow.com/a/37861832/445131 Below I explain it in terms of R code.

RMSE (root mean squared error), MSE (mean squared error), and RMS (root mean squared) are all mathematical tricks to get a feel for change over time between two lists of numbers.

RMSE provides a single number that answers the question: "How similar, on average, are the numbers in list1 to list2?". The two lists must be the same size. I want to "wash out noise between any two given elements, wash out the size of the data collected, and get a single number feel for change over time".

Intuition and ELI5 for RMSE:

Imagine you are learning to throw darts at a dart board. Every day you practice for one hour. You want to figure out if you are getting better or getting worse. So every day you make 10 throws and measure the distance between the bullseye and where your dart hit.

You make a list of those numbers. Use the root mean squared error between the distances at day 1 and a list containing all zeros. Do the same on the 2nd and nth days. What you will get is a single number that hopefully decreases over time. When your RMSE number is zero, you hit bullseyes every time. If the number goes up, you are getting worse.
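The darts idea can be sketched directly. The distances below are made up for illustration:

```r
# Sketch: score each practice day by the RMSE between that day's
# dart distances and a perfect day of all zeros (every throw a bullseye).
rmse <- function(p, t) sqrt(mean((p - t)^2))

day1  <- c(5.0, 8.2, 3.1, 6.4, 7.7, 2.9, 9.0, 4.4, 5.5, 6.1)  # cm from bullseye
day30 <- c(1.2, 0.8, 2.0, 1.5, 0.3, 1.1, 0.9, 1.7, 0.6, 1.4)

rmse(day1,  rep(0, 10))  # larger number: early practice
rmse(day30, rep(0, 10))  # smaller number: you got better
```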

Example in calculating root mean squared error in R:

cat("Inputs are:\n") 
d = c(0.000, 0.166, 0.333) 
p = c(0.000, 0.254, 0.998) 
cat("d is: ", toString(d), "\n") 
cat("p is: ", toString(p), "\n") 

rmse = function(predictions, targets){ 
  cat("===RMSE readout of intermediate steps:===\n") 
  cat("the errors: (predictions - targets) is: ", 
      toString(predictions - targets), '\n') 
  cat("the squares: (predictions - targets) ** 2 is: ", 
      toString((predictions - targets) ** 2), '\n') 
  cat("the means: (mean((predictions - targets) ** 2)) is: ", 
      toString(mean((predictions - targets) ** 2)), '\n') 
  cat("the square root: (sqrt(mean((predictions - targets) ** 2))) is: ", 
      toString(sqrt(mean((predictions - targets) ** 2))), '\n') 
  return(sqrt(mean((predictions - targets) ** 2))) 
} 
cat("final answer rmse: ", rmse(d, p), "\n") 

Which prints:

Inputs are:
d is:  0, 0.166, 0.333 
p is:  0, 0.254, 0.998 
===RMSE readout of intermediate steps:===
the errors: (predictions - targets) is:  0, -0.088, -0.665 
the squares: (predictions - targets) ** 2 is:  0, 0.007744, 0.442225 
the means: (mean((predictions - targets) ** 2)) is:  0.149989666666667 
the square root: (sqrt(mean((predictions - targets) ** 2))) is:  0.387284994115014 
final answer rmse:  0.387285 

The mathematical notation:

RMSE = sqrt( (1/n) * sum_{i=1}^{n} (predictions_i - targets_i)^2 )

RMSE isn't the most accurate line fitting strategy; total least squares is:

Root mean squared error measures the vertical distance between the point and the line. So if your data is shaped like a banana, flat near the bottom and steep near the top, then RMSE will report greater distances for points near the top and shorter distances for points near the bottom, even when the perpendicular distances are equivalent. This causes a skew where the fitted line prefers to sit closer to the high points than to the low ones.

If this is a problem, the total least squares method fixes it: https://mubaris.com/posts/linear-regression/
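One way to see the difference is to fit total least squares via PCA: the first principal component of the centered data minimizes perpendicular, rather than vertical, distances. A sketch with made-up data:

```r
# Sketch: total least squares (orthogonal regression) via PCA,
# compared with ordinary least squares. Data are made up.
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50, sd = 0.5)

# OLS minimizes vertical distances:
ols_slope <- unname(coef(lm(y ~ x))[2])

# TLS minimizes perpendicular distances: the direction of the first
# principal component of the centered (x, y) cloud gives the TLS line.
pc <- prcomp(cbind(x, y), center = TRUE, scale. = FALSE)
tls_slope <- pc$rotation[2, 1] / pc$rotation[1, 1]

c(ols = ols_slope, tls = tls_slope)
```

Because OLS attributes all noise to y, its slope estimate is attenuated; the TLS slope is always at least as steep in magnitude.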

Gotchas that can break this RMSE function:

If there are nulls or infinities in either input list, then the output RMSE value will not make sense. There are three strategies for dealing with nulls / missing values / infinities in either list: ignore that component, zero it out, or add a best guess or uniform random noise to all timesteps. Each remedy has its pros and cons depending on what your data means. In general, ignoring any component with a missing value is preferred, but this biases the RMSE toward zero, making you think performance has improved when it really hasn't. Adding random noise on top of a best guess may be preferred if there are lots of missing values.

In order to guarantee relative correctness of the RMSE output, you must eliminate all nulls/infinities from the input.
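A minimal sketch of the "ignore that component" strategy, dropping any pair where either value is missing or non-finite (the function name rmse_clean is made up):

```r
# Sketch: keep only pairs where both values are finite (not NA, NaN, or Inf)
# before computing RMSE -- the "ignore that component" strategy.
rmse_clean <- function(predictions, targets) {
  ok <- is.finite(predictions) & is.finite(targets)
  sqrt(mean((predictions[ok] - targets[ok])^2))
}

p <- c(1.0, NA, 3.0, Inf)
t <- c(1.5, 2.0, 2.5, 4.0)
rmse_clean(p, t)  # uses only the 1st and 3rd pairs
```

Remember the caveat above: dropping components biases the RMSE toward zero, so compare only scores computed over the same set of kept pairs.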

RMSE has zero tolerance for outlier data points which don't belong.

Root mean squared error relies on all the data being right, and every point counts as equal. That means one stray point that's way out in left field will totally ruin the whole calculation. To handle outlier data points and dismiss their tremendous influence beyond a certain threshold, see robust estimators, which build in a threshold for dismissing outliers.
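A quick illustration of how a single stray point dominates the squared errors (the numbers are made up):

```r
# Sketch: because errors are squared, one wild outlier swamps the calculation.
rmse <- function(e) sqrt(mean(e^2))

errors <- c(0.1, -0.2, 0.1, 0.0, -0.1)
with_outlier <- c(errors, 50)   # one stray point way out in left field

rmse(errors)        # small
rmse(with_outlier)  # blows up: dominated by the single outlier

# Share of the total squared error contributed by that one point:
50^2 / sum(with_outlier^2)
```

Here the outlier accounts for essentially all of the squared error, which is why a robust estimator (or an explicit outlier threshold) is needed when such points don't belong in the data.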

You can either write your own function or use the package hydroGOF, which also has an RMSE function: http://www.rforge.net/doc/packages/hydroGOF/rmse.html

Regarding your y_pred: you first need a model that produced them, otherwise why would you want to calculate RMSE?

You can also use library(mltools) in R, which has the method

rmse(preds = NULL, actuals = NULL, weights = 1, na.rm = FALSE)

Reference: http://search.r-project.org/library/mltools/html/rmse.html

You could also use summary() for your linear model:

mod = lm(dependent ~ independent, data), then:

mod.error = summary(mod)
mod.error$sigma
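Note that summary(mod)$sigma is the residual standard error, which divides the residual sum of squares by the residual degrees of freedom n - p rather than by n, so it is slightly larger than the RMSE of the residuals:

```r
# summary(mod)$sigma divides the sum of squared residuals by n - p
# (residual degrees of freedom), while RMSE divides by n, so the two
# differ by exactly a factor of sqrt((n - p) / n).
mod <- lm(Fertility ~ ., data = swiss)

rmse  <- sqrt(mean(resid(mod)^2))  # divides by n
sigma <- summary(mod)$sigma        # divides by n - p

n <- nrow(swiss)
p <- length(coef(mod))
all.equal(sigma * sqrt((n - p) / n), rmse)  # TRUE
```

For large n the difference is negligible, but it matters when comparing scores across models with different numbers of parameters.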
