
H2O's RMSE performance report is not consistent

I'm wondering why the h2o.performance report differs from the standard definition of RMSE on the test data. H2O's performance report seems to overstate the model's performance.

Below is a reprex.


library(h2o)
h2o.init()  # start or connect to a local H2O cluster

iris_h2o = as.h2o(iris)
parts = h2o.splitFrame(iris_h2o, ratios = c(0.5,0.25), seed = 1)
train = parts[[1]]
valid = parts[[2]]
test = parts[[3]]

x = c('Sepal.Width','Petal.Length','Petal.Width')
y = 'Sepal.Length'
auto_gbm = h2o.automl(x= x,
                      y= y,
                      training_frame = train,
                      validation_frame = valid,
                      nfolds = 0,
                      include_algos = c('GBM'),
                      max_models = 5,
                      seed = 1
                      )
best_gbm = h2o.get_best_model(auto_gbm)
 
h2o.performance(best_gbm, test)

The performance result above is:

H2ORegressionMetrics: gbm

MSE:  0.1152907
RMSE:  0.3395449
MAE:  0.2675279
RMSLE:  0.04744378
Mean Residual Deviance :  0.1152907

However, when I generate predictions on the test dataset and calculate RMSE manually, the value diverges a lot.

rmse = function(y, y_predict){
  N = length(y)
  RMSE = sqrt(sum((y-y_predict)^2,na.rm=T)/N)
  return(RMSE)
}

test['predicted'] = h2o.predict(best_gbm, test)

rmse(test['Sepal.Length'], test['predicted'])

[1] 1.890506

H2O's performance report on RMSE: 0.33

Manual calculation of RMSE: 1.89

The manual value is more than 5 times larger. Why am I seeing this inconsistency?

H2O cluster version:        3.36.1.4 

You have a mistake in your rmse function. length(y) does not return what you think it does. You can check this with length(test['Sepal.Length']), which returns 1 and not 31 as you expect. With N = 1 the function computes the square root of the sum of squared errors, which is sqrt(31) (about 5.57) times the true RMSE, and that matches the gap between 0.34 and 1.89. Use nrow to get the number of rows. Your function should look like this:

rmse = function(y, y_predict){
  N = nrow(y)
  RMSE = sqrt(sum((y-y_predict)^2,na.rm=T)/N)
  return(RMSE)
}

rmse(test['Sepal.Length'], test['predicted'])
[1] 0.3395448
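
The same pitfall can be reproduced without an H2O cluster. The sketch below is only an illustration: it uses a small made-up base R data.frame (the columns actual and predicted and their values are hypothetical) as a stand-in for the H2O test frame, on the assumption that a one-column data.frame behaves like test['Sepal.Length'] with respect to length() and nrow(). It also checks the sqrt(31) factor against the two numbers reported in the question.

```r
# Hypothetical stand-in for the H2O test frame (values are made up).
df = data.frame(actual    = c(5.1, 4.9, 6.3, 5.8),
                predicted = c(5.0, 5.2, 6.0, 5.9))

y      = df['actual']     # one-column data.frame, like test['Sepal.Length']
y_pred = df['predicted']

n_cols = length(y)  # counts columns, so 1 for a one-column frame
n_rows = nrow(y)    # counts observations, so 4 here

sse = sum((y - y_pred)^2)

rmse_wrong = sqrt(sse / n_cols)  # divides by 1: square root of the raw SSE
rmse_right = sqrt(sse / n_rows)  # divides by the number of observations

# The two values differ by a factor of sqrt(number of rows).
ratio = rmse_wrong / rmse_right

# Same relationship with the numbers from the post (31 test rows):
expected_manual = 0.3395448 * sqrt(31)  # close to the reported 1.890506

print(c(n_cols = n_cols, n_rows = n_rows, ratio = ratio))
print(expected_manual)
```

Dividing by nrow(y) rather than length(y) keeps the function correct whether y is passed as a one-column frame or as a frame with extra columns; for a plain numeric vector, length(y) would be the right count instead.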
