
What is the role of `train=True` in H2O model_performance()?

At first I thought model_performance(train=True) reports the performance of the model when predicting on the same data it was trained on. That cannot be the case, though: if it were, the number would match model.model_performance(test_data=train), and it doesn't.

Consider the following toy example:

# Start (or connect to) a local H2O cluster
import h2o
h2o.init()

# Make a dataframe
df = h2o.H2OFrame({'a':list(range(100)), 'b':list(range(100, 0, -1)), 'c':list(range(0, 200, 2))})

# Split the data
train, val, test = df.split_frame([.6, .2], seed=0)

# Build a model
from h2o.estimators.random_forest import H2ORandomForestEstimator
model = H2ORandomForestEstimator(seed=0)

# Train the model
model.train(x=train.names[:-1], y=train.names[-1], training_frame=train, validation_frame=val)

# Get performance results
print(model.model_performance(train=True)['mae'],
      model.model_performance(valid=True)['mae'],
      model.model_performance(test_data=test)['mae'])
# 1.3816 1.1968 1.4722

Compare the results with

print(model.model_performance(test_data=train)['mae'],
      model.model_performance(test_data=val)['mae'],
      model.model_performance(test_data=test)['mae'])
# 0.5548 1.1968 1.4722

Note that model_performance(train=True) and model_performance(test_data=train) give different results, whereas model_performance(valid=True) and model_performance(test_data=val) give the same result.

So I'm wondering whether model_performance(train=True) and model.model_performance(test_data=train) are supposed to return the same value (in which case there is a bug in H2O's calculation), or whether model_performance(train=True) serves a different purpose.

The docs say:

train: boolean, optional
Report the training metrics for the model.
valid: boolean, optional
Report the validation metrics for the model.

But that is not very clear, given the results above.

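train=True shows the performance of the model at the end of training, meaning it returns the training metrics that were constructed during training. test_data=train, on the other hand, sends the train data to the finished model for prediction and checks the model's performance on those predictions.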
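Why would metrics built during training differ from rescoring the same frame? One plausible explanation, as far as I understand H2O's random forest, is that its training metrics are computed from out-of-bag rows, i.e. each row is scored only by the trees that never saw it. The sketch below is a pure-Python toy, not H2O code: the "trees" are 1-nearest-neighbor models on bootstrap samples, and the names `nearest_neighbor_predict` and `compare_training_metrics` are made up for illustration. It only shows that the two ways of scoring the training rows can disagree, with the out-of-bag number being the more pessimistic one.

```python
import random


def nearest_neighbor_predict(sample, x):
    """Predict y for x from the closest (x, y) pair in one bootstrap sample."""
    return min(sample, key=lambda pair: abs(pair[0] - x))[1]


def compare_training_metrics(n_rows=60, n_trees=25, seed=0):
    """Return (out-of-bag MAE, rescored MAE) for a toy bagged ensemble."""
    rng = random.Random(seed)
    rows = [(x, 2 * x) for x in range(n_rows)]  # toy target, like c = 2 * a

    # Each stand-in "tree" memorizes one bootstrap sample of the rows.
    bags = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        bags.append((set(idx), [rows[i] for i in idx]))

    oob_errors, rescored_errors = [], []
    for i, (x, y) in enumerate(rows):
        all_preds = [nearest_neighbor_predict(sample, x) for _, sample in bags]
        # Keep only predictions from trees that did NOT train on row i.
        oob_preds = [p for p, (in_bag, _) in zip(all_preds, bags)
                     if i not in in_bag]
        if not oob_preds:
            continue  # row was in every bag; skip it in both metrics
        # Analogue of test_data=train: every tree votes, including those that saw row i.
        rescored_errors.append(abs(sum(all_preds) / n_trees - y))
        # Analogue of the assumed train=True behaviour: out-of-bag trees only.
        oob_errors.append(abs(sum(oob_preds) / len(oob_preds) - y))

    oob_mae = sum(oob_errors) / len(oob_errors)
    rescored_mae = sum(rescored_errors) / len(rescored_errors)
    return oob_mae, rescored_mae


oob_mae, rescored_mae = compare_training_metrics()
print("out-of-bag MAE:", oob_mae)
print("rescored MAE:  ", rescored_mae)
```

If that assumption holds, it would also fit the observation in the question: the validation frame is never used to grow trees, so valid=True and test_data=val have no reason to disagree.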

