
CV or train/predict in mlr3

Originally posted 2021-02-06 15:31:16, tagged mlr3

The post "The 'Cross-Validation - Train/Predict' misunderstanding" by Patrick Schratz (https://mlr-org.com/docs/cv-vs-predict/) states that:

(a) CV is done to get an estimate of a model's performance.

(b) Train/predict is done to create the final predictions (which your boss might use to make some decisions on).

Does this mean that in mlr3, if we are in academia and need to publish papers, we should use CV because we want to compare the performance of different algorithms? And in industry, if the plan is to train a model once and then use it again and again to make predictions on new data, we should use the train/predict methods provided by mlr3?

Or have I completely misunderstood this?

Thank you

1 Answer

You always need a CV if you want to make a statement about a model's performance.
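As a rough illustration of the performance-estimation step, here is a minimal sketch using `resample()` from mlr3. The built-in `iris` task, the `classif.rpart` learner, the 5 folds and the accuracy measure are all stand-ins chosen for this example, not something the original post prescribes.

```r
library(mlr3)

# Assumption: a built-in toy task and a decision tree stand in for your own data and model
task = tsk("iris")
learner = lrn("classif.rpart")

# 5-fold cross-validation (the number of folds is an arbitrary choice here)
resampling = rsmp("cv", folds = 5)

set.seed(1)
rr = resample(task, learner, resampling)

# One score per fold, then the aggregated performance estimate
fold_scores = rr$score(msr("classif.acc"))
acc = rr$aggregate(msr("classif.acc"))
print(acc)
```

The models fitted inside the folds only serve to produce this estimate; they are not the model you would later deploy.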

If you want to use the model to make predictions on unknown data, do a single fit and then predict.
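A minimal sketch of the single fit followed by a prediction could look like this. The `iris` task and `classif.rpart` learner are again placeholders, and `newdata` is a hypothetical stand-in for the unlabeled data that would arrive in production.

```r
library(mlr3)

# Assumption: same placeholder task and learner as one might use for the CV step
task = tsk("iris")
learner = lrn("classif.rpart")

# Single fit on all available labeled data
learner$train(task)

# Hypothetical "new" observations: a few iris rows with the target column removed
newdata = iris[c(1, 51, 101), setdiff(names(iris), "Species")]

# Predict on data that is not part of any task
prediction = learner$predict_newdata(newdata)
print(prediction$response)
```

The trained `learner` object can be kept and reused for every new batch of data, which is the "again and again" scenario from the question.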

So in practice, you need both: CV + "train+predict".

PS: Your post does not really fit Stack Overflow since it is not related to a coding problem. For statistical questions, please see https://stats.stackexchange.com/.

PS2: If you talk about a post, please include the link. In this case I am the author of the post, but most other people might not know what you are talking about ;)