
Questions on ensemble techniques in machine learning

I am studying ensemble methods in machine learning, and while reading some articles online I ran into two questions.

1.

In this article, it mentions:

Instead, model 2 may have a better overall performance on all the data points, but it has worse performance on the very set of points where model 1 is better. The idea is to combine these two models where they perform the best. This is why creating out-of-sample predictions have a higher chance of capturing distinct regions where each model performs the best.


But I still don't get the point: why wouldn't training on all of the training data avoid this problem?

2.

From this article, in the prediction section, it mentions:

Simply, for a given input data point, all we need to do is to pass it through the M base-learners and get M number of predictions, and send those M predictions through the meta-learner as inputs

But in the training process we use k-fold splits of the training data to train the M base-learners, so for prediction should I also train the M base-learners on all of the training data?

Assume red and blue were the best models you could find.

One works better in region 1, the other in region 2.

Now you would also train a classifier to predict which model to use, i.e., you would try to learn the two regions.

Do the validation on the outside: you can overfit if you give the two inner models access to data that the meta-model does not see.
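As a rough sketch of that idea (assuming scikit-learn; the two base models and the 5-fold split are arbitrary choices, not taken from the original posts), the meta-model can be trained on out-of-fold predictions, so it never sees predictions that a base model made on its own training data:

```python
# Minimal sketch of out-of-fold predictions for the meta-model (assumes scikit-learn).
# Each base model's prediction for a point comes from a fold that did not train on it,
# so the meta-model never sees "leaked" in-sample predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, random_state=0)

base_models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=3)]

# Out-of-fold probability predictions, one column per base model.
oof = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta-model learns, from these out-of-fold predictions,
# where each base model tends to be reliable.
meta_model = LogisticRegression()
meta_model.fit(oof, y)
```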

The idea in ensembles is that a group of weak predictors can outperform a single strong predictor. So if we train different models that produce different predictions and use majority rule as the final output of our ensemble, this result is better than just trying to train one single model. Assume, for example, that the data consist of two distinct patterns, one linear and one quadratic. Then a single classifier can either overfit or produce inaccurate results. You can read this tutorial to learn more about ensembles, bagging, and boosting.
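To illustrate the majority-rule idea, here is a minimal sketch assuming scikit-learn; the three base models are arbitrary examples, not taken from the tutorial:

```python
# Rough illustration of majority-rule voting (assumes scikit-learn; model choices are arbitrary).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Three different models that make different kinds of mistakes.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("tree", DecisionTreeClassifier(max_depth=3)),
    ],
    voting="hard",  # majority rule over the three predicted labels
)

print(cross_val_score(ensemble, X, y, cv=5).mean())
```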

1) "But I still cannot get the point, why not train all training data can avoid the problem?" - We will hold that data for validation purpose, just like the way we do in K-fold

2) "so should I also train M base-learner based on all train data for the input to predict?" - If you give same data to all the learners then the output of all of them would be same and there is no use in creating them. So we will give a subset of data to each learner.

For question 1, I will explain why we train two models by considering the opposite approach. Suppose you train one model on all the data points. During training, whenever the model sees a data point belonging to the red class, it tries to fit itself so that it classifies red points with minimal error. The same is true for data points belonging to the blue class. So during training the model keeps leaning towards one specific kind of data point (either red or blue), and in the end it tries to fit itself so that it does not make too many mistakes on either kind, leaving you with an averaged, compromise model. But if you instead train two models on the two different datasets, each model is trained on its own specific dataset and does not have to care about the data points belonging to the other class.

It becomes clearer with the following metaphor. Suppose there are two people who specialize in two completely different jobs. Now a job comes along, and you tell them that both of them have to do it, each doing 50% of the work. Think about what kind of result you will get in the end. Now also think about what the result would be if you instead told each person to work only on the job they are best at.

For question 2, you have to split the training dataset into M datasets, and during training give the M datasets to the M base-learners.
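A minimal sketch of this scheme, assuming scikit-learn (M = 3 and the particular base models are arbitrary choices): the training set is split into M parts, each base learner is trained on its own part, and, as in the article quoted in question 2, a new point is passed through all M base learners before their M predictions go to the meta-learner. For simplicity the meta-learner here is fit on the base learners' training-set predictions; the out-of-fold variant is sketched in an earlier answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)

M = 3
base_models = [LogisticRegression(max_iter=1000),
               DecisionTreeClassifier(max_depth=3),
               KNeighborsClassifier(n_neighbors=5)]

# Train each base learner on its own slice of the training data.
for model, Xi, yi in zip(base_models, np.array_split(X, M), np.array_split(y, M)):
    model.fit(Xi, yi)

# Meta-learner trained on the M base learners' predictions (simplified, in-sample version).
train_preds = np.column_stack([m.predict_proba(X)[:, 1] for m in base_models])
meta_model = LogisticRegression().fit(train_preds, y)

# Prediction: pass a new point through the M base learners, then through the meta-learner.
x_new = X[:1]
new_preds = np.column_stack([m.predict_proba(x_new)[:, 1] for m in base_models])
print(meta_model.predict(new_preds))
```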
