I am running a GradientBoostingClassifier from sklearn and I am getting some strange results in the verbose output. I am taking random 10% samples from my entire dataset; most runs seem fine, but sometimes I get strange output and poor results. Can someone please explain what is going on?
"Good" result:
n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
learning_rate=0.01, loss='deviance', max_depth=4,
max_features=None, max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=2000, presort='auto', random_state=None,
subsample=1.0, verbose=1, warm_start=False)
Iter Train Loss Remaining Time
1 0.6427 40.74m
2 0.6373 40.51m
3 0.6322 40.34m
4 0.6275 40.33m
5 0.6230 40.31m
6 0.6187 40.18m
7 0.6146 40.34m
8 0.6108 40.42m
9 0.6071 40.43m
10 0.6035 40.28m
20 0.5743 40.12m
30 0.5531 39.74m
40 0.5367 39.49m
50 0.5237 39.13m
60 0.5130 38.78m
70 0.5041 38.47m
80 0.4963 38.34m
90 0.4898 38.22m
100 0.4839 38.14m
200 0.4510 37.07m
300 0.4357 35.49m
400 0.4270 33.87m
500 0.4212 31.77m
600 0.4158 29.82m
700 0.4108 27.74m
800 0.4065 25.69m
900 0.4025 23.55m
1000 0.3987 21.39m
2000 0.3697 0.00s
predicting
this_file_MCC = 0.5777
"Bad" result:
Training the classifier
n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
learning_rate=1.0, loss='deviance', max_depth=5,
max_features='sqrt', max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=500, presort='auto', random_state=None,
subsample=1.0, verbose=1, warm_start=False)
Iter Train Loss Remaining Time
1 0.5542 1.07m
2 0.5299 1.18m
3 0.5016 1.14m
4 0.4934 1.16m
5 0.4864 1.19m
6 0.4756 1.21m
7 0.4699 1.24m
8 0.4656 1.26m
9 0.4619 1.24m
10 0.4572 1.26m
20 0.4244 1.27m
30 0.4063 1.24m
40 0.3856 1.20m
50 0.3711 1.18m
60 0.3578 1.13m
70 0.3407 1.10m
80 0.3264 1.09m
90 0.3155 1.06m
100 0.3436 1.04m
200 0.3516 46.55s
300 1605.5140 29.64s
400 52215150662014.0469 13.70s
500 585408988869401440279216573629431147797247696359586211550088082222979417986203510562624281874357206861232303015821113689812886779519405981626661580487933040706291550387961400555272759265345847455837036753780625546140668331728366820653710052494883825953955918423887242778169872049367771382892462080.0000 0.00s
predicting
this_file_MCC = 0.0398
Your learning rate in the "bad" example is too high, so you are jumping over the local or global minima in the gradient descent step of the Gradient Boosting algorithm. This causes divergence and produces the explosion of the training loss you are seeing. Take a look at this lecture from Andrew Ng's Machine Learning Course. The part relevant to the learning rate starts at about 4:30.
Think of Gradient Descent/Ascent as trying to find your way to the bottom or top of a valley/hill, ideally to the lowest/highest point globally. If the hills/valleys are very large and you take tiny steps, you should eventually reach at least a local minimum/maximum. But if the hills/valleys are small relative to the size of your steps, it is very easy to leap across the minima/maxima and end up somewhere awful. The learning rate is the size of your steps. In the "good" case your learning rate (alpha) was 0.01, so you took tiny steps in (mostly) the right direction until you reached the minimum; in the "bad" case your alpha was 1.0, so you took large steps, jumped right over the local minimum, and ended up ascending instead of descending. That's a really elementary way of thinking about what the learning rate does within the algorithm.
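You can see this divergence in miniature with plain gradient descent on a toy function (this is just an illustrative sketch, not your model): minimizing f(x) = x², whose gradient is 2x, a small step size creeps toward the minimum at 0, while a step size above 1.0 overshoots it by more each iteration and blows up, exactly like your stage-500 loss.

```python
def gradient_descent(alpha, x0=1.0, steps=500):
    """Gradient descent on f(x) = x**2; gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x  # step against the gradient
    return x

small = gradient_descent(alpha=0.01)  # x shrinks by 2% per step -> converges
large = gradient_descent(alpha=1.5)   # |x| doubles every step -> diverges

print(abs(small))  # tiny, near the minimum at 0
print(abs(large))  # astronomically large, like your stage-500 loss
```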
If you read this article on tuning the learning rate on DatumBox you will see an oft-recycled visualization of the process (not sure who stole the image from whom, but it's everywhere) and a short discussion of adaptively changing the learning rate. Adaptive rates are not the default in sklearn: GradientBoostingClassifier keeps learning_rate fixed for every boosting stage, so you need to choose it (and n_estimators) yourself.
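To see the effect on your own problem, you can compare a small and a large fixed learning_rate side by side. A minimal sketch, assuming synthetic data and small parameter values chosen for speed (not your dataset or your exact settings):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (hypothetical, for illustration).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for lr in (0.01, 1.0):
    clf = GradientBoostingClassifier(learning_rate=lr, n_estimators=200,
                                     max_depth=4, random_state=0)
    clf.fit(X_tr, y_tr)
    scores[lr] = clf.score(X_te, y_te)

print(scores)  # test accuracy per learning rate
```

In practice the small rate paired with more estimators tends to generalize better; lowering learning_rate usually means raising n_estimators to compensate, which is why your "good" run used 0.01 with 2000 stages.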