Bug in Scikit-Learn GradientBoostingClassifier?

I am running scikit-learn's GradientBoostingClassifier and getting some strange numbers in the verbose output. I am training on random 10% samples of my entire dataset; most runs seem fine, but sometimes I get strange output and poor results. Can someone please explain what is going on?

"Good" result:

n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.01, loss='deviance', max_depth=4,
              max_features=None, max_leaf_nodes=None,
              min_impurity_split=1e-07, min_samples_leaf=1,
              min_samples_split=2, min_weight_fraction_leaf=0.0,
              n_estimators=2000, presort='auto', random_state=None,
              subsample=1.0, verbose=1, warm_start=False)
      Iter       Train Loss   Remaining Time 
         1           0.6427           40.74m
         2           0.6373           40.51m
         3           0.6322           40.34m
         4           0.6275           40.33m
         5           0.6230           40.31m
         6           0.6187           40.18m
         7           0.6146           40.34m
         8           0.6108           40.42m
         9           0.6071           40.43m
        10           0.6035           40.28m
        20           0.5743           40.12m
        30           0.5531           39.74m
        40           0.5367           39.49m
        50           0.5237           39.13m
        60           0.5130           38.78m
        70           0.5041           38.47m
        80           0.4963           38.34m
        90           0.4898           38.22m
       100           0.4839           38.14m
       200           0.4510           37.07m
       300           0.4357           35.49m
       400           0.4270           33.87m
       500           0.4212           31.77m
       600           0.4158           29.82m
       700           0.4108           27.74m
       800           0.4065           25.69m
       900           0.4025           23.55m
      1000           0.3987           21.39m
      2000           0.3697            0.00s
predicting
this_file_MCC = 0.5777

"Bad" result:

Training the classifier
n features = 168
GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=1.0, loss='deviance', max_depth=5,
              max_features='sqrt', max_leaf_nodes=None,
              min_impurity_split=1e-07, min_samples_leaf=1,
              min_samples_split=2, min_weight_fraction_leaf=0.0,
              n_estimators=500, presort='auto', random_state=None,
              subsample=1.0, verbose=1, warm_start=False)
      Iter       Train Loss   Remaining Time 
         1           0.5542            1.07m
         2           0.5299            1.18m
         3           0.5016            1.14m
         4           0.4934            1.16m
         5           0.4864            1.19m
         6           0.4756            1.21m
         7           0.4699            1.24m
         8           0.4656            1.26m
         9           0.4619            1.24m
        10           0.4572            1.26m
        20           0.4244            1.27m
        30           0.4063            1.24m
        40           0.3856            1.20m
        50           0.3711            1.18m
        60           0.3578            1.13m
        70           0.3407            1.10m
        80           0.3264            1.09m
        90           0.3155            1.06m
       100           0.3436            1.04m
       200           0.3516           46.55s
       300        1605.5140           29.64s
       400 52215150662014.0469           13.70s
       500 585408988869401440279216573629431147797247696359586211550088082222979417986203510562624281874357206861232303015821113689812886779519405981626661580487933040706291550387961400555272759265345847455837036753780625546140668331728366820653710052494883825953955918423887242778169872049367771382892462080.0000            0.00s
predicting
this_file_MCC = 0.0398

Your learning rate in the "bad" example is too high: you are jumping over the local or global minimum in the gradient descent step of the Gradient Boosting algorithm, which causes divergence and the explosion of the training loss you are seeing. Take a look at this lecture from Andrew Ng's Machine Learning Course; the part about the learning rate comes in at about 4:30.
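
A quick way to confirm the diagnosis is to rerun the "bad" configuration with a much smaller learning_rate and watch the verbose train loss, which should then decrease steadily instead of blowing up. A minimal sketch, where X and y are hypothetical stand-ins for your 168-feature matrix and labels (not from the original post):

from sklearn.ensemble import GradientBoostingClassifier

# Same settings as the "bad" run, except for the learning rate:
# many small steps instead of a few huge ones.
clf = GradientBoostingClassifier(learning_rate=0.05, n_estimators=500,
                                 max_depth=5, max_features='sqrt',
                                 verbose=1)
clf.fit(X, y)  # X, y: your 10% sample and its binary labels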

Think of Gradient Descent/Ascent as trying to find your way to the bottom of a valley or the top of a hill, ideally to the lowest or highest point globally. If the valleys/hills are very large and you take tiny steps, you should eventually reach at least a local minimum/maximum. But if they are small relative to the size of your steps, it is very easy to leap right across the minimum/maximum and end up somewhere awful. The learning rate is the size of your steps. In the "good" case your learning rate (alpha) was 0.01, so you took tiny steps in (mostly) the right direction until you reached the minimum; in the "bad" case alpha was 1.0, so you took large steps, jumped right over the local minimum, and ended up ascending instead of descending. That is a really elementary way of thinking about what the learning rate does inside the algorithm.
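
You can see the same effect with plain gradient descent on a one-dimensional quadratic. This toy example only illustrates the step-size intuition above, not sklearn's internals:

# Minimize f(x) = x**2 (gradient 2*x) with two different step sizes.
def gradient_descent(alpha, x0=5.0, steps=10):
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x   # one gradient step
    return x

print(gradient_descent(alpha=0.1))   # shrinks toward 0: convergence
print(gradient_descent(alpha=1.5))   # |x| doubles every step: divergence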

If you read this article on tuning the learning rate on DatumBox you will see an oft-recycled visualization of the process (not sure who stole this image from whom, but it's everywhere) and a little discussion of adaptively changing the learning rate. I'm not sure whether sklearn does that by default, but I wouldn't count on it.
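
Since GradientBoostingClassifier keeps the learning rate fixed for the whole run, the practical approach is to try a few values yourself and score each on held-out data, e.g. with the same MCC metric you are already using. A rough sketch, again assuming hypothetical X and y:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import matthews_corrcoef

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

for lr in (1.0, 0.1, 0.01):
    clf = GradientBoostingClassifier(learning_rate=lr, n_estimators=500,
                                     max_depth=5, max_features='sqrt')
    clf.fit(X_train, y_train)
    mcc = matthews_corrcoef(y_val, clf.predict(X_val))
    print("learning_rate=%.2f  validation MCC=%.4f" % (lr, mcc))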
