
Weight issues in scikit-learn's AdaBoost

I'm trying to use AdaBoostClassifier with a decision stump as the base classifier. The weight-boosting step in AdaBoostClassifier raises errors with both the SAMME and SAMME.R algorithms.

Here's a brief overview of what I'm doing:

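import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def train_adaboost(features, labels):
    # Train 10 AdaBoost models for each unique label.
    uniqLabels = np.unique(labels)
    allLearners = []
    for targetLab in uniqLabels:
        runs = []
        for rrr in xrange(10):
            # get_binary_sets is my own helper, defined elsewhere (not shown)
            feats, labs = get_binary_sets(features, labels, targetLab)
            # decision stump used as the base classifier
            baseClf = DecisionTreeClassifier(max_depth=1, min_samples_leaf=1)
            baseClf.fit(feats, labs)

            ada_real = AdaBoostClassifier(base_estimator=baseClf,
                                          learning_rate=1,
                                          n_estimators=20,
                                          algorithm="SAMME")
            runs.append(ada_real.fit(feats, labs))
        allLearners.append(runs)

    return allLearners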

I checked the fit of each individual decision tree classifier, and each one is able to predict some labels. When I use AdaBoostClassifier with this base classifier, however, I get errors from the weight boosting algorithm.

def compute_confidence(allLearners, dada, labbo):
    for ii, thisLab in enumerate(allLearners):
        for jj, thisLearner in enumerate(thisLab):
            # accessing thisLearner's methods here
            pass

The methods give errors like these:

ipdb> thisLearner.predict_proba(myData)

PATHTOPACKAGE/lib/python2.7/site-packages/sklearn/ensemble/weight_boosting.py:727: RuntimeWarning: invalid value encountered in double_scalars
  proba /= self.estimator_weights_.sum()
*** ValueError: 'axis' entry is out of bounds

ipdb> thisLearner.predict(myData)

PATHTOPACKAGE/lib/python2.7/site-packages/sklearn/ensemble/weight_boosting.py:639: RuntimeWarning: invalid value encountered in double_scalars
  pred /= self.estimator_weights_.sum()
*** IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index

I also tried the SAMME.R algorithm, but in that case I can't even fit AdaBoost because of this error [...]

  File "PATH/sklearn/ensemble/weight_boosting.py", line 388, in fit
    return super(AdaBoostClassifier, self).fit(X, y, sample_weight)
  File "PATH/sklearn/ensemble/weight_boosting.py", line 124, in fit
    X_argsorted=X_argsorted)
  File "PATH/sklearn/ensemble/weight_boosting.py", line 435, in _boost
    X_argsorted=X_argsorted)
  File "PATH/sklearn/ensemble/weight_boosting.py", line 498, in _boost_real
    (estimator_weight < 0)))
ValueError: non-broadcastable output operand with shape (1000) doesn't match the broadcast shape (1000,1000)

The data's dimensions are compatible with the format the classifier expects, both before using AdaBoost and when I test the trained classifiers. What could these errors indicate?

Well that was a little counterintuitive, coming from coding in Matlab.

Apparently the shape of labs was the issue: it was (1000,1), and it needed to be (1000,).
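To see why a (1000,1) label array can produce the (1000,1000) broadcast error reported in the question, here is a small NumPy-only sketch. The arrays are made up for illustration (they are not the data from the question), and the multiplication only stands in for the kind of weight update a boosting step performs; it is not the actual scikit-learn expression.

```python
import numpy as np

n = 1000
sample_weight = np.ones(n) / n   # shape (n,), like per-sample boosting weights
labels_2d = np.zeros((n, 1))     # shape (n, 1), a Matlab-style column vector
labels_1d = labels_2d[:, 0]      # shape (n,)

# Combining (n,) with (n, 1) broadcasts to (n, n) instead of staying elementwise.
print((sample_weight * labels_2d).shape)
print((sample_weight * labels_1d).shape)

# An in-place update cannot store an (n, n) result in an (n,) array,
# so NumPy raises a "non-broadcastable output operand" ValueError.
try:
    sample_weight *= labels_2d
    inplace_failed = False
except ValueError as err:
    inplace_failed = True
    print(err)
```

In Matlab a 1000x1 column is the normal way to hold labels, whereas NumPy treats (1000,) and (1000,1) as different shapes with different broadcasting behavior.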

Adding this line solved it:

labs = labs[:,0]
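If the labels may arrive either as (n,) or as (n,1), the same fix can be wrapped in a small guard. This is only a sketch: as_1d_labels is a hypothetical helper name, not part of scikit-learn or NumPy, and the sample array is made up.

```python
import numpy as np

def as_1d_labels(labs):
    """Return labels as a 1-D array (hypothetical helper, not a library function)."""
    labs = np.asarray(labs)
    if labs.ndim == 1:
        return labs
    if labs.ndim == 2 and labs.shape[1] == 1:
        return labs[:, 0]  # same operation as the fix above
    raise ValueError("expected shape (n,) or (n, 1), got %s" % (labs.shape,))

labs = np.array([[0], [1], [1], [0]])  # made-up (4, 1) column vector
flat = as_1d_labels(labs)
print(flat.shape)
```

Calling labs.ravel() gives the same result for an (n,1) array; the explicit shape check just makes an unexpected multi-column label array fail loudly instead of being silently flattened.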
