Premise:
I have been working on an ML dataset and found that my AdaBoost and SVM models are extremely good at detecting true positives. The confusion matrix for both models is identical and is shown below.
Here's the image: [confusion matrix, identical for both models]
Out of the 10 models I have trained, 2 of them are AdaBoost and SVM. Of the other 8, some have lower accuracy and others higher, by roughly ±2%.
MAIN QUESTION:
How do I chain/pipeline my models so that every test case is handled in the following manner: if the first ensemble is confident enough, return its prediction; otherwise, let the second ensemble decide?
Potential Solution:
My attempt involved two voting classifiers: one with just AdaBoost and SVM, and a second with the other 8 models. But I don't know how to make this work.
Here's the code for my approach:
from sklearn.ensemble import VotingClassifier

ensemble1 = VotingClassifier(estimators=[
    ('SVM', model[5]),
    ('ADA', model[7]),
], voting='hard').fit(X_train, Y_train)

print('The accuracy for the ensembled model is:', ensemble1.score(X_test, Y_test))

# I was trying to make ensemble1 the "first pass": if it is more than 80%
# confident in its decision, return the result;
# ELSE, ensemble2 jumps in to make the decision.
ensemble2 = VotingClassifier(estimators=[
    ('LR', model[0]),
    ('DT', model[1]),
    ('RFC', model[2]),
    ('KNN', model[3]),
    ('GBB', model[4]),
    ('MLP', model[6]),
    ('EXT', model[8]),
    ('XG', model[9]),
], voting='hard').fit(X_train, Y_train)

# I don't know how to make these two models work together, though.
Extra Questions:
These questions cover some extra concerns I had and are NOT the main question:
Is what I am trying to do worth it?
Is it normal to have a confusion matrix with just true positives and false positives (as seen above in the picture for Model 5), or is this indicative of incorrect training?
Are the accuracies of my models individually considered good? The models predict the likelihood of developing heart disease. Accuracies below: [accuracy figures omitted]
Sorry for the long post and thanks for all your input and suggestions. I'm new to ML so I'd appreciate any pointers.
Here is a simple implementation that hopefully solves your main problem of chaining multiple estimators:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class ChainEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, est1, est2):
        self.est1 = est1
        self.est2 = est2

    def fit(self, X, y):
        self.est1.fit(X, y)
        self.est2.fit(X, y)
        return self

    def predict(self, X):
        ans = np.zeros(len(X)) - 1                    # -1 marks "not yet predicted"
        probs = self.est1.predict_proba(X)            # soft-voting probabilities (averaged over Ada & SVC)
        conf_samples = np.any(probs >= 0.8, axis=1)   # samples where est1 is at least 80% confident
        # map the winning column index back to the actual class label
        ans[conf_samples] = self.est1.classes_[np.argmax(probs[conf_samples, :], axis=1)]
        if conf_samples.sum() < len(X):               # use est2 for the non-confident samples
            ans[~conf_samples] = self.est2.predict(X[~conf_samples])
        return ans
Which you can call like this (note that est1 must use voting='soft', and the SVC needs probability=True, so that predict_proba is available):
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
est1 = VotingClassifier(estimators=[('ada', AdaBoostClassifier()), ('svm', SVC(probability=True))], voting='soft')
est2 = VotingClassifier(estimators=[('dt', DecisionTreeClassifier()), ('knn', KNeighborsClassifier())])
clf = ChainEstimator(est1, est2).fit(X_train, Y_train)
ans = clf.predict(X_test)
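If you want to sanity-check the chain end to end before plugging in your trained models, here is a minimal sketch on synthetic data, reusing the est1 and est2 defined above (make_classification is only a stand-in for your real heart-disease features):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# stand-in data: 500 samples, binary target, purely for a smoke test
X, y = make_classification(n_samples=500, n_features=13, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = ChainEstimator(est1, est2).fit(X_train, Y_train)
print('chained accuracy:', (clf.predict(X_test) == Y_test).mean())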
Now, if you want to base your chaining on the performance of est1, you can do something like this to record its performance during training, and add a few more ifs to the predict function:
from sklearn.model_selection import cross_val_score

def fit(self, X, y):
    self.est1.fit(X, y)
    self.est1_perf = cross_val_score(self.est1, X, y, cv=4, scoring='f1_macro')
    self.est2.fit(X, y)
    self.est2_perf = cross_val_score(self.est2, X, y, cv=4, scoring='f1_macro')
    return self
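Those extra ifs are left open above; a minimal sketch of one possible version, assuming the rule is to only let est1 short-circuit when its cross-validated macro F1 is at least as good as est2's (this rule is my assumption, not part of the original answer):

def predict(self, X):
    ans = np.zeros(len(X)) - 1
    # assumed rule: only trust est1's confident predictions if it
    # cross-validated at least as well as est2; otherwise defer entirely to est2
    if self.est1_perf.mean() >= self.est2_perf.mean():
        probs = self.est1.predict_proba(X)
        conf_samples = np.any(probs >= 0.8, axis=1)
        ans[conf_samples] = self.est1.classes_[np.argmax(probs[conf_samples, :], axis=1)]
        if conf_samples.sum() < len(X):
            ans[~conf_samples] = self.est2.predict(X[~conf_samples])
    else:
        ans = self.est2.predict(X)
    return ans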
Note that you shouldn't be using simple accuracy for a problem like this: heart-disease datasets are often class-imbalanced, so a metric like macro F1 or balanced accuracy is more informative.
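For example, with scikit-learn's standard metrics:

from sklearn.metrics import classification_report, f1_score

preds = clf.predict(X_test)
print('macro F1:', f1_score(Y_test, preds, average='macro'))
print(classification_report(Y_test, preds))  # per-class precision, recall, and F1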