
Python Scikit - bad input shape when calling sklearn.metrics.precision_recall_curve

I'm trying to build a PRC (precision-recall curve) for a CatBoostClassifier.

But when I call sklearn.metrics.precision_recall_curve(y_test, y_score), I get ValueError: bad input shape (11912, 2).

What could be wrong with my current approach? And what do I need to fix here to provide a correct shape?

import sklearn 
from sklearn import metrics 
y_score = model.predict_proba(X_test) 
prc_auc = sklearn.metrics.precision_recall_curve(y_test, y_score)

Here is how I build the model:

from catboost import CatBoostClassifier

model = CatBoostClassifier(
    iterations=50,
    random_seed=63,
    learning_rate=0.15,
    custom_loss=['Accuracy', 'Precision', 'Recall', 'AUC']
)

model.fit(
    X_train, y_train,
    cat_features=cat_features,
    eval_set=(X_test, y_test),
    verbose=10,
    plot=True
)

The short answer is that CatBoostClassifier.predict_proba returns a 2-D array with one column per class, here of shape (11912, 2), while sklearn.metrics.precision_recall_curve requires a 1-D array of scores (or a 2-D array with one column, whichever).
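
To make the shape mismatch concrete, here is a minimal sketch that reproduces the failure without CatBoost. The labels and scores are made up for illustration, and the exact wording of the error message depends on the installed scikit-learn version.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up labels and two-column scores, shaped like predict_proba output.
y_true = np.array([0, 1, 1, 0])
two_column_scores = np.array([[0.8, 0.2],
                              [0.3, 0.7],
                              [0.1, 0.9],
                              [0.6, 0.4]])

try:
    # Passing the full (n_samples, 2) array is what triggers the error.
    precision_recall_curve(y_true, two_column_scores)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```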

The documentation for CatBoostClassifier says only that predict_proba() returns a numpy.array and gives no other information about the method. So I hate the documentation for this package now.

Walking through some poorly-commented code gets me to:

    if prediction_type == 'Probability':
        predictions = np.transpose([1 - predictions, predictions])
        return predictions
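
To see what that transpose produces, here is a small NumPy-only sketch. The probabilities are hypothetical values, not CatBoost output, and the variable names are mine.

```python
import numpy as np

# Hypothetical positive-class probabilities for four samples.
positive_proba = np.array([0.9, 0.2, 0.6, 0.4])

# Same construction as the snippet above: two rows, [1 - p, p],
# transposed so that each sample gets one row with two columns.
y_score = np.transpose([1 - positive_proba, positive_proba])

print(y_score.shape)             # one row per sample, one column per class

# Slicing out the second column gives back a 1-D array.
positive_column = y_score[:, 1]
print(positive_column.shape)
```

Each row sums to 1, and the second column is the original probability vector, which is consistent with the shape (11912, 2) in the error message.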

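Going by that snippet, I'm guessing that column 0 is the probability of class 0 and column 1 is the probability of class 1. precision_recall_curve wants scores for the positive class only, so pick the column that matches the positive label in y_test (column 1 if your positive label is 1) and pass just that column.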

precision, recall, thresholds = sklearn.metrics.precision_recall_curve(y_test, y_score[:, 1])
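
For completeness, here is a runnable end-to-end sketch of the same fix. It rests on a few assumptions: the data is synthetic, and LogisticRegression stands in for CatBoostClassifier so the example only needs scikit-learn. Note that precision_recall_curve returns three arrays (precision, recall, thresholds) rather than a single number, so an area under the curve has to be computed separately, here with sklearn.metrics.auc.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real train/test split.
X, y = make_classification(n_samples=500, n_features=10, random_state=63)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=63
)

# Stand-in model (assumption): used here only because it also exposes
# predict_proba with one column per class.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_score = model.predict_proba(X_test)   # shape (n_samples, 2)
positive_scores = y_score[:, 1]         # shape (n_samples,), P(class 1)

# The curve itself is three arrays, not a scalar.
precision, recall, thresholds = precision_recall_curve(y_test, positive_scores)

# Area under the precision-recall curve via the trapezoidal rule.
prc_auc = auc(recall, precision)
print(y_score.shape, positive_scores.shape, round(prc_auc, 3))
```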

