
Using StandardScaler as Preprocessor in Mlens Pipeline generates Classification Warning

I am trying to scale my data within the cross-validation folds of an MLENs SuperLearner pipeline. When I use StandardScaler in the pipeline (as demonstrated below), I receive the following warning:

/miniconda3/envs/r_env/lib/python3.7/site-packages/mlens/parallel/_base_functions.py:226: MetricWarning: [pipeline-1.mlpclassifier.0.2] Could not score pipeline-1.mlpclassifier. Details: ValueError("Classification metrics can't handle a mix of binary and continuous-multioutput targets") (name, inst_name, exc), MetricWarning)

Of note, when I omit the StandardScaler() the warning disappears, but the data is not scaled.

from sklearn.datasets import load_breast_cancer

breast_cancer_data = load_breast_cancer()

X = breast_cancer_data['data']
y = breast_cancer_data['target']

from sklearn.model_selection import train_test_split
X, X_val, y, y_val = train_test_split(X, y, test_size=.3, random_state=0)

from sklearn.base import BaseEstimator
class RFBasedFeatureSelector(BaseEstimator):
  
    def __init__(self, n_estimators):
        self.n_estimators = n_estimators
        self.selector = None

    def fit(self, X, y):
        clf = RandomForestClassifier(n_estimators=self.n_estimators, random_state = RANDOM_STATE, class_weight = 'balanced')
        clf = clf.fit(X, y)
        self.selector = SelectFromModel(clf, prefit=True, threshold = 0.001)

    def transform(self, X):
        if self.selector is None:
            raise AttributeError('The selector attribute has not been assigned. You cannot call transform before first calling fit or fit_transform.')
        return self.selector.transform(X)

    def fit_transform(self, X, y):
        self.fit(X, y)
        return self.transform(X)

N_FOLDS = 5
RF_ESTIMATORS = 1000
N_ESTIMATORS = 1000
RANDOM_STATE = 42

from mlens.metrics import make_scorer
from sklearn.metrics import roc_auc_score, balanced_accuracy_score
accuracy_scorer = make_scorer(balanced_accuracy_score, average='micro', greater_is_better=True)

from mlens.ensemble.super_learner import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel


ensemble = SuperLearner(folds=N_FOLDS, shuffle=True, random_state=RANDOM_STATE, n_jobs=10, scorer=balanced_accuracy_score, backend="multiprocessing")

preprocessing1 = {'pipeline-1': [StandardScaler()]
                 }

preprocessing2 = {'pipeline-1': [RFBasedFeatureSelector(N_ESTIMATORS)]
                 }

estimators = {'pipeline-1': [RandomForestClassifier(RF_ESTIMATORS, random_state=RANDOM_STATE, class_weight='balanced'), 
                             MLPClassifier(hidden_layer_sizes=(10, 10, 10), activation='relu', solver='sgd',
                                           max_iter=5000)
                                         ]
                 }

ensemble.add(estimators, preprocessing2, preprocessing1)

ensemble.add_meta(LogisticRegression(solver='liblinear', class_weight = 'balanced'))

ensemble.fit(X,y)

yhat = ensemble.predict(X_val)
balanced_accuracy_score(y_val, yhat)

> Error text: /miniconda3/envs/r_env/lib/python3.7/site-packages/mlens/parallel/_base_functions.py:226: MetricWarning: [pipeline-1.mlpclassifier.0.2] Could not score pipeline-1.mlpclassifier. Details: ValueError("Classification metrics can't handle a mix of binary and continuous-multioutput targets") (name, inst_name, exc), MetricWarning)

You are currently passing your preprocessing steps as two separate arguments when calling the add method. You can instead combine them as follows:

preprocessing = {'pipeline-1': [RFBasedFeatureSelector(N_ESTIMATORS), StandardScaler()]}

Please refer to the documentation for the add method found here: https://mlens.readthedocs.io/en/0.1.x/source/mlens.ensemble.super_learner/
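With the combined mapping, the layer is added in a single call. The second dictionary in your original call is most likely being consumed by a later positional parameter of add (such as proba) rather than being applied as a second preprocessing stage, which would explain the mix of binary and continuous outputs at scoring time. A minimal, untested sketch reusing the names defined in your question:

# Sketch only: reuses the constants, the estimators dict and the combined
# preprocessing mapping defined above; a fresh ensemble is built so that each
# cross-validation fold fits feature selection and scaling before the estimators.
ensemble = SuperLearner(folds=N_FOLDS, shuffle=True, random_state=RANDOM_STATE,
                        n_jobs=10, scorer=balanced_accuracy_score,
                        backend="multiprocessing")
ensemble.add(estimators, preprocessing)
ensemble.add_meta(LogisticRegression(solver='liblinear', class_weight='balanced'))

ensemble.fit(X, y)
yhat = ensemble.predict(X_val)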

