
Cross validation with multiple parameters using f1-score

I am trying to do feature selection using SelectKBest and to find the best tree depth for binary classification using the f1-score. I have created a scorer function to select the best features and to evaluate the grid search. The error "__call__() missing 1 required positional argument: 'y_true'" pops up when the classifier tries to fit the training data.

#Define scorer
f1_scorer = make_scorer(f1_score)
#Split data into training, CV and test set
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25, random_state = 0)

#initialize tree and Select K-best features for classifier   
kbest = SelectKBest(score_func=f1_scorer, k=all)
clf = DecisionTreeClassifier(random_state=0)

#create a pipeline for features to be optimized
pipeline = Pipeline([('kbest',kbest),('dt',clf)])

#initialize a grid search with features to be optimized
gs = GridSearchCV(pipeline,{'kbest__k': range(2,11), 'dt__max_depth':range(3,7)}, refit=True, cv=5, scoring = f1_scorer)

gs.fit(X_train,y_train)

#order best selected features into a single variable
selector = SelectKBest(score_func=f1_scorer, k=gs.best_params_['kbest__k'])
X_new = selector.fit_transform(X_train,y_train)  

On the fit line I get a TypeError: __call__() missing 1 required positional argument: 'y_true'.
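The mismatch can be reproduced in isolation: a scorer built with `make_scorer` expects `(estimator, X, y_true)`, while `SelectKBest` calls its `score_func` as `score_func(X, y)`. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.metrics import f1_score, make_scorer
from sklearn.feature_selection import f_classif

X = np.array([[0.1, 1.0], [1.0, 0.0], [0.9, 1.0], [0.0, 0.0]])
y = np.array([0, 1, 1, 0])

# A scorer made with make_scorer expects (estimator, X, y_true)...
scorer = make_scorer(f1_score)
err = None
try:
    scorer(X, y)  # ...but SelectKBest calls score_func(X, y)
except TypeError as e:
    err = e
print(err)  # missing positional argument 'y_true'

# ...while a proper score_func takes (X, y) and returns (scores, pvalues)
scores, pvalues = f_classif(X, y)
```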

The problem is in the score_func which you have used for SelectKBest. score_func is a function which takes two arrays X and y and returns either a pair of arrays (scores, pvalues) or a single array with scores, but in your code you have fed the callable f1_scorer as the score_func, which just takes your y_true and y_pred and computes the f1 score. You can use one of chi2, f_classif or mutual_info_classif as your score_func for the classification task. Also, there is a minor bug in the parameter k for SelectKBest: it should have been "all" instead of all. I have modified your code incorporating these changes:

from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.feature_selection import f_classif  
from sklearn.metrics import f1_score, make_scorer
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_classes=2,
                       n_informative=4, weights=[0.7, 0.3],
                       random_state=0)

f1_scorer = make_scorer(f1_score)
#Split data into training, CV and test set
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25, random_state = 0)

#initialize tree and Select K-best features for classifier   
kbest = SelectKBest(score_func=f_classif)
clf = DecisionTreeClassifier(random_state=0)

#create a pipeline for features to be optimized
pipeline = Pipeline([('kbest',kbest),('dt',clf)])
gs = GridSearchCV(pipeline,{'kbest__k': range(2,11), 'dt__max_depth':range(3,7)}, refit=True, cv=5, scoring = f1_scorer)
gs.fit(X_train,y_train)
gs.best_params_

OUTPUT

{'dt__max_depth': 6, 'kbest__k': 9}
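Since refit=True, gs.best_estimator_ is already refitted on the whole training set with those parameters, so you can score it directly on the held-out test set. A self-contained sketch using the same synthetic data as above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=1000, n_classes=2,
                           n_informative=4, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

pipeline = Pipeline([('kbest', SelectKBest(score_func=f_classif)),
                     ('dt', DecisionTreeClassifier(random_state=0))])
gs = GridSearchCV(pipeline,
                  {'kbest__k': range(2, 11), 'dt__max_depth': range(3, 7)},
                  refit=True, cv=5, scoring='f1')
gs.fit(X_train, y_train)

# The refitted best pipeline can predict on unseen data directly
y_pred = gs.best_estimator_.predict(X_test)
f1 = f1_score(y_test, y_pred)
print(f1)
```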

Also modify your last two lines as below:

selector = SelectKBest(score_func=f_classif, k=gs.best_params_['kbest__k'])
X_new = selector.fit_transform(X_train,y_train)  
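If you also want to know which of the original columns were kept, the fitted selector exposes get_support(). A small sketch on the same synthetic data (k=9 is the value the grid search found above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=1000, n_classes=2,
                           n_informative=4, weights=[0.7, 0.3],
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=9)  # k from gs.best_params_
X_new = selector.fit_transform(X, y)

mask = selector.get_support()   # boolean mask over the original columns
kept = np.flatnonzero(mask)     # indices of the selected features
print(X_new.shape, kept)
```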

Hope this helps!
