
Selecting the best features for ML

Is there any way to extract the best features from the data? Right now I am using SelectKBest from sklearn, where I have to specify K, the number of best features to select. Is there a way to avoid specifying the number of features to extract, and instead extract all the useful features?

from sklearn.feature_selection import SelectKBest, chi2
# Keep the 4 features with the highest chi-squared scores
test = SelectKBest(score_func=chi2, k=4)

You can use "all" instead of a number:

test = SelectKBest(score_func=chi2, k="all")

From the docs:

k : int or "all", optional, default=10

Number of top features to select. The "all" option bypasses selection, for use in a parameter search.
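As the docs note, "all" is mainly useful as one candidate in a parameter search. A minimal sketch of that idea follows, assuming the iris dataset and a LogisticRegression classifier as stand-ins (neither is from the question) and an illustrative list of candidate values for k:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: iris is only a stand-in dataset (chi2 needs non-negative features).
X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("select", SelectKBest(score_func=chi2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Candidate values for k; "all" means no selection at all.
param_grid = {"select__k": [1, 2, 3, "all"]}

# 5-fold cross-validation picks the k with the best mean score.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

best_k = search.best_params_["select__k"]
print("best k:", best_k)
```

This way k is chosen by cross-validated score rather than fixed by hand, at the cost of refitting the pipeline once per candidate and fold.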

There are many ways to select features; you can find them on the wiki. I think the best feature selection method is a deep understanding of the features themselves, but usually we have a hard time understanding them.

Maybe you can use 5-fold cross-validation to build a feature importance ranking, and then select the important features from it.
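One possible reading of that suggestion is to fit a model on each training fold and average its feature importances. The sketch below assumes the iris dataset and GradientBoostingClassifier as the model; both are illustrative choices, not something the answer specifies:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold

# Assumption: iris is only a stand-in dataset.
data = load_iris()
X, y = data.data, data.target

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
importances = []
for train_idx, _ in cv.split(X, y):
    # Fit on the training part of each fold and record the importances.
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    importances.append(model.feature_importances_)

# Average over folds, then rank from most to least important.
mean_importance = np.mean(importances, axis=0)
ranking = np.argsort(mean_importance)[::-1]
for idx in ranking:
    print(f"{data.feature_names[idx]}: {mean_importance[idx]:.3f}")
```

Averaging over folds makes the ranking less sensitive to any single train/test split; where to cut the ranking is still a judgment call.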

You can also use an embedded method to select features, like this:

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()

# Feature selection with GBDT as the base model
SelectFromModel(GradientBoostingClassifier()).fit_transform(iris.data, iris.target)
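SelectFromModel does not take a feature count: with the default threshold, an estimator without an L1 penalty keeps the features whose importance is at least the mean importance. A small sketch for inspecting which features survive, again assuming iris as the dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()

# Default threshold: keep features with importance >= mean importance.
selector = SelectFromModel(GradientBoostingClassifier(random_state=0))
X_selected = selector.fit_transform(iris.data, iris.target)

# Boolean mask over the original columns, True for kept features.
mask = selector.get_support()
kept = [name for name, keep in zip(iris.feature_names, mask) if keep]
print("kept features:", kept)
print("shape after selection:", X_selected.shape)
```

The threshold parameter can be changed (for example to "median") if the default cut is too strict or too loose for the data at hand.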

It's worth noting that you cannot simply delete a feature that seems useless on its own, because it may still be informative in combination with other features. So feature selection is typically a greedy search over feature subsets, which is often time-consuming.
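One greedy search of this kind that scikit-learn provides is recursive feature elimination with cross-validation (RFECV), which also chooses the number of features itself. This is a swapped-in technique, not one named in the answer; the sketch assumes iris and LogisticRegression as stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Assumption: iris and LogisticRegression are illustrative choices.
X, y = load_iris(return_X_y=True)

# Repeatedly drop the weakest feature (step=1) and score each subset
# size with 5-fold cross-validation.
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
selector.fit(X, y)

print("number of features kept:", selector.n_features_)
print("mask of kept features:", selector.support_)
```

Because the model is refit for every subset size and every fold, the cost grows quickly with the number of features, which is the time cost mentioned above.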
