Selecting best features for ML
Is there any way to extract the best features from the data? Right now, I am using `SelectKBest` from sklearn. With it, I have to specify the number k of best features to be selected. Is there any way in which I don't have to specify the number of features to be extracted, and instead extract all the useful features?
from sklearn.feature_selection import SelectKBest, chi2

test = SelectKBest(score_func=chi2, k=4)
You can use "all" instead of a number:
test = SelectKBest(score_func=chi2, k="all")
From the documentation:

k : int or "all", optional, default=10
    Number of top features to select. The "all" option bypasses selection, for use in a parameter search.
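With k="all" the scores and p-values are still computed, so you can apply your own threshold instead of fixing a count up front. A minimal sketch on the iris dataset (the 0.05 p-value cutoff is an assumption, not part of the original answer):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()

# k="all" keeps every feature but still computes chi2 scores/p-values
selector = SelectKBest(score_func=chi2, k="all")
X_scored = selector.fit_transform(iris.data, iris.target)

# keep only the features whose p-value is below 0.05 (assumed cutoff)
useful = selector.pvalues_ < 0.05
X_useful = iris.data[:, useful]
print(X_useful.shape)
```

This way the number of selected features follows from the data rather than from a hand-picked k.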
There are many ways to select features; you can find them on the wiki. I think the best feature selection method is a deep understanding of the features themselves, but usually we have a hard time understanding them.
Maybe you can use 5-fold cross-validation to build a feature importance ranking, and then select the important features from it.
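The cross-validated ranking idea could be sketched like this, assuming a random forest as the scoring model (the answer does not name one) and averaging its importances over the 5 folds:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

iris = load_iris()
X, y = iris.data, iris.target

# average feature importances over 5 folds
importances = np.zeros(X.shape[1])
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    importances += model.feature_importances_
importances /= 5

# rank features from most to least important
ranking = np.argsort(importances)[::-1]
print([iris.feature_names[i] for i in ranking])
```

Averaging over folds makes the ranking less sensitive to any single train/test split.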
You can also use an embedded method to select them, like this:
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# feature selection with GBDT as the base model
iris = load_iris()
SelectFromModel(GradientBoostingClassifier()).fit_transform(iris.data, iris.target)
It's worth noting that you cannot delete a feature that seems useless on its own, because it may be related to other features. So feature selection is a greedy search process, which is often time-consuming.
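One way to automate that greedy search, and also answer the original question of not fixing the feature count, is recursive feature elimination with cross-validation. A sketch using `RFECV` with a logistic regression base model (the choice of estimator is an assumption):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# RFECV greedily removes features and uses 5-fold CV to decide
# how many to keep, so k never has to be specified
selector = RFECV(LogisticRegression(max_iter=1000), cv=5)
selector.fit(iris.data, iris.target)

print(selector.n_features_)  # number of features chosen automatically
print(selector.support_)     # boolean mask of the kept features
```

Because each elimination step refits the model and cross-validates, this is exactly the kind of greedy, time-consuming search described above, so expect it to scale poorly to very wide datasets.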