Python feature selection: feature importance method from sklearn.ensemble gives inconsistent results in multiple runs

I am trying to do feature selection using feature importance from sklearn.ensemble. The problem is that every time I run the code (below), the results vary: a different set of columns comes out with the largest feature importance values. Isn't that strange? Or am I doing something wrong?

I have too many features (about 500, with 50k records). I would like to find the most important features to improve the classification, but the feature importance results don't seem consistent.

# Feature importance
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier

# X holds the independent columns (a DataFrame); y is the target column
model = ExtraTreesClassifier()
model.fit(X, y)

# print(model.feature_importances_)

# Plot the 20 largest feature importances
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind="barh")
plt.show()

Randomness enters the fitting, so you should not expect to end up with exactly the same results on every run. To get reproducible results, pass a fixed seed to your estimator through its random_state parameter.
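As a minimal sketch of the point about seeding: the snippet below fits the same estimator twice with the same random_state and compares the importances. The synthetic data from make_classification and the helper name fit_importances are assumptions made for illustration; they stand in for the asker's X and y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the asker's data (an assumption, not the real X and y).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)


def fit_importances(seed):
    """Fit an ExtraTreesClassifier with a fixed seed and return its importances."""
    model = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return model.feature_importances_


same_seed_a = fit_importances(42)
same_seed_b = fit_importances(42)
other_seed = fit_importances(7)

# Same seed: the two runs should agree. A different seed is free to differ.
print("same seed, same importances:", np.allclose(same_seed_a, same_seed_b))
print("max difference with another seed:", np.abs(same_seed_a - other_seed).max())
```

Fixing the seed makes a run repeatable; it does not make the resulting ranking any more trustworthy than the ranking from another seed.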

If different seeds give you hugely different variable importances, this means that no feature seems to dominate the predictive content of your data, as far as trees can capture it. In that case, the variable importances should be taken with a grain of salt.
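One way to check how much the importances move between seeds is to refit over several seeds and summarize the spread. This is only a sketch under stated assumptions: the data is synthetic, and the column names (f0, f1, ...), the number of seeds, and top_k are hypothetical choices rather than anything from the question.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the asker's data (an assumption, not the real X and y).
X_arr, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

seeds = range(10)  # hypothetical number of repeated fits
top_k = 5          # hypothetical size of the "most important" set

# One column of importances per seed, one row per feature.
runs = pd.DataFrame(
    {
        seed: ExtraTreesClassifier(n_estimators=100, random_state=seed)
        .fit(X, y)
        .feature_importances_
        for seed in seeds
    },
    index=X.columns,
)

summary = pd.DataFrame({"mean": runs.mean(axis=1), "std": runs.std(axis=1)})

# Count in how many runs each feature lands among the top_k importances.
ranks = runs.rank(ascending=False, method="first", axis=0)
summary["times_in_top_k"] = (ranks <= top_k).sum(axis=1)

print(summary.sort_values("mean", ascending=False).head(top_k))
```

Features that reach the top set in nearly every run, with a small standard deviation relative to their mean, are the ones whose ranking can be relied on. If the top set keeps changing, that matches the situation described above. Increasing n_estimators usually reduces the run-to-run variation, at the cost of longer fitting time.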

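Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.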
