Python feature selection: feature importances from sklearn.ensemble give inconsistent results in multiple runs
I am trying to do feature selection using feature importances from sklearn.ensemble. The problem is that every time I run the code (below), the results vary: I get different columns as the ones with the largest feature importance values. Isn't that strange, or am I doing something wrong?

I have a lot of features (about 500, with 50k records), and I would like to pick out the more important ones to improve the classification. But the feature importance results don't seem consistent.
# Feature importance
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier

# X holds the independent columns and y the target column
model = ExtraTreesClassifier()
model.fit(X, y)
# print(model.feature_importances_)
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind="barh")
Randomness enters the fitting, so you should not expect to end up with exactly the same results on every run. To get reproducible results, you can pass the `random_state` parameter (scikit-learn's name for the seed) to your estimator.
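A minimal sketch of this, using synthetic data in place of your X and y (the shapes here are assumptions for a quick demo, not your actual 500-feature / 50k-record dataset): fixing `random_state` makes repeated fits deterministic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the asker's X, y (hypothetical shape)
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

# With random_state fixed, two fits produce identical importances
model_a = ExtraTreesClassifier(random_state=42).fit(X, y)
model_b = ExtraTreesClassifier(random_state=42).fit(X, y)

assert (model_a.feature_importances_ == model_b.feature_importances_).all()
```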
If you end up with hugely different variable importances for different seeds, this means that none of the features clearly dominates the predictive content of your data, at least as far as trees can capture it. So variable importances should be taken with a grain of salt.
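One way to check how stable the ranking really is: refit with several seeds and average the importances. This is a sketch on synthetic data (the shapes and the choice of 5 seeds are assumptions); features that stay near the top across seeds are more trustworthy than any single run's ranking.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the asker's data (hypothetical shape)
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

# Average feature importances over several seeds
importances = np.mean(
    [ExtraTreesClassifier(random_state=s).fit(X, y).feature_importances_
     for s in range(5)],
    axis=0,
)

# Indices of the ten features with the largest averaged importance
top10 = np.argsort(importances)[::-1][:10]
```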