How can I store and print the top 20% feature names and scores?
The following code reads in cleaned-up Titanic data and prints all the features and their scores:
import csv
import numpy as np
data = np.genfromtxt('titanic.csv',dtype=float, delimiter=',', names=True)
feature_names = np.array(data.dtype.names)
feature_names = feature_names[[ 0,1,2,3,4]]
data = np.genfromtxt('titanic.csv',dtype=float, delimiter=',', skip_header=1)
_X = data[:, [0,1,2,3,4]]
#Return a flattened array required by scikit-learn fit for 2nd argument
_y = np.ravel(data[:,[5]])
from sklearn import feature_selection
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
X_train_fs = fs.fit_transform(_X, _y)
print(feature_names, '\n', fs.scores_)
Result:
['A' 'B' 'C' 'D' 'E']
[ 4.7324711 89.1428574 70.23474577 7.02447375 52.42447817]
What I want to do is capture the top 20% of features and store their names and scores in an array that I can then sort by score. This will help me with dimensionality reduction on larger feature sets. Why am I getting all 5 features, how can I fix that, and how can I store and print the top 20% of feature names and scores?
You are almost there. The scores are indeed stored in fs.scores_, but that attribute always holds one score per input feature, which is why you see all 5. The features that were actually selected (according to the percentile you set) are stored in X_train_fs. Try printing the shape of X_train_fs: it should have fewer than 5 columns.
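To recover the names and scores of only the selected features, one option is the fitted selector's get_support() method, which returns a boolean mask over the input columns. The following is a minimal sketch rather than a drop-in fix: the data is synthetic (random non-negative values with random binary labels) and the feature names are placeholders, so nothing here reflects the actual Titanic file.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2

# Synthetic stand-in data (assumption): chi2 needs non-negative features
rng = np.random.RandomState(0)
X = rng.random_sample((100, 5))
y = rng.randint(0, 2, size=100)          # placeholder binary class labels
feature_names = np.array(['A', 'B', 'C', 'D', 'E'])

fs = SelectPercentile(chi2, percentile=20)
X_selected = fs.fit_transform(X, y)

# Boolean mask over the original columns: True where a feature was kept
mask = fs.get_support()
selected_names = feature_names[mask]
selected_scores = fs.scores_[mask]

print(X_selected.shape)                  # 20% of 5 features leaves one column
for name, score in zip(selected_names, selected_scores):
    print(name, score)
```

Indexing both feature_names and fs.scores_ with the same mask keeps the names and scores aligned, whichever columns end up selected.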
The code below may help you with the sorting part:
import numpy as np
from sklearn import feature_selection
_X = np.random.random((100,5))
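# Note: chi2 is intended for class labels; newer scikit-learn versions may
# reject a continuous target like this one
_y = np.random.random(100)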
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
X_train_fs = fs.fit_transform(_X, _y)
feature_names = ['a','b','c','d','e']
print('All features:', feature_names)
print('Scores of these features:', fs.scores_)
print('***Features sorted by score:', [feature_names[i] for i in np.argsort(fs.scores_)[::-1]])
print('Peeking into first few samples (before and after):')
print(_X[:10])
print(X_train_fs[:10])
Output:
All features: ['a', 'b', 'c', 'd', 'e']
Scores of these features: [ 17.08834764 13.97983442 18.0124008 17.79594679 14.77178022]
***Features sorted by score: ['c', 'd', 'a', 'e', 'b']
Peeking into first few samples (before and after):
[[ 0.34808143 0.79142591 0.75333429 0.69246515 0.29079619]
[ 0.81726059 0.93065583 0.01183974 0.66227077 0.82216764]
[ 0.8791751 0.21764549 0.06147596 0.01156631 0.22077268]
[ 0.91079625 0.58496956 0.68548851 0.55365907 0.78447282]
[ 0.24489774 0.88725231 0.32411121 0.09189075 0.83266337]
[ 0.1041106 0.98683633 0.22545763 0.98577525 0.41408367]
[ 0.09014649 0.51216454 0.62158409 0.94874742 0.81915236]
[ 0.32828772 0.05461745 0.43343171 0.59472169 0.83159784]
[ 0.33792151 0.47963184 0.08690499 0.31566743 0.26170533]
[ 0.10012106 0.36240434 0.86687847 0.64894175 0.51167487]]
[[ 0.75333429]
[ 0.01183974]
[ 0.06147596]
[ 0.68548851]
[ 0.32411121]
[ 0.22545763]
[ 0.62158409]
[ 0.43343171]
[ 0.08690499]
[ 0.86687847]]
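If the goal is a single array holding names and scores that can be sorted by score, a NumPy structured array is one way to do it. This sketch reuses the five scores printed in the question; the field names ('name', 'score') and the count-based cutoff for the top 20% are illustrative choices, not something scikit-learn prescribes.

```python
import numpy as np

# Names and scores taken from the result printed in the question
feature_names = ['A', 'B', 'C', 'D', 'E']
scores = np.array([4.7324711, 89.1428574, 70.23474577, 7.02447375, 52.42447817])

# Structured array pairing each feature name with its score
ranked = np.array(list(zip(feature_names, scores)),
                  dtype=[('name', 'U16'), ('score', 'f8')])

# Sort ascending by score, then reverse so the best feature comes first
ranked = np.sort(ranked, order='score')[::-1]

# Keep the top 20% by count, but always at least one feature
n_top = max(1, int(len(ranked) * 20 / 100))
top = ranked[:n_top]

for row in top:
    print(row['name'], row['score'])
```

With these five scores the cutoff keeps one feature, B, and the full ranking is B, C, E, D, A.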