How can I store and print the top 20% feature names and scores?

The following code reads in cleaned-up Titanic data and prints out all the features and their scores:

import numpy as np
from sklearn import feature_selection

# First pass: read the header only to get the column names
data = np.genfromtxt('titanic.csv', dtype=float, delimiter=',', names=True)
feature_names = np.array(data.dtype.names)
feature_names = feature_names[[0, 1, 2, 3, 4]]

# Second pass: read the same file as a plain 2-D float array (header skipped)
data = np.genfromtxt('titanic.csv', dtype=float, delimiter=',', skip_header=1)

_X = data[:, [0, 1, 2, 3, 4]]
# Return a flattened array, as scikit-learn's fit requires for the 2nd argument
_y = np.ravel(data[:, [5]])

fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
X_train_fs = fs.fit_transform(_X, _y)

print(feature_names, '\n', fs.scores_)

Result:

['A'  'B' 'C' 'D' 'E']
[  4.7324711   89.1428574   70.23474577   7.02447375  52.42447817]

What I want to do is capture the top 20% of features and store their names and scores in an array that I can then sort by score. This will help me with dimensionality reduction on larger feature sets. Why am I getting all 5 features, how can I fix that, and how can I store and print the top 20% feature names and scores?

You are almost there. The scores are indeed stored in fs.scores_, but that array holds one score for every input feature, which is why you see all 5. The features that are eventually selected (according to the percentile you've set) are the columns of X_train_fs. Try printing X_train_fs.shape: it should have fewer than 5 columns.

The code below may help with the sorting part:

import numpy as np
from sklearn import feature_selection

# Random non-negative features; chi2 expects discrete class labels as the target
_X = np.random.random((100, 5))
_y = np.random.randint(0, 2, 100)
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
X_train_fs = fs.fit_transform(_X, _y)
feature_names = ['a', 'b', 'c', 'd', 'e']

print('All features:', feature_names)
print('Scores of these features:', fs.scores_)
# argsort is ascending, so reverse it to list the highest-scoring feature first
print('***Features sorted by score:', [feature_names[i] for i in np.argsort(fs.scores_)[::-1]])
print('Peeking into first few samples (before and after):')
print(_X[:10])
print(X_train_fs[:10])

Output (the values differ on every run because the data are random):

All features: ['a', 'b', 'c', 'd', 'e']
Scores of these features: [ 17.08834764  13.97983442  18.0124008   17.79594679  14.77178022]
***Features sorted by score: ['c', 'd', 'a', 'e', 'b']
Peeking into first few samples (before and after):
[[ 0.34808143  0.79142591  0.75333429  0.69246515  0.29079619]
 [ 0.81726059  0.93065583  0.01183974  0.66227077  0.82216764]
 [ 0.8791751   0.21764549  0.06147596  0.01156631  0.22077268]
 [ 0.91079625  0.58496956  0.68548851  0.55365907  0.78447282]
 [ 0.24489774  0.88725231  0.32411121  0.09189075  0.83266337]
 [ 0.1041106   0.98683633  0.22545763  0.98577525  0.41408367]
 [ 0.09014649  0.51216454  0.62158409  0.94874742  0.81915236]
 [ 0.32828772  0.05461745  0.43343171  0.59472169  0.83159784]
 [ 0.33792151  0.47963184  0.08690499  0.31566743  0.26170533]
 [ 0.10012106  0.36240434  0.86687847  0.64894175  0.51167487]]
[[ 0.75333429]
 [ 0.01183974]
 [ 0.06147596]
 [ 0.68548851]
 [ 0.32411121]
 [ 0.22545763]
 [ 0.62158409]
 [ 0.43343171]
 [ 0.08690499]
 [ 0.86687847]]
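The original goal was to keep names and scores together in one array that can be sorted by score. One option is a NumPy structured array; this is a sketch of that approach, not the answerer's code. The ten hypothetical feature names f0 to f9, the seed and the three-class labels are made up for illustration. With 10 features and percentile=20, the selector keeps 2 of them.

```python
import numpy as np
from sklearn import feature_selection

# Hypothetical stand-in data (non-negative X, integer class labels)
rng = np.random.RandomState(42)
_X = rng.random_sample((200, 10))
_y = rng.randint(0, 3, size=200)
feature_names = ['f%d' % i for i in range(_X.shape[1])]

fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
fs.fit(_X, _y)

# Pair every feature name with its score in one structured array
scored = np.array(list(zip(feature_names, fs.scores_)),
                  dtype=[('name', 'U20'), ('score', float)])

# All features, highest score first
scored_sorted = np.sort(scored, order='score')[::-1]

# Only the features the selector kept (the top 20%), highest score first
top = np.sort(scored[fs.get_support()], order='score')[::-1]

for name, score in top:
    print('%s\t%.4f' % (name, score))
```

Sorting with order='score' and then reversing gives descending order. scored_sorted ranks every feature, which is handy for inspecting larger feature sets. top holds only what SelectPercentile retained.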
