How to get feature importance in Decision Tree?
I have a dataset of reviews with a positive/negative class label, and I am applying a Decision Tree to it. First, I convert the reviews into a Bag of Words representation. Here sorted_data['Text'] contains the reviews and final_counts is a sparse matrix.
I split the data into train and test sets:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

X_tr, X_test, y_tr, y_test = train_test_split(sorted_data['Text'], labels, test_size=0.3, random_state=0)

# BOW
count_vect = CountVectorizer()
count_vect.fit(X_tr.values)
final_counts = count_vect.transform(X_tr.values)
Then I apply the Decision Tree algorithm as follows:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# instantiate the learning model with the chosen depth
optimal_lambda = 15

# apply the vectorizer fitted on the train data to the test data
final_counts_x_test = count_vect.transform(X_test.values)
bow_reg_optimal = DecisionTreeClassifier(max_depth=optimal_lambda, random_state=0)

# fit the model
bow_reg_optimal.fit(final_counts, y_tr)

# predict the response
pred = bow_reg_optimal.predict(final_counts_x_test)

# evaluate accuracy
acc = accuracy_score(y_test, pred) * 100
print('\nThe accuracy of the Decision Tree for depth = %d is %f%%' % (optimal_lambda, acc))
bow_reg_optimal is a decision tree classifier. Could anyone tell me how to get the feature importances from this classifier?
Use the feature_importances_ attribute, which is defined once fit() has been called. For example:
import numpy as np
X = np.random.rand(1000,2)
y = np.random.randint(0, 5, 1000)
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier().fit(X, y)
tree.feature_importances_
# array([ 0.51390759, 0.48609241])