
How to find 'feature importance' or variable importance graph for KNNClassifier()

I am working on a numerical dataset using the KNN classifier from the sklearn package.

Once the prediction is complete, the top 4 most important variables should be displayed in a bar graph.

Here is the solution I have tried, but it throws an error saying that feature_importances_ is not an attribute of KNeighborsClassifier:

from sklearn.neighbors import KNeighborsClassifier
import pandas as pd

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train, y_train)
y_pred = neigh.predict(X_test)

# Raises AttributeError: KNeighborsClassifier has no feature_importances_
(pd.Series(neigh.feature_importances_, index=X_test.columns)
   .nlargest(4)
   .plot(kind='barh'))

To display the variable importance graph for a decision tree, the argument passed to pd.Series() is classifier.feature_importances_.

For SVM and linear discriminant analysis, the argument passed to pd.Series() is classifier.coef_[0].
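For example, a minimal sketch of that pattern (assuming X_train, y_train and X_test are already defined as in the snippet above, and a binary classification problem for the SVM case):

from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
import pandas as pd

# Decision tree: feature_importances_ is defined
tree = DecisionTreeClassifier().fit(X_train, y_train)
(pd.Series(tree.feature_importances_, index=X_test.columns)
   .nlargest(4)
   .plot(kind='barh'))

# Linear SVM: use the coefficients of the separating hyperplane instead
svm = LinearSVC().fit(X_train, y_train)
(pd.Series(svm.coef_[0], index=X_test.columns)
   .nlargest(4)
   .plot(kind='barh'))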

However, I am unable to find a suitable argument for the KNN classifier.

Feature importance is not defined for the KNN classification algorithm. There is no easy way to compute the features responsible for a classification here. What you could do is use a random forest classifier, which does have the feature_importances_ attribute. Even in that case, though, feature_importances_ tells you the most important features for the entire model, not specifically for the sample you are predicting on.
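A minimal sketch of the random-forest route, assuming the same X_train, y_train and X_test as in the question:

from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Random forests expose feature_importances_, unlike KNeighborsClassifier
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Plot the 4 most important features (model-wide, not per-sample)
(pd.Series(rf.feature_importances_, index=X_test.columns)
   .nlargest(4)
   .plot(kind='barh'))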

If you are set on using KNN, though, then the best way to estimate feature importance is to take the sample you want to predict on and compute its distance from each of its nearest neighbors for each feature (call these neighb_dist). Then do the same computation for a few random points (call these rand_dist) instead of the nearest neighbors. Then, for each feature, take the ratio neighb_dist / rand_dist: the smaller the ratio, the more important that feature is. A rough sketch of this idea is shown below.
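Here is a rough sketch of that heuristic (this is not a standard sklearn API; it assumes a fitted KNeighborsClassifier neigh, a pandas DataFrame X_train, and a single sample x, with illustrative names):

import numpy as np
import pandas as pd

def knn_feature_importance(neigh, X_train, x, n_random=20, random_state=0):
    """Per-sample importance heuristic: ratio of per-feature distances
    to the nearest neighbours vs. to a few random training points."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)

    # Per-feature mean absolute distance to the nearest neighbours
    _, idx = neigh.kneighbors([x])
    neighb_dist = np.abs(X[idx[0]] - x).mean(axis=0)

    # Per-feature mean absolute distance to a few random training points
    rand_idx = rng.choice(len(X), size=min(n_random, len(X)), replace=False)
    rand_dist = np.abs(X[rand_idx] - x).mean(axis=0)

    # Smaller ratio -> the neighbours are unusually close on that feature,
    # so it matters more for this particular prediction
    ratio = neighb_dist / (rand_dist + 1e-12)
    return pd.Series(ratio, index=X_train.columns).nsmallest(4)

# Example usage: importances for the first test sample
# knn_feature_importance(neigh, X_train, X_test.iloc[0]).plot(kind='barh')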

Here is a good, and generic, example.

#importing libraries
from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeCV, LassoCV, Ridge, Lasso

# Loading the dataset
x = load_boston()
df = pd.DataFrame(x.data, columns = x.feature_names)
df["MEDV"] = x.target
X = df.drop("MEDV", axis=1)   # Feature matrix
y = df["MEDV"]          #Target Variable
df.head()

reg = LassoCV()
reg.fit(X, y)
print("Best alpha using built-in LassoCV: %f" % reg.alpha_)
print("Best score using built-in LassoCV: %f" %reg.score(X,y))
coef = pd.Series(reg.coef_, index = X.columns)

print("Lasso picked " + str(sum(coef != 0)) + " variables and eliminated the other " +  str(sum(coef == 0)) + " variables")

imp_coef = coef.sort_values()
matplotlib.rcParams['figure.figsize'] = (8.0, 10.0)
imp_coef.plot(kind = "barh")
plt.title("Feature importance using Lasso Model")

[Figure: horizontal bar chart of the Lasso coefficients, titled "Feature importance using Lasso Model"]

All details are listed below.

https://towardsdatascience.com/feature-selection-with-pandas-e3690ad8504b

Here are two more good examples of the same approach.

https://www.scikit-yb.org/en/latest/api/features/importances.html

https://github.com/WillKoehrsen/feature-selector/blob/master/Feature%20Selector%20Usage.ipynb
