Getting feature importance by sample - Python Scikit Learn

I have a fitted model (clf) using sklearn.ensemble.RandomForestClassifier. I already know that I can get the feature importances with clf.feature_importances_. What I would like to know, if it's possible, is how to get the feature importances per individual sample.

Example:

from sklearn.ensemble import RandomForestClassifier

X = {"f1":[0,1,1,0,1], "f2":[1,1,1,0,1]}
y = [0,1,0,1,0]

clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

y_pred = clf.predict(X)

Then, how do I get something like this:

y_pred  f1_importance  f2_importance
     1           0.57           0.43
     1           0.26           0.74
     1           0.31           0.69
     0           0.62           0.38
     1           0.16           0.84

* The y_pred values aren't real. I'm actually using pandas for the real project, in Python 3.8.
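(For reference, this is how I get the global importances mentioned above — using an array here, since fit does not accept a dict:)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# same data as above, as an array
X = np.array([[0, 1], [1, 1], [1, 1], [0, 0], [1, 1]])
y = [0, 1, 0, 1, 0]

clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X, y)

# one global importance value per feature; they are non-negative and sum to 1
print(clf.feature_importances_)
```

This gives one number per feature for the whole model, not one per sample — which is exactly what I want to go beyond.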

You can use treeinterpreter to get the feature importances for individual predictions of your RandomForestClassifier.

You can find treeinterpreter on GitHub and install it via

pip install treeinterpreter

I used your reference code but had to adjust it, because you cannot use a dictionary as input to fit your RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import numpy as np
import pandas as pd

X = np.array([[0,1],[1,1],[1,1],[0,0],[1,1]])
y = [0,1,0,1,0]

clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

y_pred = clf.predict(X)
y_pred_probas = clf.predict_proba(X)

Then I used treeinterpreter with your classifier and data to compute the bias, the contributions, and the prediction values:

prediction, bias, contributions = ti.predict(clf, X)

# contributions has shape (n_samples, n_features, n_classes);
# summing over axis 1 (the features) gives one total per class and sample
class_contrib = np.sum(contributions, axis=1)

df = pd.DataFrame({
    "Prediction": y_pred,
    "Prediction value 0": prediction[:, 0],
    "Prediction value 1": prediction[:, 1],
    "f1_contribution": class_contrib[:, 0],
    "f1_bias": bias[:, 0],
    "f2_contribution": class_contrib[:, 1],
    "f2_bias": bias[:, 1],
})

df

Output

(screenshot of the resulting DataFrame)

You can have a look at this blog post by the author to better understand how it works.
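To see the idea behind it, here is a minimal sketch (my own illustration, not the library's code) that decomposes a single DecisionTreeClassifier's predictions into a bias term plus per-feature contributions by walking each sample's decision path:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_contributions(tree, X):
    """Decompose predict_proba(X) into bias + per-feature contributions."""
    t = tree.tree_
    # class distribution at every node, normalized to probabilities
    node_probs = t.value[:, 0, :] / t.value[:, 0, :].sum(axis=1, keepdims=True)
    paths = tree.decision_path(X)  # sparse matrix of the nodes each sample visits
    n_samples, n_features = X.shape
    bias = np.tile(node_probs[0], (n_samples, 1))  # root distribution
    contributions = np.zeros((n_samples, n_features, node_probs.shape[1]))
    for i in range(n_samples):
        node_ids = paths.indices[paths.indptr[i]:paths.indptr[i + 1]]
        # each split changes the class distribution; credit that change
        # to the feature the parent node split on
        for parent, child in zip(node_ids[:-1], node_ids[1:]):
            contributions[i, t.feature[parent]] += node_probs[child] - node_probs[parent]
    return bias, contributions

X = np.array([[0, 1], [1, 1], [1, 1], [0, 0], [1, 1]])
y = [0, 1, 0, 1, 0]
dt = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

bias, contributions = tree_contributions(dt, X)
# the telescoping sum along the path reconstructs predict_proba exactly
assert np.allclose(bias + contributions.sum(axis=1), dt.predict_proba(X))
```

treeinterpreter applies the same decomposition to every tree in the forest and averages; the identity bias + contributions = prediction holds by construction.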

In the table, the prediction values for 0 and 1 refer to the probabilities of the two classes, which you can also compute using the existing predict_proba() method of RandomForestClassifier.
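For example (rebuilding the model from above), you can confirm that predict_proba gives exactly those two columns, and that the predicted class is simply the more probable one:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0, 1], [1, 1], [1, 1], [0, 0], [1, 1]])
y = [0, 1, 0, 1, 0]
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X, y)

probas = clf.predict_proba(X)  # column 0 = P(class 0), column 1 = P(class 1)
assert probas.shape == (5, 2)
assert np.allclose(probas.sum(axis=1), 1.0)
assert (probas.argmax(axis=1) == clf.predict(X)).all()
```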

You can verify that the bias and contributions add up to the prediction value/probability like this:

bias + np.sum(contributions, axis=1)

Output

array([[0.744 , 0.256 ],
       [0.6565, 0.3435],
       [0.6565, 0.3435],
       [0.214 , 0.786 ],
       [0.6565, 0.3435]])
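Finally, to get the exact table the question asked for (one importance value per feature and per sample), one option — my suggestion, not something treeinterpreter provides — is to take each feature's contribution to the predicted class and normalize the absolute values per row. A sketch with a made-up contributions array (in practice you would pass the contributions returned by ti.predict):

```python
import numpy as np
import pandas as pd

def per_sample_importances(contributions, y_pred, feature_names):
    """Turn treeinterpreter-style contributions (n_samples, n_features, n_classes)
    into one normalized importance per feature and sample."""
    # contribution of each feature to the class that was actually predicted
    picked = contributions[np.arange(len(y_pred)), :, y_pred]
    abs_contrib = np.abs(picked)
    importances = abs_contrib / abs_contrib.sum(axis=1, keepdims=True)
    df = pd.DataFrame(importances, columns=[f"{f}_importance" for f in feature_names])
    df.insert(0, "y_pred", y_pred)
    return df

# hypothetical contributions for 2 samples, 2 features, 2 classes
contributions = np.array([
    [[ 0.20, -0.20], [ 0.10, -0.10]],
    [[-0.05,  0.05], [-0.30,  0.30]],
])
y_pred = np.array([1, 1])
df = per_sample_importances(contributions, y_pred, ["f1", "f2"])
print(df)
```

Each row of the resulting f1_importance/f2_importance columns sums to 1, matching the format in the question. Using absolute values is a deliberate choice here: a feature can push the probability down as well as up, and both count as influence.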
