How to print a simple list of feature importance when using Logistic Regression?

I am using the dataset found here: https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset

My code is:

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

log_reg_model = LogisticRegression(max_iter=1000, solver="newton-cg")
log_reg_model = RFE(log_reg_model, n_features_to_select=45)  # using RFE to keep the 45 most important features
log_reg_model.fit(X_train_SMOTE, y_train_SMOTE)  # fitting the SMOTE-resampled training data
y_pred = log_reg_model.predict(X_test)
print("Model accuracy score: {}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))

I am trying to print out the most important features in order, like when using the feature_importances_ attribute in Random Forest Classification.

Is the above possible using LR? I see similar questions on Stack Overflow but no answers that show the feature names and their importance.

To do this, you can use a library called shap. I would definitely recommend reading about SHAP before diving right into the code, as it's going to be important for you and others to understand exactly what you are presenting.

However, an example of how that could work in your implementation is:

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import shap

log_reg_model = LogisticRegression(max_iter=1000, solver="newton-cg")
# log_reg_model = RFE(log_reg_model, n_features_to_select=45)  # RFE left out so SHAP explains the plain logistic regression
log_reg_model.fit(X_train_SMOTE, y_train_SMOTE)  # fitting data
y_pred = log_reg_model.predict(X_test)
print("Model accuracy score: {}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))

# explain the fitted model with SHAP, using the training data as the background distribution
explainer = shap.LinearExplainer(log_reg_model, X_train_SMOTE)
shap_values = explainer.shap_values(X_test[:150])

# summary plot of the SHAP values, with features ordered by overall importance
shap.summary_plot(shap_values, X_test[:150], feature_names=X_train_SMOTE.columns)
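
If you only want a simple printed list rather than a plot, you can also rank the features by the magnitude of the fitted logistic regression coefficients. This is a minimal sketch, assuming log_reg_model is the plain fitted LogisticRegression from above, X_train_SMOTE is a pandas DataFrame, and the features were scaled to comparable ranges (raw coefficient sizes are only comparable when features are on similar scales):

import numpy as np
import pandas as pd

# coef_ has shape (1, n_features) for a binary problem, so take the first row
importance = pd.Series(np.abs(log_reg_model.coef_[0]), index=X_train_SMOTE.columns)

# print the features ordered from most to least influential
print(importance.sort_values(ascending=False))

If you keep the RFE step instead, the fitted RFE object exposes support_ (a boolean mask of the selected columns) and ranking_ (1 for selected features, larger numbers for features eliminated earlier), which you can pair with X_train_SMOTE.columns in the same way.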
