Scikit-Learn: Logistic Regression OvR: access binary estimators/classifiers
As we might know, when using solver='liblinear' on a multiclass classification problem, logistic regression uses the one-vs-rest strategy. Does that mean there should be n_classes binary classifiers/estimators? If so, how can I access them?
I have read the documentation, but could not find any way to do this.
It looks like there is no easy way to access those sub-models. However, you can reconstruct them from model.coef_ and model.intercept_, as follows:
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
import numpy as np

X_train, y_train = datasets.load_iris(return_X_y=True)

model = LogisticRegression(
    penalty="l1",
    multi_class="ovr",
    class_weight="balanced",
    solver="liblinear",
)
model.fit(X_train, y_train)

n_labels = len(np.unique(y_train))
for i in range(n_labels):
    # rebuild the i-th binary sub-model from the fitted coefficients
    sub_model = LogisticRegression(penalty=model.penalty, C=model.C)
    sub_model.coef_ = model.coef_[i].reshape(1, -1)
    sub_model.intercept_ = model.intercept_[i].reshape(-1, 1)
    sub_model.classes_ = np.array([0, 1])
    # binary target: current label vs. the rest
    y_train_ovr = np.where(y_train == i, 1, 0)
    score = sub_model.score(X_train, y_train_ovr)
    print(f"OVR for label={i}, score={score:.4f}")
Output:
OVR for label=0, score=1.0000
OVR for label=1, score=0.7333
OVR for label=2, score=0.9667
This code creates a new LogisticRegression() for each label, based on the original model's coefficients, intercepts, C, and penalty. Finally, the y_train labels are re-encoded to represent the OvR task.
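If you would rather have actual separate, fitted estimator objects instead of reconstructing them by hand, one option (not used in the code above) is to wrap the base classifier in sklearn.multiclass.OneVsRestClassifier, which fits one clone of the base estimator per class and exposes them via its estimators_ attribute. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# OneVsRestClassifier fits one binary LogisticRegression per class
# and keeps the fitted clones in the estimators_ list
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
ovr.fit(X, y)

for label, est in zip(ovr.classes_, ovr.estimators_):
    # each est is a full LogisticRegression fitted on a binary
    # (this label vs. the rest) version of y
    y_bin = (y == label).astype(int)
    print(f"label={label}, binary accuracy={est.score(X, y_bin):.4f}")
```

The trade-off is that OneVsRestClassifier refits the models itself, so its results can differ slightly from a single LogisticRegression with multi_class="ovr" (e.g. no shared class_weight="balanced" semantics across the multiclass problem).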
The predictions of a LogisticRegression model are determined by the coef_ and intercept_ attributes estimated during fit. Look at this example:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# ovr model
ovr = LogisticRegression(multi_class='ovr', penalty=None).fit(X, y)

# manually fit a binary model for each label
y_0 = np.array([label == 0 for label in y])
y_1 = np.array([label == 1 for label in y])
y_2 = np.array([label == 2 for label in y])
m0 = LogisticRegression(penalty=None).fit(X, y_0)
m1 = LogisticRegression(penalty=None).fit(X, y_1)
m2 = LogisticRegression(penalty=None).fit(X, y_2)

ovr.coef_
# array([[  2.02162242,   6.99849918, -11.14813559,  -5.15488554],
#        [ -0.24535745,  -2.79656276,   1.31365383,  -2.77836927],
#        [ -2.46523384,  -6.6809256 ,   9.42922775,  18.28660819]])
m0.coef_
# array([[  2.02162242,   6.99849918, -11.14813559,  -5.15488554]])
m1.coef_
# array([[ -0.24535745,  -2.79656276,   1.31365383,  -2.77836927]])
m2.coef_
# array([[ -2.46523384,  -6.6809256 ,   9.42922775,  18.28660819]])
The rows of the OvR model's coefficient matrix are the coefficient vectors of the three binary problems. This is because, during fitting of the OvR model, each of the three binary models is estimated; the OvR model then remembers their coef_ vectors (and also their intercept_s).
So, to answer the question: yes, after fitting the OvR model there are n_classes different binary estimators. Each is represented by its intercept and coefficients, but not by a separate scikit-learn estimator object.
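To make that concrete, the per-class decision scores of the OvR model can be reproduced by hand from coef_ and intercept_: column k of X @ coef_.T + intercept_ is exactly the linear score of the k-th implicit binary (class k vs. rest) model, and the model predicts the class with the largest score. A quick sanity check (variable names are my own):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# liblinear always uses the one-vs-rest strategy for multiclass
ovr = LogisticRegression(solver="liblinear").fit(X, y)

# column k is the decision function of the k-th implicit binary model
manual_scores = X @ ovr.coef_.T + ovr.intercept_

# per-class sigmoid: each binary model's positive-class probability
probs = 1.0 / (1.0 + np.exp(-manual_scores))

# the OvR model's predict picks the class with the largest score
manual_pred = ovr.classes_[np.argmax(manual_scores, axis=1)]

assert np.allclose(manual_scores, ovr.decision_function(X))
assert np.array_equal(manual_pred, ovr.predict(X))
```

Note that predict_proba of the OvR model additionally normalizes these per-class sigmoid outputs so they sum to one across classes, so probs will not match it exactly.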