Finding logistic regression weights from K-fold CV

Question

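I have a dataset with 36 features, and I am using all of them in a logistic regression model evaluated with K-fold cross-validation. My K is 10. Is there any way to get the weights of all 36 features at the end of the 10 folds? Here is my code: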

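    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    # df is the DataFrame holding the features and the 'target' column
    labels = df.columns[2:36]

    X = df[labels]
    y = df['target']

    # use train/test split with different random_state values
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

    logreg = LogisticRegression()
    classifier_pre = cross_val_score(logreg, X, y, cv=20, scoring='precision')
    print("Precision:", classifier_pre.mean())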

Answer 1

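First of all, indexing in Python starts at 0, so writing labels = df.columns[2:36] assumes that your target column has index 1, which in human language means it is the second column from the left. The end of a slice is exclusive, so 2:36 selects the columns at positions 2 through 35, and a slice that runs past the last column is simply clipped rather than wrapping around. If your target column is the first one from the left of your data, you should write labels = df.columns[1:35] instead.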
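
A quick way to check which columns a slice really selects is to try it on a tiny frame. This is only a sketch: the frame and its column names are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical frame: an id column, the target, then two feature columns.
df_demo = pd.DataFrame(columns=["id", "target", "f1", "f2"])

# Positions start at 0 and the end of a slice is exclusive.
in_range = list(df_demo.columns[2:4])
print(in_range)

# A slice that runs past the last column is clipped; it does not wrap around.
past_end = list(df_demo.columns[2:36])
print(past_end)
```

Printing len(labels) right after building it is a cheap sanity check that the number of selected columns matches the number of features you expect.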

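Some estimators (including logistic regression) already have a cross-validated variant implemented in sklearn.linear_model. I suggest you look at the documentation for LogisticRegressionCV to see how to tune and use it.

You could try something like: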

    from sklearn.linear_model import LogisticRegressionCV

    labels = df.columns[1:35]  # if indeed your very first column is your target!
    X = df[labels]
    y = df['target']

    # cv=10 runs 10-fold cross-validation to choose C among 4 candidate values
    logistic = LogisticRegressionCV(Cs=4, fit_intercept=True, cv=10, verbose=1, random_state=42)
    logistic.fit(X, y)
    print(logistic.coef_)       # weights of each feature
    print(logistic.intercept_)  # value of the intercept
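
Note that with the default refit=True, LogisticRegressionCV uses the folds only to choose C and then refits once on all the data passed to fit, so coef_ holds a single set of weights, not one set per fold. A minimal sketch of what to expect, assuming a binary target and using synthetic data from make_classification as a stand-in for your df:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for the real data: 200 rows, 36 features, binary target.
X_demo, y_demo = make_classification(n_samples=200, n_features=36, random_state=0)

# max_iter is raised here only to avoid convergence warnings on unscaled data.
model = LogisticRegressionCV(Cs=4, cv=10, max_iter=1000, random_state=42)
model.fit(X_demo, y_demo)

# For a binary target, coef_ has one row with one weight per feature.
print(model.coef_.shape)
# The regularization strength selected by cross-validation.
print(model.C_)
```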

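One last piece of advice: it is a good idea to use the test set generated by train_test_split, but do not train your model on it. Use it only for evaluation at the end. That means here you should fit your algorithm on X_train and y_train and evaluate it on X_test and y_test, rather than copy the little piece of code I wrote, where the fitting is done on X and y; that would give an overly optimistic estimate of your accuracy if you then evaluated the model on X and y...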
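
As a rough sketch of that workflow, the example below fits on the training split only and computes precision on the held-out split. The data is synthetic (make_classification stands in for your df, which is an assumption), and precision is used because it is the metric in the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 300 rows, 36 features, binary target.
X_demo, y_demo = make_classification(n_samples=300, n_features=36, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_demo, y_demo, random_state=4)

# Cross-validation happens inside the training split only.
clf = LogisticRegressionCV(Cs=4, cv=10, max_iter=1000, random_state=42)
clf.fit(X_train, y_train)

# The test split is touched once, for the final evaluation.
test_precision = precision_score(y_test, clf.predict(X_test))
print("Test precision:", test_precision)
```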

Answer 2

I figured it out. We can implement it like this:

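    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.model_selection import KFold, train_test_split

    labels = df.columns[2:35]

    X = df[labels]
    y = df['target']

    # hold out a test set, as in the question
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

    kf = KFold(n_splits=10, shuffle=True, random_state=42)
    logistic = LogisticRegressionCV(Cs=2, fit_intercept=True, cv=kf, verbose=1, random_state=42)
    logistic.fit(X_train, y_train)
    print("Train Coefficient:", logistic.coef_)  # weights of each feature
    print("Train Intercept:", logistic.intercept_)  # value of intercept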

This gives the coefficients and the intercept of the model fitted with a 10-split KFold passed as cv to LogisticRegressionCV.
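
If what you want is the weights learned in each of the 10 folds rather than one final set, one possible approach is cross_validate with return_estimator=True, which keeps the model fitted on every fold. This is a sketch on synthetic data (make_classification is a stand-in for df, and the variable names are only illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the real data: 300 rows, 36 features, binary target.
X_demo, y_demo = make_classification(n_samples=300, n_features=36, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=42)
cv_results = cross_validate(
    LogisticRegression(max_iter=1000),
    X_demo,
    y_demo,
    cv=kf,
    scoring="precision",
    return_estimator=True,  # keep the estimator fitted on each training fold
)

# Stack the per-fold weights: one row per fold, one column per feature.
fold_weights = np.vstack([est.coef_ for est in cv_results["estimator"]])
print(fold_weights.shape)
print(fold_weights.mean(axis=0))  # average weight of each feature across folds
```

The per-fold weights show how stable each feature's coefficient is across folds; averaging them is only a summary and is not the same as the weights of a model refit on all the data.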
