Logistic regression model in Python has good accuracy and precision, but the predictions are way off

Question

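I built a logistic regression model to predict loan acceptors. The dataset is 94% non-acceptors and 6% acceptors. I have run several logistic regression models: one on the original dataset, one after upsampling to 50/50 and dropping some predictors, and one without upsampling but after dropping some predictors.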

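Model 1: better than 90% accuracy, precision, and recall on 25 feature columns. After running the model, I output the predictions to a different CSV (same people as the original CSV, though) and it's returning 10,000 acceptors. My guess is that this might be caused by overfitting? Not sure, but I then tried again on the same 94% non-acceptor / 6% acceptor data with fewer variables (19 feature columns). This time accuracy was 81%, but precision was only 21% and recall was 76% (for both train and test). This time it returned only 8 acceptors (out of 18,000).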

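Finally, I tried upsampling to a balanced set. Accuracy was only 68% (which I can work with), and precision and recall were 66% for both train and test. I ran the model and then output the predictions to a CSV file (same people, different CSV file, not sure whether that is causing the confusion), and this time it returned.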

Does anyone have any suggestions about what is causing this and how to fix it?

I'm not sure which regression code is most useful to show. I'm happy to post the upsampling code if that would be more helpful.
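
The upsampling code was not posted. For readers following along, here is a minimal sketch of one common way to upsample the minority class, using `sklearn.utils.resample`. The synthetic DataFrame and the `feature_a` / `feature_b` columns are assumptions for illustration only; just the `OpenedLCInd` target name comes from the post. The sketch splits before resampling, so duplicated minority rows cannot end up in the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the loan data (assumption): roughly 6% positives.
rng = np.random.default_rng(33)
df = pd.DataFrame({
    "feature_a": rng.normal(size=2000),
    "feature_b": rng.normal(size=2000),
    "OpenedLCInd": (rng.random(2000) < 0.06).astype(int),
})

# Split first so that duplicated minority rows never leak into the test set.
train_df, test_df = train_test_split(
    df, test_size=0.25, random_state=33, stratify=df["OpenedLCInd"]
)

majority = train_df[train_df["OpenedLCInd"] == 0]
minority = train_df[train_df["OpenedLCInd"] == 1]

# Sample the minority class with replacement until it matches the majority.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=33
)
train_balanced = pd.concat([majority, minority_upsampled])

# The training set is now 50/50; the test set keeps the original imbalance.
print(train_balanced["OpenedLCInd"].value_counts())
print(test_df["OpenedLCInd"].value_counts(normalize=True))
```

Metrics measured on a balanced set describe a 50/50 population, so they may not carry over to data that is still 94/6.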

import numpy as np
import pandas as pd
import statsmodels.api as sm

y=df.OpenedLCInd.values

X=df.drop('OpenedLCInd', axis = 1)

cols=X.columns

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

minmax= MinMaxScaler()
X=pd.DataFrame(minmax.fit_transform(X))
X.columns = cols

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score, roc_curve, auc, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .25, random_state= 33)

logreg=LogisticRegression(fit_intercept = False, C=1e12, solver ='liblinear', class_weight='balanced')

logreg.fit(X_train, y_train)

y_hat_train = logreg.predict(X_train)
y_hat_test = logreg.predict(X_test)

residuals = np.abs(y_train - y_hat_train)

logit_model=sm.Logit(y_train,X_train)
result=logit_model.fit()
print(result.summary())

print(pd.Series(residuals).value_counts())
print(pd.Series(residuals).value_counts(normalize=True))

## Output predictions to new dataset

test=pd.read_csv(r'link')

predictions = logreg.predict(X_test)


test_predictions = logreg.predict(test.drop('OpenedLCInd', axis = 1))
                                
test["predictions"] = test_predictions

test.to_csv(r'output link')

Answer 1

You are not using the validation set (the test set in the code above). To fix that, compute residuals = np.abs(y_test - y_hat_test) instead of using y_train.
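
A minimal sketch of that held-out check, on synthetic data generated with `make_classification` (an assumption standing in for the loan data, with about 6% positives). It also prints the accuracy of always predicting the majority class, since on a 94/6 split a high accuracy alone says little.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (assumption): about 6% positives.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.94, 0.06], random_state=33
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33, stratify=y
)

logreg = LogisticRegression(class_weight="balanced", max_iter=1000)
logreg.fit(X_train, y_train)
y_hat_test = logreg.predict(X_test)

# Residuals on the held-out split: 0 = correct, 1 = wrong.
residuals = np.abs(y_test - y_hat_test)
test_accuracy = 1 - residuals.mean()

# Accuracy of always predicting the majority class, for comparison.
baseline_accuracy = max(y_test.mean(), 1 - y_test.mean())

print("test accuracy:", test_accuracy)
print("majority baseline:", baseline_accuracy)
print("precision:", precision_score(y_test, y_hat_test))
print("recall:", recall_score(y_test, y_hat_test))
print(confusion_matrix(y_test, y_hat_test))
```

The confusion matrix shows how many acceptors the model predicts, which is the number that looked wrong in the exported CSV.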

In addition, applying cross-validation is useful to check that the model performs consistently well.
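
A minimal sketch of stratified 5-fold cross-validation, again on synthetic data (an assumption, not the loan data). The scaler and the classifier are wrapped in one pipeline. This may be relevant here: in the question's code, `test` is passed to `logreg.predict` without going through `minmax.transform`, although the model was fit on scaled features. A pipeline applies the same scaling at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the loan data (assumption): about 6% positives.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.94, 0.06], random_state=33
)

# The scaler lives inside the pipeline, so it is refit on each training fold
# and applied automatically whenever the pipeline predicts.
model = make_pipeline(
    MinMaxScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=33)
scores = cross_validate(
    model, X, y, cv=cv, scoring=["accuracy", "precision", "recall"]
)

for name in ("accuracy", "precision", "recall"):
    fold_scores = scores[f"test_{name}"]
    print(f"{name}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")

# Fit once on all labeled data, then score new raw rows through the same pipeline.
model.fit(X, y)
new_rows = X[:5]  # placeholder for unscaled rows read from another CSV
new_predictions = model.predict(new_rows)
print(new_predictions)
```

With the question's variables, the last step would look like `model.predict(test.drop('OpenedLCInd', axis=1))`, assuming the new file has the same feature columns in the same order as the training data.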

Solution 1
2 Accepted 2021-05-14 22:54:07