
"Scikit Learn:邏輯回歸模型系數:澄清"

[英]Scikit Learn: Logistic Regression model coefficients: Clarification

I need to know how to return the logistic regression coefficients in such a way that I can generate the predicted probabilities myself.

My code looks like this:

lr = LogisticRegression()
lr.fit(training_data, binary_labels)

# Generate probabilities automatically
predicted_probs = lr.predict_proba(training_data)

Looking at the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html), lr.coef_ does not store the offset coefficient:

coef_ : array, shape = [n_classes-1, n_features]
    Coefficient of the features in the decision function. coef_ is a read-only property derived from raw_coef_ that follows the internal memory layout of liblinear.

intercept_ : array, shape = [n_classes-1]
    Intercept (a.k.a. bias) added to the decision function. It is only available when the fit_intercept parameter is set to True.

Try:

sigmoid( dot([val1, val2], lr.coef_) + lr.intercept_ ) 
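As a concrete sketch of what that formula computes (a minimal, self-contained example; the toy data and the names val1/val2 are hypothetical stand-ins for your own features):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary data standing in for training_data / binary_labels
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

lr = LogisticRegression()
lr.fit(X, y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Manual probability of the positive class for one sample [val1, val2]
val1, val2 = X[0]
manual_prob = sigmoid(np.dot([val1, val2], lr.coef_[0]) + lr.intercept_[0])

# This should match the P(y=1) column of predict_proba
print(manual_prob, lr.predict_proba(X[:1])[0, 1])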

The simplest way is to call the coef_ attribute of the LR classifier:

For the definition of coef_, see the Scikit-Learn documentation.

See the example:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(x_train, y_train)

weight = clf.coef_
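A quick usage sketch (x_train and y_train here are hypothetical placeholders, not data from the question):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data: 100 samples, 5 features
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 5))
y_train = (x_train[:, 0] > 0).astype(int)

clf = LogisticRegression()
clf.fit(x_train, y_train)

weight = clf.coef_
print(weight.shape)          # (1, 5): one row of weights for a binary problem
print(clf.intercept_.shape)  # (1,): the offset term lives here, not in coef_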
Option D is correct, I think!

4.2 ! PROVIDED CODE !

import numpy as np
from sklearn.model_selection import train_test_split

# Load the precomputed TF-IDF matrices
tmp = np.load("./data/part-2/TFIDF.npz")
TFIDF_commentary = tmp['commentary']
TFIDF_questions = tmp['questions']
X_comm = TFIDF_commentary
X_quest = TFIDF_questions

# I am not sure about this: commentary and questions are assumed to be
# DataFrames loaded earlier, each with a 'gender' column
y_comm = commentary['gender'].values
y_quest = questions['gender'].values

# Train/test split

# First model: commentary
X_train_comm, X_test_comm, y_train_comm, y_test_comm = train_test_split(
    X_comm, y_comm, test_size=0.4, random_state=42)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

cl_comm = LogisticRegression(max_iter=2000, C=10)
cl_comm.fit(X_train_comm, y_train_comm)

y_pred_comm = cl_comm.predict(X_test_comm)
accuracy_score(y_test_comm, y_pred_comm)
# Second model: questions
X_train_quest, X_test_quest, y_train_quest, y_test_quest = train_test_split(
    X_quest, y_quest, test_size=0.4, random_state=42)

cl_quest = LogisticRegression(max_iter=2000, C=10)
cl_quest.fit(X_train_quest, y_train_quest)

y_pred_quest = cl_quest.predict(X_test_quest)
accuracy_score(y_test_quest, y_pred_quest)


# 3. False: predicting gender from commentary text is almost random, while for questions we can reach about 70% accuracy.

# 4. Let's use C=2000

# Commentary model with weaker regularization
cl_comm = LogisticRegression(max_iter=2000, C=2000)
cl_comm.fit(X_train_comm, y_train_comm)

y_pred_comm = cl_comm.predict(X_test_comm)
accuracy_score(y_test_comm, y_pred_comm)


# Questions model with weaker regularization
cl_quest = LogisticRegression(max_iter=2000, C=2000)
cl_quest.fit(X_train_quest, y_train_quest)

y_pred_quest = cl_quest.predict(X_test_quest)
accuracy_score(y_test_quest, y_pred_quest)


# With less penalty (larger C means weaker regularization), the commentary model may start overfitting because it can become more complex, while on the questions data it lowers the accuracy.
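To make the overfitting claim checkable, here is a minimal sketch (assuming the X_train_comm / y_train_comm split from above) that sweeps C and compares train vs. test accuracy:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Larger C = weaker L2 penalty; a widening train/test gap signals overfitting
for C in [0.1, 1, 10, 100, 2000]:
    cl = LogisticRegression(max_iter=2000, C=C)
    cl.fit(X_train_comm, y_train_comm)
    train_acc = accuracy_score(y_train_comm, cl.predict(X_train_comm))
    test_acc = accuracy_score(y_test_comm, cl.predict(X_test_comm))
    print(f"C={C:>6}: train={train_acc:.3f}  test={test_acc:.3f}")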

# TASK 5
# Clean gender: encode M as 0 and F as 1
questions['gender'] = questions['gender'].replace(['M', 'F'], [0, 1])

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Linear regression of gender on similarity
mod = smf.ols(formula='gender ~ similarity', data=questions)
res = mod.fit()
print(res.summary())


# I think this is wrong, since the similarity coefficient tends to -1 while the label
# for women is 1, so similarity is strongly negatively correlated with it.

# But I am not sure.
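One way to double-check the interpretation (a sketch, not necessarily what the task asks for): since gender is binary, a logistic model such as smf.logit may fit better than OLS; this assumes the same questions DataFrame with a numeric gender column and a similarity column:

import statsmodels.formula.api as smf

# Logit models P(gender=1 | similarity); a negative similarity coefficient
# means higher similarity lowers the predicted probability of label 1 (women)
logit_mod = smf.logit(formula='gender ~ similarity', data=questions)
logit_res = logit_mod.fit()
print(logit_res.summary())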
