
"Scikit Learn:邏輯回歸模型系數:澄清"

[英]Scikit Learn: Logistic Regression model coefficients: Clarification

I need to know how to return the logistic regression coefficients in such a way that I can generate the predicted probabilities myself.

My code looks like this:

lr = LogisticRegression()
lr.fit(training_data, binary_labels)

# Generate probabilities automatically
predicted_probs = lr.predict_proba(training_data)

Looking at the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html), lr.coef_ does not store the offset coefficient:

coef_ : array, shape = [n_classes-1, n_features]
    Coefficient of the features in the decision function. coef_ is a read-only property derived from raw_coef_ that follows the internal memory layout of liblinear.

intercept_ : array, shape = [n_classes-1]
    Intercept (a.k.a. bias) added to the decision function. It is only available when the fit_intercept parameter is set to True.

Try:

sigmoid( dot([val1, val2], lr.coef_) + lr.intercept_ ) 
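As a concrete sketch of what that formula computes (a minimal, self-contained example; the toy data and the names val1/val2 are hypothetical stand-ins for your own features):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary data standing in for training_data / binary_labels
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

lr = LogisticRegression()
lr.fit(X, y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Manual probability of the positive class for one sample [val1, val2]
val1, val2 = X[0]
manual_prob = sigmoid(np.dot([val1, val2], lr.coef_[0]) + lr.intercept_[0])

# This should match the P(y=1) column of predict_proba
print(manual_prob, lr.predict_proba(X[:1])[0, 1])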

The simplest way is to call the coef_ attribute of the LR classifier:

For the definition of coef_, see the Scikit-Learn documentation.

See the example:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(x_train, y_train)

weight = clf.coef_
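A quick usage sketch (x_train and y_train here are hypothetical placeholders, not data from the question):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data: 100 samples, 5 features
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 5))
y_train = (x_train[:, 0] > 0).astype(int)

clf = LogisticRegression()
clf.fit(x_train, y_train)

weight = clf.coef_
print(weight.shape)          # (1, 5): one row of weights for a binary problem
print(clf.intercept_.shape)  # (1,): the offset term lives here, not in coef_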
Option D is correct, I think!

4.2 ! PROVIDED CODE !

import numpy as np
from sklearn.model_selection import train_test_split

# Load the precomputed TF-IDF matrices
tmp = np.load("./data/part-2/TFIDF.npz")
TFIDF_commentary = tmp['commentary']
TFIDF_questions = tmp['questions']
X_comm = TFIDF_commentary
X_quest = TFIDF_questions

# I am not sure about this: commentary and questions are assumed to be
# DataFrames loaded earlier, each with a 'gender' column
y_comm = commentary['gender'].values
y_quest = questions['gender'].values

# Train/test split

# First model: commentary
X_train_comm, X_test_comm, y_train_comm, y_test_comm = train_test_split(
    X_comm, y_comm, test_size=0.4, random_state=42)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

cl_comm = LogisticRegression(max_iter=2000, C=10)
cl_comm.fit(X_train_comm, y_train_comm)

y_pred_comm = cl_comm.predict(X_test_comm)
accuracy_score(y_test_comm, y_pred_comm)
# Second model: questions
X_train_quest, X_test_quest, y_train_quest, y_test_quest = train_test_split(
    X_quest, y_quest, test_size=0.4, random_state=42)

cl_quest = LogisticRegression(max_iter=2000, C=10)
cl_quest.fit(X_train_quest, y_train_quest)

y_pred_quest = cl_quest.predict(X_test_quest)
accuracy_score(y_test_quest, y_pred_quest)


# 3. False: predicting gender from commentary text is almost random, while for questions we can reach about 70% accuracy.

# 4. Let's use C=2000

# Commentary model with weaker regularization
cl_comm = LogisticRegression(max_iter=2000, C=2000)
cl_comm.fit(X_train_comm, y_train_comm)

y_pred_comm = cl_comm.predict(X_test_comm)
accuracy_score(y_test_comm, y_pred_comm)


# Questions model with weaker regularization
cl_quest = LogisticRegression(max_iter=2000, C=2000)
cl_quest.fit(X_train_quest, y_train_quest)

y_pred_quest = cl_quest.predict(X_test_quest)
accuracy_score(y_test_quest, y_pred_quest)


# With less penalty (larger C means weaker regularization), the commentary model may start overfitting because it can become more complex, while on the questions data it lowers the accuracy.
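To make the overfitting claim checkable, here is a minimal sketch (assuming the X_train_comm / y_train_comm split from above) that sweeps C and compares train vs. test accuracy:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Larger C = weaker L2 penalty; a widening train/test gap signals overfitting
for C in [0.1, 1, 10, 100, 2000]:
    cl = LogisticRegression(max_iter=2000, C=C)
    cl.fit(X_train_comm, y_train_comm)
    train_acc = accuracy_score(y_train_comm, cl.predict(X_train_comm))
    test_acc = accuracy_score(y_test_comm, cl.predict(X_test_comm))
    print(f"C={C:>6}: train={train_acc:.3f}  test={test_acc:.3f}")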

# TASK 5
# Clean gender: encode M as 0 and F as 1
questions['gender'] = questions['gender'].replace(['M', 'F'], [0, 1])

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Linear regression of gender on similarity
mod = smf.ols(formula='gender ~ similarity', data=questions)
res = mod.fit()
print(res.summary())


# I think this is wrong, since the similarity coefficient tends to -1 while the label
# for women is 1, so similarity is strongly negatively correlated with it.

# But I am not sure.
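One way to double-check the interpretation (a sketch, not necessarily what the task asks for): since gender is binary, a logistic model such as smf.logit may fit better than OLS; this assumes the same questions DataFrame with a numeric gender column and a similarity column:

import statsmodels.formula.api as smf

# Logit models P(gender=1 | similarity); a negative similarity coefficient
# means higher similarity lowers the predicted probability of label 1 (women)
logit_mod = smf.logit(formula='gender ~ similarity', data=questions)
logit_res = logit_mod.fit()
print(logit_res.summary())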
