How to set a threshold for a scikit-learn random forest model

Question

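After looking at precision_recall_curve, suppose I want to set threshold = 0.4. How do I apply 0.4 in my random forest model (binary classification), so that any probability < 0.4 is labeled 0 and any probability >= 0.4 is labeled 1?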

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)

predicted = random_forest.predict(X_test)
accuracy = accuracy_score(y_test, predicted)

Documentation: Precision-Recall

Answer 1

Assuming you are doing binary classification, this is easy:

threshold = 0.4

predicted_proba = random_forest.predict_proba(X_test)
predicted = (predicted_proba[:, 1] >= threshold).astype('int')

accuracy = accuracy_score(y_test, predicted)
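
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The synthetic data from make_classification and the names positive_column and default_predicted are assumptions added for illustration; they are not part of the question. It also looks up the class-1 column through classes_ rather than assuming it is column 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data, used only so the sketch runs on its own (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=12
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=12)
random_forest.fit(X_train, y_train)

threshold = 0.4

# predict_proba returns one column per class, ordered as in classes_.
# Look up the column for label 1 instead of hard-coding index 1.
positive_column = list(random_forest.classes_).index(1)
positive_proba = random_forest.predict_proba(X_test)[:, positive_column]

# 1-D array of 0/1 labels: 1 where P(class 1) >= threshold, else 0.
predicted = (positive_proba >= threshold).astype(int)

# Labels from predict(), which picks the most probable class.
default_predicted = random_forest.predict(X_test)

print("accuracy at threshold 0.4:", accuracy_score(y_test, predicted))
print("accuracy with predict():  ", accuracy_score(y_test, default_predicted))
```

Lowering the threshold below 0.5 can only turn more samples into class 1, so it usually trades precision for recall; accuracy may go up or down depending on the data.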

Answer 2

random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train)

threshold = 0.4

predicted = random_forest.predict_proba(X_test)
predicted[:,0] = (predicted[:,0] < threshold).astype('int')
predicted[:,1] = (predicted[:,1] >= threshold).astype('int')


accuracy = accuracy_score(y_test, predicted)
print(round(accuracy, 4) * 100, "%")

This raises an error at the final accuracy step: "ValueError: Can't handle mix of binary and multilabel-indicator"

Answer 3

sklearn.metrics.accuracy_score takes 1-D arrays, but your predicted array is 2-D. That is what causes the error.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
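
A small sketch of the shape problem and one way around it. The probability values and y_true below are made up for illustration (assumptions, not output from the model in the question): thresholding both columns leaves an (n_samples, 2) array, which accuracy_score rejects when y_true is 1-D, while thresholding only the class-1 column gives the 1-D labels it expects.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up values standing in for y_test and predict_proba(X_test) (assumption).
y_true = np.array([0, 0, 1, 0])
proba = np.array([
    [0.70, 0.30],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.90, 0.10],
])
threshold = 0.4

# Thresholding both columns in place keeps the 2-D shape (4, 2).
labels_2d = proba.copy()
labels_2d[:, 0] = (proba[:, 0] < threshold).astype(int)
labels_2d[:, 1] = (proba[:, 1] >= threshold).astype(int)

try:
    accuracy_score(y_true, labels_2d)
    raised = False
except ValueError as exc:
    # 1-D binary targets cannot be compared with a 2-D indicator array.
    raised = True
    print("ValueError:", exc)

# Keep only the class-1 column so the labels are 1-D, shape (4,).
labels_1d = (proba[:, 1] >= threshold).astype(int)
accuracy = accuracy_score(y_true, labels_1d)
print("accuracy:", accuracy)
```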

3 answers

Answer 1: score 25, accepted, 2018-04-12 10:07:03
Answer 2: score 1, 2018-04-12 18:15:00
Answer 3: score 0, 2021-03-02 13:54:17
