R script: xgboost for binary classification - how to get the predicted labels?

Question

I am trying to use XGBoost for binary classification and, as a newbie, I have run into a problem.

First, I trained a model, "fit":

fit <- xgboost(
    data = dtrain #as.matrix(dat[,predictors])
    , label = label 
    #, eta = 0.1                        # step size shrinkage 
    #, max_depth = 25                   # maximum depth of tree 
    , nround=100
    #, subsample = 0.5
    #, colsample_bytree = 0.5           # part of data instances to grow tree
    #, seed = 1
    , eval_metric = "merror"        # or "mlogloss" - evaluation metric 
    , objective = "binary:logistic" # binary classification using logistic regression; other options: "multi:softprob", "multi:softmax" = multi-class classification
    , num_class = 2                 # Number of classes in the dependent variable.
    #, nthread = 3                  # number of threads to be used 
    #, silent = 1
    #, prediction=T
)

Then I tried to use that model to predict labels for a new test data.frame:

predictions = predict(fit, as.matrix(test))
print(str(predictions))

As a result, I get twice as many single probability values as there are rows in my test data:

num [1:62210] 0.0567 0.0455 0.023 0.0565 0.0642 ...

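I read that, since I am using binary classification, I get 2 probabilities for each row of the test data.frame: one for label1 and one for label2. But how do I join that list of predictions, "predictions" (or what type of object is it?), with my data.frame "test" and get the prediction with the highest probability? I tried to combine "predictions" and "test", but got 62k rows in the merged data.frame (instead of the 31k in the original "test"). Please tell me, how do I get a prediction for each row?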

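Second question: since "predictions" holds 2 probabilities (for label1 and label2) for each row of the "test" data.frame, I expected those two values to sum to 1. But for one test row I get 2 small values: 0.0455073267221451 0.0621210783720016. Their sum is much less than 1... Why is that?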

Please explain these two things to me. I searched, but did not find any relevant topic with a clear explanation...

Answer 1

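You first need to create your test set, i.e. a matrix with the same p columns that were used in training, without the "outcome" variable (the y of the model).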

Keep the labels of the test set (the truth) as a numeric vector, using as.numeric.

Then it is just a couple of instructions. I suggest using caret for the confusionMatrix function.

library(caret)
library(xgboost)

test_matrix <- data.matrix(test[, setdiff(names(test), "outcome")]) # your test matrix (without the labels)
test_labels <- as.numeric(test$outcome) # the test labels
xgb_pred <- predict(fit, test_matrix) # this will give you just one probability (it will be a simple vector)
xgb_pred_class <- as.numeric(xgb_pred > 0.50) # to get your predicted labels 
# keep in mind that 0.50 is a threshold that can be modified.

confusionMatrix(as.factor(xgb_pred_class), as.factor(test_labels))
# this will get your confusion Matrix
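
To attach the predictions to the rows of "test", a minimal base-R sketch could look like the following. It assumes the model returns one probability per row, as described above for binary:logistic; the data.frame and the probability values here are made up for illustration, and the column names prob_1, prob_0 and pred_label are hypothetical. The doubled length seen in the question (62210 values for about 31k rows) is consistent with num_class = 2 having been passed together with binary:logistic, but that is an assumption about the cause, not something confirmed in this thread.

```r
# Hypothetical stand-ins for the real objects: a tiny "test" data.frame and
# a vector of predicted probabilities, one per row (made-up values).
test <- data.frame(x1 = c(0.2, 1.5, -0.7, 3.1), x2 = c(1, 0, 1, 0))
xgb_pred <- c(0.0567, 0.8450, 0.0230, 0.6120)  # assumed to be P(label == 1)

# Sanity check: exactly one prediction per test row.
stopifnot(length(xgb_pred) == nrow(test))

threshold <- 0.5                       # can be tuned
test$prob_1 <- xgb_pred                # probability of class 1
test$prob_0 <- 1 - xgb_pred            # complement, so each row sums to 1
test$pred_label <- as.integer(xgb_pred > threshold)

print(test)
```

With a single probability per row, the probability of the other class is simply the complement, so the two values sum to 1 by construction. If the length check fails on real data, the training parameters are the first thing to look at.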
