
Simple Decision Tree in R - Strange Results From Caret Package

I'm trying to apply a simple decision tree to the following data set using the caret package. The data is:

> library(caret)
> mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
> mydata$rank <- factor(mydata$rank)
  # create dummy variables
> X <- predict(dummyVars(~ ., data = mydata), mydata)
> head(X)

  admit gre  gpa rank.1 rank.2 rank.3 rank.4
1     0 380 3.61      0      0      1      0
2     1 660 3.67      0      0      1      0
3     1 800 4.00      1      0      0      0
4     1 640 3.19      0      0      0      1
5     0 520 2.93      0      0      0      1
6     1 760 3.00      0      1      0      0
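As an aside, dummyVars expands the rank factor into all four indicator columns, which are linearly dependent (they always sum to 1). Trees tolerate this, but if a full-rank encoding is wanted, dummyVars accepts a fullRank argument; a minimal sketch:

# Drop one level per factor so the dummy columns are linearly independent
# (rank.1 becomes the implicit baseline level).
dv <- dummyVars(~ ., data = mydata, fullRank = TRUE)
X_fullrank <- predict(dv, mydata)
head(X_fullrank)  # six columns: admit, gre, gpa, rank.2, rank.3, rank.4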

Splitting into a training and testing set:

> trainset <- data.frame(X[1:300,])
> testset <- data.frame(X[301:400,])
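Note that this split is positional (rows 1-300 vs. 301-400) rather than randomized, so any ordering in the file carries into the partitions. A quick sanity check of the class balance on each side (a diagnostic sketch, not part of the original code):

# Compare the proportion of admits in the two partitions; a large gap
# would suggest the positional split is not representative.
prop.table(table(trainset$admit))
prop.table(table(testset$admit))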

Now applying the decision tree:

> tree <- train(factor(admit) ~ ., data = trainset, method = "rpart")
> tree

CART 

300 samples
  6 predictor
  2 classes: '0', '1' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 300, 300, 300, 300, 300, 300, ... 
Resampling results across tuning parameters:

  cp          Accuracy   Kappa    
  0.01956522  0.6856163  0.1865179
  0.03260870  0.6888378  0.1684015
  0.08695652  0.7080434  0.1079462

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.08695652.
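The winning cp = 0.08695652 is the largest value in the grid, which is a hint that pruning may have removed every split. One way to check is to print the final rpart model that caret kept (a diagnostic sketch under that assumption):

# A tree pruned back to the root prints a single "root" line and can
# only ever predict the majority class.
print(tree$finalModel)
nrow(tree$finalModel$frame)  # 1 means root-only: a stump with no splits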

I get NaN in variable importance. Why?

> varImp(tree)$importance

       Overall
gre        NaN
gpa        NaN
rank.1     NaN
rank.2     NaN
rank.3     NaN
rank.4     NaN
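My reading, assuming the final model really is a root-only stump as suggested above: rpart records no variable importance for a tree with no splits, and caret's varImp rescales the all-zero scores, so dividing by a zero maximum produces NaN. This can be checked against the raw rpart importances:

# rpart stores raw importances only when splits exist; for a stump this
# slot is NULL, which is why the rescaled caret values come out as NaN.
tree$finalModel$variable.importance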

And in prediction, the decision tree only outputs one class, the 0 class. Why? What's wrong with my code? Thanks in advance.

> y_pred <- predict(tree ,newdata=testset)
> y_test <- factor(testset$admit)
> confusionMatrix(y_pred, y_test)

Confusion Matrix and Statistics

      Reference
Prediction  0  1
         0 65 35
         1  0  0

               Accuracy : 0.65            
                 95% CI : (0.5482, 0.7427)
    No Information Rate : 0.65            
    P-Value [Acc > NIR] : 0.5458          

                  Kappa : 0               

 Mcnemar's Test P-Value : 9.081e-09       

            Sensitivity : 1.00            
            Specificity : 0.00            
         Pos Pred Value : 0.65            
         Neg Pred Value :  NaN            
             Prevalence : 0.65            
         Detection Rate : 0.65            
   Detection Prevalence : 1.00            
      Balanced Accuracy : 0.50            

       'Positive' Class : 0               
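The all-zero second row is consistent with a root-only tree: it always predicts the majority class 0. A hedged follow-up sketch, forcing train to consider smaller cp values through its tuneGrid argument so that some splits survive pruning:

set.seed(1)  # assumption: an arbitrary fixed seed, for reproducible resampling
tree2 <- train(factor(admit) ~ ., data = trainset,
               method   = "rpart",
               tuneGrid = data.frame(cp = seq(0.001, 0.05, by = 0.005)))
predict(tree2, newdata = testset)  # may now produce both classes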

I can't answer your question, but I can show you the approach I use to fit decision trees:

library(data.table)
library(tidyverse)
library(caret)
library(rpart)
library(rpart.plot)

# Reading data into data.table
mydata <- fread("https://stats.idre.ucla.edu/stat/data/binary.csv")

# converting rank and admit to factors
mydata$rank  <- as.factor(mydata$rank)
mydata$admit <- as.factor(mydata$admit)

# creating train and test data
t_index  <- createDataPartition(mydata$admit, p=0.75, list=FALSE)
trainset <- mydata[t_index,]
testset  <- mydata[-t_index,]

# calculating the model using rpart
model <- rpart(admit ~ .,
               data = trainset,
               parms = list(split="information"),
               method = "class")

# plotting the decision tree
model %>%
  rpart.plot(digits = 4)

# get confusion matrix
model %>%
  predict(testset, type = "class") %>%
  table(testset$admit) %>%
  confusionMatrix()
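One small addition: createDataPartition draws a random stratified sample, so the split changes between runs unless the random seed is fixed. A minimal tweak (the seed value itself is arbitrary):

# Fix the RNG before partitioning so the train/test split is reproducible.
set.seed(123)
t_index <- createDataPartition(mydata$admit, p = 0.75, list = FALSE)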

Perhaps this helps you a bit.
