Result being vector values from ctree classification rather than scalar

After setting my response variable as a factor with as.factor(response) , I run:

tree = ctree(response~., data=trainingset)

When I plot this tree, the graph shows vector values for y, for example y = (0.095, 0.905, 0). I noticed that the three values sum to 1.

But in fact the actual response variable only takes the values 0, 1, and 99.

Can anyone help me interpret this vector in the ctree plot, please? Thank you!

In terms of specific code, it looks like the following:

response = as.factor(data$response) 
newdata = cbind(predictor.matrix, response)

ind = sample(2, nrow(newdata), replace=TRUE, prob=c(0.7, 0.3))
trainData = newdata[ind==1,]
testData = newdata[ind==2,]

tree = ctree(response~., data=trainData)
plot(tree, type="simple")

Those are posterior probabilities for each of your classes; i.e. the posterior probability for that node is ~0.9 (90%) for class 1 (assuming the levels of your factor are in the order c(0, 1, 99) ).

In practical terms, this means that ~90% of the observations in that node are of class 1 , ~5% are of class 0 , and none of the observations are of class 99 .
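To make the link between the probability vector and the class proportions concrete, here is a base-R sketch using made-up counts (2, 18, and 0 observations of the three classes, chosen to match the shape of the y vector in the question; your actual node counts will differ):

```r
# Hypothetical node contents: 2 observations of class "0", 18 of class "1",
# none of class "99". These counts are invented for illustration.
node_classes <- factor(c(rep("0", 2), rep("1", 18)),
                       levels = c("0", "1", "99"))

# The node's posterior probability vector is simply the class proportions,
# reported in the order of levels(node_classes):
prop.table(table(node_classes))
# node_classes
#    0    1   99
#  0.1  0.9  0.0
```

The position of each probability in the vector follows the order of the factor's levels, which is why knowing that order is essential to reading the plot.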

What I think is throwing you is that your classes are numeric levels and the plot shows posterior probabilities, which are also numeric. If we look at an example from the party package where the response is a factor with character levels, hopefully you'll understand the plot and outputs from the tree better.

From ?ctree

library("party")
irisct <- ctree(Species ~ ., data = iris)
irisct

R> irisct

     Conditional inference tree with 4 terminal nodes

Response:  Species 
Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width 
Number of observations:  150 

1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  2)*  weights = 50 
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
      5)*  weights = 46 
    4) Petal.Length > 4.8
      6)*  weights = 8 
  3) Petal.Width > 1.7
    7)*  weights = 46

Here, Species is a factor variable with levels

R> with(iris, levels(Species))
[1] "setosa"     "versicolor" "virginica"

Plotting the tree shows the numeric posterior probabilities in the terminal nodes:

plot(irisct, type = "simple")

[plot of irisct with type = "simple": terminal nodes labelled with posterior probability vectors]

A more informative plot though is:

plot(irisct)

[default plot of irisct: terminal nodes shown as bar charts of class membership]

This makes it clear that each node contains observations from one or more classes, which is how the posterior probabilities are worked out.
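We can reproduce those posterior probabilities by hand. Terminal node 7 in the print-out above is reached via Petal.Length > 1.9 and Petal.Width > 1.7 ; recomputing the class proportions in that subset (base R only, no party needed) gives exactly the vector that plot() and treeresponse() report:

```r
# Observations falling into terminal node 7 of the printed tree.
node7 <- subset(iris, Petal.Length > 1.9 & Petal.Width > 1.7)

nrow(node7)                                 # 46, matching "weights = 46"
round(prop.table(table(node7$Species)), 5)
#     setosa versicolor  virginica
#    0.00000    0.02174    0.97826
```

So 0.02174 is just 1/46 (one versicolor in the node) and 0.97826 is 45/46 (forty-five virginica).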

Predictions from the tree are given by the predict() method:

predict(irisct)

R> predict(irisct)
  [1] setosa     setosa     setosa     setosa     setosa     setosa    
  [7] setosa     setosa     setosa     setosa     setosa     setosa    
 [13] setosa     setosa     setosa     setosa     setosa     setosa
....

You can obtain the posterior probabilities for each observation via the treeresponse() function:

R> treeresponse(irisct)[145:150]
[[1]]
[1] 0.00000 0.02174 0.97826

[[2]]
[1] 0.00000 0.02174 0.97826

[[3]]
[1] 0.00000 0.02174 0.97826

[[4]]
[1] 0.00000 0.02174 0.97826

[[5]]
[1] 0.00000 0.02174 0.97826

[[6]]
[1] 0.00000 0.02174 0.97826
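Because treeresponse() returns a list with one probability vector per observation (ordered by the factor levels), a common base-R idiom is to stack the list into a matrix with do.call(rbind, ...) . Sketched here with the values printed above hard-coded, so the snippet runs without the party package:

```r
# Two per-observation probability vectors, copied from the output above.
probs <- list(c(0, 0.02174, 0.97826),
              c(0, 0.02174, 0.97826))

# Stack the list into one row per observation, one column per class level.
prob_matrix <- do.call(rbind, probs)
colnames(prob_matrix) <- c("setosa", "versicolor", "virginica")
prob_matrix
#      setosa versicolor virginica
# [1,]      0    0.02174   0.97826
# [2,]      0    0.02174   0.97826
```

On a real tree you would pass the result of treeresponse() itself as the list; the same idiom applies.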
