After setting my response variable as a factor with as.factor(response), I run:
tree = ctree(response~., data=trainingset)
When I plot this tree, the graph shows vector values for y, for example y = (0.095, 0.905, 0). I noticed that the three values sum to 1.
But in fact the actual response variable only takes the values 0, 1, and 99.
Can anyone help me interpret this vector in ctree plot please? Thank you!
In terms of specific code, it looks like the following:
response = as.factor(data$response)
newdata = cbind(predictor.matrix, response)
ind = sample(2, nrow(newdata), replace=TRUE, prob=c(0.7, 0.3))
trainData = newdata[ind==1,]
testData = newdata[ind==2,]
tree = ctree(response~., data=trainData)
plot(tree, type="simple")
Those are the posterior probabilities for each of your classes; i.e. the posterior probability for that node is ~0.9 (90%) for class 1 (assuming the levels of your factor are in the order c(0, 1, 99)).
In practical terms, this means that ~90% of the observations in that node are of class 1, ~5% are of class 0, and none of the observations are of class 99.
What I think is throwing you is that your classes are numeric levels while the plot shows posterior probabilities, which are also numeric. If we look at an example from the party package where the response is a factor with character levels, hopefully you'll understand the plot and outputs from the tree better.
From ?ctree
library("party")
irisct <- ctree(Species ~ ., data = iris)
irisct
R> irisct
     Conditional inference tree with 4 terminal nodes

Response:  Species
Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations:  150

1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  2)*  weights = 50
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
      5)*  weights = 46
    4) Petal.Length > 4.8
      6)*  weights = 8
  3) Petal.Width > 1.7
    7)*  weights = 46
Here, Species is a factor variable with levels
R> with(iris, levels(Species))
[1] "setosa" "versicolor" "virginica"
Plotting the tree shows the numeric posterior probabilities in the terminal nodes:
plot(irisct, type = "simple")
A more informative plot though is:
plot(irisct)
This makes it clear that each node contains observations from one or more classes, and the class proportions within each node are how the posterior probabilities are worked out.
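You can verify this by hand for terminal node 7 (Petal.Length > 1.9 and Petal.Width > 1.7, 46 observations): take the observations satisfying the node's splits and tabulate the class proportions. This sketch uses only base R, since iris ships with R:

```r
## Reproduce the posterior probabilities of terminal node 7 manually:
## select the observations that fall into the node, then compute the
## proportion of each class among them.
node7 <- subset(iris, Petal.Length > 1.9 & Petal.Width > 1.7)
nrow(node7)                              # 46 observations, as in the tree
round(prop.table(table(node7$Species)), 5)
##     setosa versicolor  virginica
##    0.00000    0.02174    0.97826
```

These proportions match the treeresponse() output shown below for observations in this node.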
Predictions from the tree are given by the predict() method:
predict(irisct)
R> predict(irisct)
[1] setosa setosa setosa setosa setosa setosa
[7] setosa setosa setosa setosa setosa setosa
[13] setosa setosa setosa setosa setosa setosa
....
You can obtain the posterior probabilities for each observation via the treeresponse() function:
R> treeresponse(irisct)[145:150]
[[1]]
[1] 0.00000 0.02174 0.97826
[[2]]
[1] 0.00000 0.02174 0.97826
[[3]]
[1] 0.00000 0.02174 0.97826
[[4]]
[1] 0.00000 0.02174 0.97826
[[5]]
[1] 0.00000 0.02174 0.97826
[[6]]
[1] 0.00000 0.02174 0.97826