I am trying to construct a decision tree in R using the "party" package.
I am following the approach described at http://www.rdatamining.com/examples/decision-tree,
which builds a decision tree with the "party" package.
My dataset is similar to the iris dataset used in that example. Here is a link to a copy of my dataset: https://drive.google.com/file/d/0B6cqWmwsEk20TXQyMnVlbGppcTQ/edit?usp=sharing
Here is the code I tried. I loaded the data with read.csv and assigned it to the variable dat3.
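The loading step itself, for completeness (the filename below is a placeholder, not my actual file name; stringsAsFactors = FALSE reproduces the chr column shown in the str() output):

```r
# Placeholder filename -- substitute the path to your downloaded copy of the data
dat3 <- read.csv("accident_data.csv", stringsAsFactors = FALSE)
```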
library(party)
> str(dat3)
'data.frame': 1000 obs. of 4 variables:
$ Road_Type : num 2 3 3 1 1 1 3 3 1 3 ...
$ Light_Conditions : num 2 3 3 3 3 3 3 3 3 3 ...
$ Road_Surface_Conditions: num 1 2 2 2 2 2 2 2 2 2 ...
$ Accident_Severity : chr "three" "three" "three" "three" ...
> dat3$Accident_Severity<-as.factor(dat3$Accident_Severity)
> str(dat3)
'data.frame': 1000 obs. of 4 variables:
$ Road_Type : num 2 3 3 1 1 1 3 3 1 3 ...
$ Light_Conditions : num 2 3 3 3 3 3 3 3 3 3 ...
$ Road_Surface_Conditions: num 1 2 2 2 2 2 2 2 2 2 ...
$ Accident_Severity : Factor w/ 3 levels "one","three",..: 2 2 2 2 3 2 2 2 3 2 ...
> mytree<- ctree(Accident_Severity ~ Road_Type + Light_Conditions + Road_Surface_Conditions, data=dat3)
> print(mytree)
Conditional inference tree with 1 terminal nodes
Response: Accident_Severity
Inputs: Road_Type, Light_Conditions, Road_Surface_Conditions
Number of observations: 1000
1)* weights = 1000
>
As you can see, the constructed tree has only a single root node and no splits, and when I plot it graphically the result is also not as desired, since no tree is drawn. I am not sure what I am doing wrong here.
I don't think there is enough information in the data to find a split at the default 0.95 significance level. Look at a tabular cross-classification:
> with( dat3, table(Accident_Severity, Light_Conditions, Road_Type))
, , Road_Type = 1
Light_Conditions
Accident_Severity 1 2 3
one 0 2 4
three 2 157 158
two 0 14 35
, , Road_Type = 2
Light_Conditions
Accident_Severity 1 2 3
one 0 0 0
three 1 17 11
two 0 0 0
, , Road_Type = 3
Light_Conditions
Accident_Severity 1 2 3
one 0 2 2
three 3 269 251
two 0 38 34
So there is no split that isn't obvious, I suppose. The function considers the data already sufficiently homogeneous. If you lower the mincriterion you get splits:
mytree <- ctree(Accident_Severity ~ Road_Type + Light_Conditions + Road_Surface_Conditions,
                data = dat3, control = ctree_control(mincriterion = 0.50))
print(mytree)
#----------------------
Conditional inference tree with 4 terminal nodes
Response: Accident_Severity
Inputs: Road_Type, Light_Conditions, Road_Surface_Conditions
Number of observations: 1000
1) Light_Conditions <= 2; criterion = 0.653, statistic = 4.043
2) Road_Surface_Conditions <= 1; criterion = 0.9, statistic = 6.742
3)* weights = 193
2) Road_Surface_Conditions > 1
4)* weights = 312
1) Light_Conditions > 2
5) Road_Type <= 1; criterion = 0.792, statistic = 5.187
6)* weights = 197
5) Road_Type > 1
7)* weights = 298
plot(mytree)
If you wrap the variable names in factor(), they are handled as non-ordinal (nominal):
mytree2 <- ctree(Accident_Severity ~ factor(Road_Type) + factor(Light_Conditions) + factor(Road_Surface_Conditions),
                 data = dat3, control = ctree_control(mincriterion = 0.50))
print(mytree2)
#------------------------
Conditional inference tree with 2 terminal nodes
Response: Accident_Severity
Inputs: factor(Road_Type), factor(Light_Conditions), factor(Road_Surface_Conditions)
Number of observations: 1000
1) factor(Road_Type) == {1, 3}; criterion = 0.635, statistic = 6.913
2)* weights = 971
1) factor(Road_Type) == {2}
3)* weights = 29
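An alternative to wrapping each term in factor() inside the formula is to convert the coded columns to factors once in the data frame; every later model call then treats them as nominal, and the formula and plot labels stay clean. A sketch, assuming dat3 as loaded above:

```r
# Convert the numerically coded predictors to factors in place,
# so they are treated as nominal by ctree (and any other model)
pred <- c("Road_Type", "Light_Conditions", "Road_Surface_Conditions")
dat3[pred] <- lapply(dat3[pred], factor)

mytree3 <- ctree(Accident_Severity ~ ., data = dat3,
                 control = ctree_control(mincriterion = 0.50))
print(mytree3)
```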