
Can't implement a decision tree in R using the 'party' package. How to do it?

I am trying to construct a decision tree in R using the "party" package. I am following the approach described at http://www.rdatamining.com/examples/decision-tree, which builds a decision tree with the "party" package.

My dataset is similar to the iris dataset shown in the example. Here is a link to a copy of my dataset: https://drive.google.com/file/d/0B6cqWmwsEk20TXQyMnVlbGppcTQ/edit?usp=sharing

I loaded the data with read.csv and stored it in the variable dat3. Here is the code that I tried:

> library(party)
> str(dat3)
'data.frame':   1000 obs. of  4 variables:
 $ Road_Type              : num  2 3 3 1 1 1 3 3 1 3 ...
 $ Light_Conditions       : num  2 3 3 3 3 3 3 3 3 3 ...
 $ Road_Surface_Conditions: num  1 2 2 2 2 2 2 2 2 2 ...
 $ Accident_Severity      : chr  "three" "three" "three" "three" ...
> dat3$Accident_Severity<-as.factor(dat3$Accident_Severity)
> str(dat3)
'data.frame':   1000 obs. of  4 variables:
 $ Road_Type              : num  2 3 3 1 1 1 3 3 1 3 ...
 $ Light_Conditions       : num  2 3 3 3 3 3 3 3 3 3 ...
 $ Road_Surface_Conditions: num  1 2 2 2 2 2 2 2 2 2 ...
 $ Accident_Severity      : Factor w/ 3 levels "one","three",..: 2 2 2 2 3 2 2 2 3 2 ...
> mytree<- ctree(Accident_Severity ~ Road_Type + Light_Conditions + Road_Surface_Conditions, data=dat3)
> print(mytree)

     Conditional inference tree with 1 terminal nodes

Response:  Accident_Severity 
Inputs:  Road_Type, Light_Conditions, Road_Surface_Conditions 
Number of observations:  1000 

1)*  weights = 1000 
> 

As you can see, the constructed tree consists of a single terminal node with no splits, and plotting it does not give the desired result either, since no tree was built. I am not sure what I am doing wrong here.

I don't think there is enough information in the data to justify any split at the default threshold (mincriterion = 0.95, i.e. a p-value below 0.05). Look at a tabular split:

> with( dat3, table(Accident_Severity, Light_Conditions, Road_Type))
, , Road_Type = 1

                 Light_Conditions
Accident_Severity   1   2   3
            one     0   2   4
            three   2 157 158
            two     0  14  35

, , Road_Type = 2

                 Light_Conditions
Accident_Severity   1   2   3
            one     0   0   0
            three   1  17  11
            two     0   0   0

, , Road_Type = 3

                 Light_Conditions
Accident_Severity   1   2   3
            one     0   2   2
            three   3 269 251
            two     0  38  34
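
To make the imbalance easier to see, the counts can be re-entered by hand and summarised per Road_Type. This is only a sketch in base R: the numbers are copied from the table above, and the object names counts and sev_by_road are my own.

```r
# Counts copied from the table above. Dimensions are
# Accident_Severity x Light_Conditions x Road_Type (filled column-wise).
counts <- array(
  c(0, 2, 0,   2, 157, 14,   4, 158, 35,   # Road_Type = 1
    0, 1, 0,   0,  17,  0,   0,  11,  0,   # Road_Type = 2
    0, 3, 0,   2, 269, 38,   2, 251, 34),  # Road_Type = 3
  dim = c(3, 3, 3),
  dimnames = list(Accident_Severity = c("one", "three", "two"),
                  Light_Conditions  = c("1", "2", "3"),
                  Road_Type         = c("1", "2", "3"))
)

# Collapse over Light_Conditions: severity counts within each Road_Type.
sev_by_road <- apply(counts, c(1, 3), sum)

# Share of each severity level within each Road_Type (columns sum to 1).
round(prop.table(sev_by_road, margin = 2), 3)
```

Severity "three" accounts for 869 of the 1000 rows and dominates every Road_Type, which leaves very little for a split to separate.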

So I suppose no split stands out. The function considers the data already sufficiently split. If you lower mincriterion, you get splits:

# party::ctree names this argument 'controls'; 'control' only works through partial matching
mytree <- ctree(Accident_Severity ~ Road_Type + Light_Conditions + Road_Surface_Conditions,
                data = dat3, controls = ctree_control(mincriterion = 0.50))
print(mytree)
#----------------------
     Conditional inference tree with 4 terminal nodes

Response:  Accident_Severity 
Inputs:  Road_Type, Light_Conditions, Road_Surface_Conditions 
Number of observations:  1000 

1) Light_Conditions <= 2; criterion = 0.653, statistic = 4.043
  2) Road_Surface_Conditions <= 1; criterion = 0.9, statistic = 6.742
    3)*  weights = 193 
  2) Road_Surface_Conditions > 1
    4)*  weights = 312 
1) Light_Conditions > 2
  5) Road_Type <= 1; criterion = 0.792, statistic = 5.187
    6)*  weights = 197 
  5) Road_Type > 1
    7)*  weights = 298 

plot(mytree)
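
The criterion values in the printed tree can be read as 1 - p-value, and a split is made only when that value exceeds mincriterion. A small base-R sketch of the arithmetic, using the three values copied from the output above (the helper passes() is hypothetical, written just for this illustration):

```r
# Criterion values copied from the printed tree (each is 1 - p-value).
criterion <- c(Light_Conditions        = 0.653,
               Road_Surface_Conditions = 0.900,
               Road_Type               = 0.792)

# Implied p-values for the three splits.
p_value <- 1 - criterion
p_value

# Hypothetical helper: does each split clear a given mincriterion?
passes <- function(mincriterion) criterion > mincriterion

passes(0.95)  # default threshold
passes(0.50)  # relaxed threshold used above
```

None of the three values clears 0.95, so the default settings stop at the root. Note that the 0.9 split is only reached at all because the root split (0.653) was allowed first.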


If you wrap the variable names in factor(), they are handled as non-ordinal (unordered categorical) variables:

mytree2 <- ctree(Accident_Severity ~ factor(Road_Type) + factor(Light_Conditions) + factor(Road_Surface_Conditions),
                 data = dat3, controls = ctree_control(mincriterion = 0.50))
print(mytree2)
#------------------------
     Conditional inference tree with 2 terminal nodes

Response:  Accident_Severity 
Inputs:  factor(Road_Type), factor(Light_Conditions), factor(Road_Surface_Conditions) 
Number of observations:  1000 

1) factor(Road_Type) == {1, 3}; criterion = 0.635, statistic = 6.913
  2)*  weights = 971 
1) factor(Road_Type) == {2}
  3)*  weights = 29 
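
The difference between the two trees comes from how the predictor is typed: a numeric column gets threshold splits such as Road_Type <= 1, while an unordered factor gets set splits such as {1, 3} versus {2}. A minimal base-R sketch of the three ways the same codes can be stored, using the first ten Road_Type values shown in str(dat3) (the variable names are my own):

```r
# First ten Road_Type values as printed by str(dat3) in the question.
codes <- c(2, 3, 3, 1, 1, 1, 3, 3, 1, 3)

as_numeric <- codes                          # treated as a number line
as_nominal <- factor(codes)                  # unordered categories
as_ordinal <- factor(codes, ordered = TRUE)  # categories with an order

is.numeric(as_numeric)
is.ordered(as_nominal)
is.ordered(as_ordinal)
levels(as_nominal)
```

If the codes really are arbitrary labels, converting the columns once in the data frame (for example dat3$Road_Type <- factor(dat3$Road_Type)) should have the same effect as wrapping them in factor() inside the formula, and it keeps the formula shorter.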

