
Converting rpart model into PMML (using 'pmml' package)

I get the following error when I try to convert my rpart model into PMML:

Fehler in if (ff$nsurrogate[parent_ii] > 0) { :   
    Fehlender Wert, wo TRUE/FALSE nötig ist
    (Missing value where TRUE / FALSE is needed)

This error can be reproduced with the code below:

library(rpart)
library(pmml)
df <- structure(list(a = structure(c(15L, 1L, 13L, 8L, 11L, 25L, 6L, 
                                     24L, 27L, 9L, 2L, 18L, 28L, 14L, 5L, 17L, 20L, 21L, 16L, 7L, 
                                     22L, 19L, 23L, 26L, 3L, 10L, 12L, 4L), .Label = c("013", "018", 
                                                                                       "063", "073", "122", "173", "212", "216", "296", "355", "410", 
                                                                                       "415", "423", "428", "453", "481", "534", "586", "678", "701", 
                                                                                       "735", "746", "778", "812", "818", "855", "864", "998"), class = "factor"), 
                     y = c(1.029993, 0.95987, 0.95987, 0.95987, 0.95987, 0.95987, 
                           0.95987, 0.969903, 0.95987, 0.860644, 0.95987, 0.969903, 
                           0.900669, 0.95987, 0.95987, 0.95987, 1.12018, 0.95987, 0.95987, 
                           0.95987, 0.95987, 0.880656, 0.95987, 0.939858, 0.95987, 0.939858, 
                           0.95987, 0.95987)), row.names = c(NA, -28L), class = "data.frame")

model <- rpart(y ~ a, df, control = rpart.control(minsplit = 1, minbucket = 2, cp=-1))
pmml.rpart(model)

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
 [6] LC_MESSAGES=de_DE.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pmml_1.5.7 XML_3.98-1.16 rpart_4.1-13

loaded via a namespace (and not attached):
[1] compiler_3.5.1 magrittr_1.5 tools_3.5.1 yaml_2.2.0 stringi_1.2.4 stringr_1.3.1

Currently df$a is a factor, which doesn't really make sense given that the number of factor levels equals the number of rows, so every observation has its own level. Converting the column to numeric with

df$a <- as.numeric(as.character(df$a))

and then refitting the model also allows you to run

pmml.rpart(model)
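
As a minimal sketch of why the conversion goes through as.character(), the snippet below uses a small made-up factor (not the data from the question) whose labels look like numbers. The variable names a, codes and values are illustrative only. Calling as.numeric() directly on a factor returns the internal level codes rather than the numbers the labels spell out.

```r
# Toy factor with numeric-looking labels (illustrative data, not df$a itself).
a <- factor(c("453", "013", "423"))

# One level per element: the factor carries no grouping information.
nlevels(a) == length(a)                # TRUE

# Pitfall: as.numeric() on a factor yields the level codes.
# Levels sort as "013" < "423" < "453", so the codes are 3 1 2.
codes <- as.numeric(a)

# Converting via as.character() recovers the values written in the labels.
values <- as.numeric(as.character(a))  # 453 13 423
```

Note that leading zeros are dropped ("013" becomes 13), which is fine if the column holds quantities but not if the labels are identifiers.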

Consider using the r2pmml package instead: https://github.com/jpmml/r2pmml

The conversion succeeds with the code from the question as-is, and the generated PMML model file is smaller, cleaner and provably correct:

library("r2pmml")
r2pmml(model, "model.pmml")
