
How to build an XML file with the pmml package using mlr?

I want to convert a logistic model built with the mlr package directly into an XML file using the pmml package. The problem is that the learner model built by the mlr wrapper (model$learner.model) doesn't include the model element in its list, unlike a model fitted with the plain stats::glm function. Here is an example:

library(dplyr)
library(titanic)
library(pmml)
library(ParamHelpers)
library(mlr)

Titanic_data = select(titanic_train, Survived, Pclass, Sex, Age)
Titanic_data$Survived = as.factor(Titanic_data$Survived)
Titanic_data$Sex = as.factor(Titanic_data$Sex)
Titanic_data$Pclass = as.factor(Titanic_data$Pclass)
Titanic_data = na.omit(Titanic_data)

lrn <- makeLearner("classif.logreg", predict.type = "prob")
task = makeClassifTask(data = Titanic_data, target = "Survived", positive = "1")
model = train(lrn, task)

model_glm = glm(Survived ~ ., data = Titanic_data, family = "binomial")

str(model$learner.model)   # list of 29
str(model_glm)             # list of 30
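Rather than counting elements with str(), calling setdiff() on the element names shows exactly which ones differ. The sketch below uses two plain glm() fits on the built-in mtcars data as a stand-in (an assumption for illustration only; it does not reproduce what mlr does internally): passing model = FALSE to glm() skips storing the model frame, so that element is the one reported as missing.

```r
# Stand-in example: two logistic fits on built-in data.
# The second one is told not to store the model frame (model = FALSE).
fit_full <- glm(am ~ wt, data = mtcars, family = binomial)
fit_nomf <- glm(am ~ wt, data = mtcars, family = binomial, model = FALSE)

# Element names present in the first fit but absent from the second
missing_elements <- setdiff(names(fit_full), names(fit_nomf))
print(missing_elements)
# [1] "model"
```

With the objects from the question, the same check would be setdiff(names(model_glm), names(model$learner.model)).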

As you can see, both objects are lists with the same elements, except that the model element is missing from the mlr wrapper's version. Therefore I get an error message when calling pmml:

pmml(model$learner.model)
# Error in pmml.glm(model$learner.model) : object 'model.link' not found

The model built by stats::glm works:

pmml(model_glm)

<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_4 http://www.dmg.org/pmml/v4-4/pmml-4-4.xsd">
 <Header copyright="Copyright (c) 2020 TBeige" description="Generalized Linear Regression Model">
  <Extension name="user" value="TBeige" extender="SoftwareAG PMML Generator"/>
  <Application name="SoftwareAG PMML Generator" version="2.3.1"/>
  <Timestamp>2020-05-12 09:50:15</Timestamp>
 </Header>
 <DataDictionary numberOfFields="4">
  <DataField name="Survived" optype="categorical" dataType="string">
   <Value value="0"/>
   <Value value="1"/>
  </DataField>
  <DataField name="Pclass" optype="categorical" dataType="string">
   <Value value="1"/>
   <Value value="2"/>
   <Value value="3"/>
  </DataField>
  <DataField name="Sex" optype="categorical" dataType="string">
   <Value value="female"/>
   <Value value="male"/>
  </DataField>
  <DataField name="Age" optype="continuous" dataType="double"/>
 </DataDictionary>
 <GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="classification" algorithmName="glm" distribution="binomial" linkFunction="logit">
  <MiningSchema>
   <MiningField name="Survived" usageType="predicted" invalidValueTreatment="returnInvalid"/>
   <MiningField name="Pclass" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="Sex" usageType="active" invalidValueTreatment="returnInvalid"/>
   <MiningField name="Age" usageType="active" invalidValueTreatment="returnInvalid"/>
  </MiningSchema>
  <Output>
   <OutputField name="Probability_1" targetField="Survived" feature="probability" value="1" optype="continuous" dataType="double"/>
   <OutputField name="Predicted_Survived" feature="predictedValue" optype="categorical" dataType="string"/>
  </Output>
  <ParameterList>
   <Parameter name="p0" label="(Intercept)"/>
   <Parameter name="p1" label="Pclass2"/>
   <Parameter name="p2" label="Pclass3"/>
   <Parameter name="p3" label="Sexmale"/>
   <Parameter name="p4" label="Age"/>
  </ParameterList>
  <FactorList>
   <Predictor name="Pclass"/>
   <Predictor name="Sex"/>
  </FactorList>
  <CovariateList>
   <Predictor name="Age"/>
  </CovariateList>
  <PPMatrix>
   <PPCell value="2" predictorName="Pclass" parameterName="p1"/>
   <PPCell value="3" predictorName="Pclass" parameterName="p2"/>
   <PPCell value="male" predictorName="Sex" parameterName="p3"/>
   <PPCell value="1" predictorName="Age" parameterName="p4"/>
  </PPMatrix>
  <ParamMatrix>
   <PCell targetCategory="1" parameterName="p0" df="1" beta="3.77701265255885"/>
   <PCell targetCategory="1" parameterName="p1" df="1" beta="-1.30979926778885"/>
   <PCell targetCategory="1" parameterName="p2" df="1" beta="-2.58062531749203"/>
   <PCell targetCategory="1" parameterName="p3" df="1" beta="-2.52278091988034"/>
   <PCell targetCategory="1" parameterName="p4" df="1" beta="-0.0369852655754339"/>
  </ParamMatrix>
 </GeneralRegressionModel>
</PMML>

Any idea how I can use mlr and still create an XML file with pmml?

The problem seems to be inside pmml itself.

From pmml::pmml.glm:

    if (model$call[[1]] == "glm") {
        model.type <- model$family$family
        model.link <- model$family$link
    }
    else {
        model.type <- "unknown"
    }

In the mlr model we have:

model$learner.model$call[[1]]
# stats::glm
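Why this check fails can be reproduced in base R, without mlr or pmml. R's ?Comparison help page states that language objects such as symbols and calls are deparsed to character strings before comparison, so a call head written as stats::glm is compared as the string "stats::glm", which is not equal to "glm". The two quoted calls below are illustrative stand-ins for what gets recorded in $call.

```r
# The same model call, written with and without the stats:: prefix
call_plain <- quote(glm(Survived ~ ., family = "binomial"))
call_ns    <- quote(stats::glm(Survived ~ ., family = "binomial"))

# [[1]] is the function part of each call
print(class(call_plain[[1]]))   # "name": the bare symbol glm
print(class(call_ns[[1]]))      # "call": the expression stats::glm

# `==` deparses both to strings before comparing
plain_matches <- call_plain[[1]] == "glm"   # "glm" vs "glm"
ns_matches    <- call_ns[[1]] == "glm"      # "stats::glm" vs "glm"
print(c(plain = plain_matches, ns = ns_matches))
```

Because the prefixed form compares as FALSE, pmml.glm takes the else branch shown above, sets model.type to "unknown" and never assigns model.link, which matches the "object 'model.link' not found" error in the question.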

So you can just hack it:

model$learner.model$call[[1]] = "glm"

and then

pmml(model$learner.model)

works.
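The same hack can be wrapped in a small helper so the mlr object itself is left untouched. as_plain_glm_call() is a hypothetical name, not part of mlr or pmml. It sets the call head to the symbol glm via as.name() rather than to the string "glm", so the stored call stays an ordinary call that deparses as glm(...); either form passes the == "glm" check. Because R copies on modification, the object passed in is not changed. The demo uses stats::glm() on mtcars as a stand-in for the mlr-trained model.

```r
# Hypothetical helper: replace the recorded call head (e.g. stats::glm)
# with the bare symbol glm so that the check model$call[[1]] == "glm"
# in pmml::pmml.glm succeeds. Returns a modified copy of the fit.
as_plain_glm_call <- function(fit) {
  stopifnot(inherits(fit, "glm"))
  fit$call[[1]] <- as.name("glm")
  fit
}

# Stand-in for model$learner.model: a glm fitted through the stats:: prefix
fit_ns <- stats::glm(am ~ wt, data = mtcars, family = binomial)
print(fit_ns$call[[1]])             # stats::glm

fit_plain <- as_plain_glm_call(fit_ns)
print(fit_plain$call[[1]])          # glm
print(fit_plain$call[[1]] == "glm") # TRUE
print(fit_ns$call[[1]] == "glm")    # FALSE: the original is unchanged
```

With the question's objects this would be pmml(as_plain_glm_call(model$learner.model)).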

To be honest, this check in the pmml package seems odd: it only recognizes a model whose call was written literally as glm(...), not as stats::glm(...).

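Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.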

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM