
Custom function in sklearn2pmml PMMLPipeline

I am trying to create a machine learning model to suggest treatment for stroke patients based on their responses to various questionnaires and assessments. For instance, the patient will be asked to rate the stiffness of the fingers, elbow, shoulder, and pectoral muscles (each on a scale of 0 to 100) or answer 14 questions related to mental health (each on a scale of 0 to 3).

I would like to create an sklearn pipeline roughly as follows:

1. The patient responses are aggregated. For example, the four stiffness responses should be averaged to create a single “stiffness” value, while the fourteen mental health questions should be summed to create a single “mental health” value. The “stiffness” and “mental health” values would then be features in the model.

2. Once the features have been aggregated in this way, a decision tree classifier is trained on labeled data to assign each patient to the appropriate therapy.

3. The trained pipeline is exported as a PMML file for production.

I assume this must be doable with some code like this:

from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml import sklearn2pmml
from sklearn.tree import DecisionTreeClassifier
from somewhere import Something

pipeline = PMMLPipeline([
    ("input_aggregation", Something()),
    ("classifier", DecisionTreeClassifier())
])

pipeline.fit(patient_input, therapy_labels)

sklearn2pmml(pipeline, "ClassificationPipeline.pmml", with_repr = True)

I've been poking around the documentation and I can figure out how to apply PCA to a group of columns, but not how to do something as straightforward as collapsing a group of columns by summing or averaging. Does anyone have any hints about how I could do this?

Thanks for your help.

You just need to define a custom function and use it in the Pipeline.

Here is the full code:

import numpy as np
from sklearn.preprocessing import FunctionTransformer
from sklearn2pmml import make_pmml_pipeline

# fake data with 7 columns
X = np.random.rand(10, 7)

def custom_function(X):
    # per row: average the first 4 columns and sum the remaining ones
    mean_part = np.mean(X[:, 0:4], axis=1, keepdims=True)
    sum_part = np.sum(X[:, 4:], axis=1, keepdims=True)
    return np.concatenate([mean_part, sum_part], axis=1)

# Now, if you run `custom_function(X)`, it should return an array of shape (10, 2).

pipeline = make_pmml_pipeline(
    FunctionTransformer(custom_function),
)
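As a quick sanity check of the aggregation itself, here is a minimal sketch on a small hand-made array. The function name `aggregate_columns` and the sample values are made up for illustration; the layout (four columns to average, the rest to sum) follows the fake data above.

```python
import numpy as np

def aggregate_columns(X):
    # Hypothetical helper: per row, mean of the first four columns
    # and sum of all remaining columns.
    X = np.asarray(X, dtype=float)
    mean_part = X[:, :4].mean(axis=1, keepdims=True)
    sum_part = X[:, 4:].sum(axis=1, keepdims=True)
    return np.hstack([mean_part, sum_part])

# Two made-up rows: four "stiffness" scores followed by three other scores.
X_small = np.array([
    [10, 20, 30, 40, 1, 2, 3],
    [0, 0, 100, 100, 0, 1, 0],
])

features = aggregate_columns(X_small)
print(features)
```

Using `keepdims=True` keeps each aggregate as a column, so the function does not depend on a row count defined outside of it.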

Sample code:

from sklearn.tree import DecisionTreeClassifier
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.preprocessing import Aggregator

pipeline = PMMLPipeline([
  ("mapper", DataFrameMapper([
    (["stiffness_1", "stiffness_2", "stiffness_3", "stiffness_4"], Aggregator(function = "mean")),
    # ".." is a placeholder for the remaining mental health columns
    (["mental_health_1", "mental_health_2", .., "mental_health_14"], Aggregator(function = "sum"))
  ])),
  ("classifier", DecisionTreeClassifier())
])
pipeline.fit(X, y)

Explanation - you can use sklearn_pandas.DataFrameMapper to define a column group and apply a transformation to it. For the conversion to PMML to work, you need to provide a transformer class, not a direct function. Perhaps all your transformation needs are handled by the sklearn2pmml.preprocessing.Aggregator transformer class. If not, you can always define your own.
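On the Python side, defining your own transformer class could look roughly like the sketch below. The class name `RowAggregator` and its `function` parameter are hypothetical and are not part of sklearn2pmml. This covers only the scikit-learn interface; as far as I understand, the PMML converter must also know how to translate the class before export will work, so treat that part as an assumption to check against the sklearn2pmml documentation.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RowAggregator(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: collapses its input columns into one value per row."""

    def __init__(self, function="mean"):
        self.function = function

    def fit(self, X, y=None):
        # Nothing is learned; only validate the requested aggregation.
        if self.function not in ("mean", "sum"):
            raise ValueError(f"Unsupported function: {self.function!r}")
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if self.function == "mean":
            Xt = X.mean(axis=1)
        else:
            Xt = X.sum(axis=1)
        # Return a single column so downstream steps see a 2D array.
        return Xt.reshape(-1, 1)

# Made-up input: two patients, four scores each.
demo = np.array([[1, 2, 3, 0], [3, 3, 0, 2]])
summed = RowAggregator("sum").fit_transform(demo)
averaged = RowAggregator("mean").fit_transform(demo)
```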

While @makis has provided a 100% valid Python example, it wouldn't work in the Python-to-PMML case, because the converter cannot parse/handle custom Python functions.
