Generating PMML from an sklearn model with GridSearch in Python

Question

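I'm looking to train a model in sklearn and export it via PMML for execution in other environments (using https://github.com/jpmml/sklearn2pmml).

I can generate PMML from a plain (k-nearest neighbors) model without GridSearch, but with GridSearch I get the following error:

TypeError: The pipeline object is not an instance of PMMLPipeline

The error makes sense (since GridSearchCV does not return a PMMLPipeline), but I'm looking for ideas on how to export the optimized model (with GridSearch) to PMML, e.g. whether it is possible to include GridSearch inside a PMMLPipeline.

Code below. TIA for any ideas.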

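from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
# Import missing from the original snippet (newer releases expose it as sklearn2pmml.pipeline.PMMLPipeline)
from sklearn2pmml import PMMLPipeline

knn_pipe = PMMLPipeline([
    ("regressor", KNeighborsRegressor())
])

param_grid = {"regressor__n_neighbors": [3, 2, 10],
              "regressor__weights": ["uniform", "distance"],
              "regressor__algorithm": ["auto", "ball_tree", "kd_tree"]}

cv = GridSearchCV(knn_pipe, param_grid=param_grid)

print(train.drop('y', axis=1).shape)

# X, Y: features and target taken from `train` (their definition is not shown in the post)
cv.fit(X, Y)

best_parameters = cv.best_estimator_.get_params()
print("best parameter = {}".format(best_parameters))

from sklearn2pmml import sklearn2pmml
# Raises the TypeError shown below: cv is a GridSearchCV, not a PMMLPipeline
sklearn2pmml(cv, "kNNMercedes.pmml", with_repr = True)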




['regressor__algorithm', 'regressor__n_neighbors', 'regressor__metric', 
'regressor__leaf_size', 'regressor', 'regressor__p', 
'regressor__metric_params', 'steps', 'regressor__n_jobs', 
'regressor__weights']
 (4209, 365)
best parameter = {'regressor__algorithm': 'auto', 'regressor__n_neighbors': 
10, 'regressor__metric': 'minkowski', 'regressor__leaf_size': 30, 'regressor': 
KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
          metric_params=None, n_jobs=1, n_neighbors=10, p=2,
      weights='distance'), 'regressor__p': 2, 'regressor__metric_params': None, 'steps': [('regressor', KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
      metric_params=None, n_jobs=1, n_neighbors=10, p=2,
      weights='distance'))], 'regressor__n_jobs': 1, 'regressor__weights': 
'distance'}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-d548c1bff799> in <module>()
 30 
 31 from sklearn2pmml import sklearn2pmml
---> 32 sklearn2pmml(cv, "kNNMercedes.pmml", with_repr = True)
 33 
 34 print("yeay PMML!")

/Users/venuv/.local/lib/python2.7/site-packages/sklearn2pmml/__init__.pyc in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, debug)
125                 print("sklearn2pmml: ", __version__)
126         if(not isinstance(pipeline, PMMLPipeline)):
--> 127                 raise TypeError("The pipeline object is not an instance of " + PMMLPipeline.__name__)
128         cmd = ["java", "-cp", os.pathsep.join(_package_classpath() + user_classpath), "org.jpmml.sklearn.Main"]
129         dumps = []

TypeError: The pipeline object is not an instance of PMMLPipeline

Answer 1

The solution is to construct a PMMLPipeline instance from the fitted GridSearchCV instance:

pipeline = PMMLPipeline([
  ("best_estimator", cv.best_estimator_)
])
sklearn2pmml(pipeline, "pipeline.pmml")
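
To illustrate why the search object itself is rejected while its best_estimator_ can be exported, here is a minimal sketch using only scikit-learn. StandInPipeline is a hypothetical stand-in for PMMLPipeline (an empty Pipeline subclass, used so the example runs without sklearn2pmml or Java), and the data is synthetic. It suggests that GridSearchCV is a wrapper rather than a pipeline, and that best_estimator_ is a refitted clone that keeps the class of the pipeline handed to the search.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline


class StandInPipeline(Pipeline):
    """Hypothetical stand-in for PMMLPipeline: a plain Pipeline subclass."""


# Synthetic data, only here to keep the example self-contained.
rng = np.random.RandomState(0)
X = rng.rand(60, 4)
y = X[:, 0] + 2.0 * X[:, 1]

pipe = StandInPipeline([("regressor", KNeighborsRegressor())])
param_grid = {"regressor__n_neighbors": [2, 3, 10],
              "regressor__weights": ["uniform", "distance"]}

# refit=True (the default) retrains the best configuration on all the data.
cv = GridSearchCV(pipe, param_grid=param_grid, cv=3)
cv.fit(X, y)

# The search object wraps the pipeline; it is not a pipeline itself.
search_is_pipeline = isinstance(cv, Pipeline)

# The refitted best estimator is a clone of `pipe`, so it keeps its class.
best_is_pipeline = isinstance(cv.best_estimator_, StandInPipeline)

# Tuned hyperparameters live on the fitted step inside the best estimator.
best_k = cv.best_estimator_.named_steps["regressor"].n_neighbors
print(search_is_pipeline, best_is_pipeline, best_k)
```

Under that assumption, the isinstance check inside sklearn2pmml fails for cv but not for a PMMLPipeline built from (or equal to) cv.best_estimator_, which is what the snippet above relies on.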

User @vivek-kumar has reported this issue to the JPMML-SkLearn project, where it received some additional comments. See jpmml/jpmml-sklearn#42.

Answer 1 posted 2017-06-20 08:03:33