How do I use a custom estimator for cross-validation in sklearn?

Question

I have written a custom estimator class with fit and transform methods. I can create a model, train it, and predict with it.

However, when I run cross-validation, I get this error: TypeError: cannot deepcopy this pattern object.

This is what the custom estimator looks like:

import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin


class DefaultEstimator(BaseEstimator, TransformerMixin):
    def __init__(self, preprocessor, pipelines):
        self.preprocessor = preprocessor
        self.pipelines = pipelines

    def fit(self, X, y=None):
        # Fit every pipeline on the preprocessed input.
        for each_pipeline in self.pipelines:
            each_pipeline.fit(self.preprocessor.apply(X), y)
        return self

    def transform(self, X):
        # Stack each pipeline's output side by side as one sparse matrix.
        transformed_data = []
        for each_pipeline in self.pipelines:
            transformed_data.append(each_pipeline.transform(self.preprocessor.apply(X)))
        return sp.hstack(transformed_data)

Does anyone have an idea how to approach this issue?

Answer 1

I would suggest putting the preprocessor inside the pipeline itself. cross_val_score tries to copy the estimator's parameters, and it breaks when the estimator cannot return them from get_params().

I am not sure whether your pipelines parameter is a sklearn Pipeline, because a Pipeline object is not iterable.
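A minimal sketch of what "the preprocessor inside the pipeline" could look like, assuming the input is a list of text documents. The strip_digits function, the two vectorizer branches, the classifier, and the toy data are all hypothetical stand-ins, not taken from the question. FeatureUnion is used here because it fits several transformers and stacks their outputs side by side, which resembles what the custom transform method does by hand.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


def strip_digits(texts):
    # Hypothetical preprocessing. The regex is used inside the function, so no
    # compiled pattern object is stored as an estimator parameter.
    return [re.sub(r"\d+", " ", text) for text in texts]


model = Pipeline([
    # validate=False lets the transformer pass lists of strings through.
    ("preprocess", FunctionTransformer(strip_digits, validate=False)),
    # FeatureUnion fits each branch and horizontally stacks the outputs.
    ("features", FeatureUnion([
        ("words", TfidfVectorizer()),
        ("chars", CountVectorizer(analyzer="char", ngram_range=(2, 3))),
    ])),
    ("clf", LogisticRegression()),
])

# Toy data, invented for illustration only.
texts = ["good movie 10", "great film 9", "nice plot 8", "fine acting 7",
         "bad movie 1", "awful film 2", "poor plot 3", "dull acting 4"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# cross_val_score clones the whole pipeline for every fold.
scores = cross_val_score(model, texts, labels, cv=2)
print(scores)
```

Because every step is an ordinary sklearn object or a plain function, cloning should not hit the deep-copy problem, and the preprocessing is re-applied inside each fold.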

Answer 2

As suggested in a few comments, this error occurs because self.preprocessor cannot be deep-copied when the estimator is cloned.
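The mechanism can be sketched as follows. sklearn's clone reads the constructor parameters through get_params() and deep-copies every parameter that is not itself an estimator, so a parameter that refuses to be deep-copied makes the clone fail. UncopyablePreprocessor and Wrapper are hypothetical classes, and the failure is simulated with a __deepcopy__ that raises, since the real preprocessor from the question is not shown.

```python
from sklearn.base import BaseEstimator, clone


class UncopyablePreprocessor:
    """Hypothetical stand-in for a preprocessor that cannot be deep-copied."""

    def __deepcopy__(self, memo):
        # Simulates the failure reported in the question.
        raise TypeError("cannot deepcopy this pattern object")

    def apply(self, X):
        return X


class Wrapper(BaseEstimator):
    """Hypothetical estimator that stores the preprocessor as a parameter."""

    def __init__(self, preprocessor=None):
        self.preprocessor = preprocessor

    def fit(self, X, y=None):
        return self


try:
    # cross_val_score performs this clone once per fold.
    clone(Wrapper(preprocessor=UncopyablePreprocessor()))
    clone_error = None
except TypeError as exc:
    clone_error = str(exc)

print(clone_error)
```

The wording of the error in the question matches what Python versions before 3.7 raised when copy.deepcopy was applied to a compiled re pattern, so the preprocessor may well hold one, although the question does not confirm this.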

The workaround is to remove the preprocessing step from this class and run it either as an independent preprocessing step or inside the pipeline itself.
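The "independent preprocessing step" option might look like the sketch below: the preprocessor is applied once, before cross-validation, so it is never part of the object that gets cloned. RegexPreprocessor, the vectorizer, the classifier, and the toy data are hypothetical and chosen only for illustration.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


class RegexPreprocessor:
    """Hypothetical preprocessor that holds a compiled regex pattern."""

    def __init__(self):
        self.pattern = re.compile(r"\d+")

    def apply(self, texts):
        return [self.pattern.sub(" ", text) for text in texts]


# Toy data, invented for illustration only.
texts = ["good movie 10", "great film 9", "nice plot 8", "fine acting 7",
         "bad movie 1", "awful film 2", "poor plot 3", "dull acting 4"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Run the preprocessing once, outside the estimator.
cleaned = RegexPreprocessor().apply(texts)

# The model handed to cross_val_score no longer references the preprocessor.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
scores = cross_val_score(model, cleaned, labels, cv=2)
print(scores)
```

Preprocessing outside the cross-validation loop is only safe when the step is stateless, as it is here. A step that learns anything from the data should stay inside the pipeline so that it is fitted on the training part of each fold only.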

2 answers

Answer 1: 0, 2019-01-18 09:09:29
Answer 2: 0, accepted, 2019-01-22 15:07:18
