Transform results of estimator in a sklearn pipeline

I have a sklearn pipeline that consists of a custom transformer followed by an XGBClassifier. As a final step in the pipeline, I would like to add another custom transformer that transforms the results of the XGBClassifier.

This last custom transformer will bin the predicted probabilities into ranks (5-percentiles).

from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

Pipeline([
          ('custom_trsf1', custom_trsf1),
          ('clf', XGBClassifier()),
          ('custom_trsf2', custom_trsf2)])

The problem is that a sklearn pipeline requires every step except the last to have both a fit and a transform method. Can I solve this in another way, instead of extending the XGBClassifier and adding a transform method to it?

Looking at the source code of the Pipeline implementation, the estimator that is fitted to the data goes in the last position of your steps: the _final_estimator property of Pipeline returns the last entry of the pipeline's steps.

@property
def _final_estimator(self):
    estimator = self.steps[-1][1]
    return 'passthrough' if estimator is None else estimator

where steps might be something like

steps = [('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)),
 ('svc',
  SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
      decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
      max_iter=-1, probability=False, random_state=None, shrinking=True,
      tol=0.001, verbose=False))]

The _final_estimator property is only accessed after all the transformers have been fitted one after the other, to get the estimator that is then fitted to the transformed data; see line 333 of the Pipeline source for details.
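The fitting order described above can be written out by hand. This is only a minimal sketch, not the actual Pipeline code: the toy dataset from make_classification and the default StandardScaler/SVC settings are assumptions made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data, purely illustrative (not from the original post).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

steps = [('scaler', StandardScaler()), ('svc', SVC())]

# Fit every step except the last, feeding each one the output of the previous.
Xt = X
for name, transformer in steps[:-1]:
    Xt = transformer.fit_transform(Xt, y)

# The last step is the estimator; it is fitted on the transformed data.
final_estimator = steps[-1][1]
final_estimator.fit(Xt, y)
manual_pred = final_estimator.predict(Xt)

# The same thing through Pipeline, using fresh unfitted copies of the steps.
pipe = Pipeline([(name, clone(est)) for name, est in steps])
pipe_pred = pipe.fit(X, y).predict(X)

# True when both routes produce identical predictions.
same = np.array_equal(manual_pred, pipe_pred)
```

Both routes should give the same predictions, which is why only the intermediate steps need a transform method: the last step is fitted and never asked to transform anything.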

So, considering steps, I can retrieve the SVC instance from its last position

>>> final_estimator = steps[-1][1]
>>> final_estimator
SVC(C=1.0, ..., verbose=False)

and fit it to the training data

final_estimator.fit(Xt, y)

where Xt is the transformed training data (calculated before fitting the estimator) and y is the training target.
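Since the classifier has to stay in the last position, one possible way to get the ranking step from the question is to post-process the pipeline's output in a thin wrapper rather than adding a step after the classifier. The following is a rough sketch under several assumptions: RankedProbaClassifier and predict_rank are hypothetical names, StandardScaler stands in for custom_trsf1, LogisticRegression stands in for XGBClassifier (any classifier with predict_proba should work the same way), the problem is binary, and "5-percentiles" is read as 20 bins of 5 percentage points each.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RankedProbaClassifier(BaseEstimator):
    """Hypothetical wrapper: fits a pipeline, then bins its probabilities."""

    def __init__(self, pipeline, n_bins=20):
        self.pipeline = pipeline
        self.n_bins = n_bins

    def fit(self, X, y):
        # Fit an unfitted copy of the wrapped pipeline.
        self.pipeline_ = clone(self.pipeline).fit(X, y)
        # Learn the bin edges from the training probabilities of class 1
        # (assumption: binary problem, positive class in column 1).
        proba = self.pipeline_.predict_proba(X)[:, 1]
        inner = np.linspace(0, 100, self.n_bins + 1)[1:-1]  # 5, 10, ..., 95
        self.edges_ = np.percentile(proba, inner)
        return self

    def predict_proba(self, X):
        return self.pipeline_.predict_proba(X)

    def predict_rank(self, X):
        # Map each probability to a bin index between 0 and n_bins - 1.
        proba = self.pipeline_.predict_proba(X)[:, 1]
        return np.digitize(proba, self.edges_)


# Toy data and stand-in steps, purely illustrative.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
pipe = Pipeline([('scaler', StandardScaler()), ('clf', LogisticRegression())])

model = RankedProbaClassifier(pipe).fit(X, y)
ranks = model.predict_rank(X)
```

The bin edges are learned during fit, so new data is ranked against the training distribution rather than against itself. Whether that is the desired behaviour depends on how the ranks are used.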
