How to pass a keyword argument to the predict method in a sklearn pipeline

I am using a GaussianProcess inside a Pipeline. The predict method of GaussianProcess accepts a keyword argument called batch_size, which I need to use to keep it from filling up my memory.

Is there any way to pass this argument to the GaussianProcess instance when calling predict through the configured pipeline?

Here is a minimal example adapted from the sklearn documentation to demonstrate what I want:

import numpy as np
from sklearn.gaussian_process import GaussianProcess
from matplotlib import pyplot as pl

np.random.seed(1)

def f(x):
    """The function to predict."""
    return x * np.sin(x)

X = np.atleast_2d([1., 3., 5., 6., 7., 8.]).T
y = f(X).ravel()

gp = GaussianProcess(corr='cubic', theta0=1e-2, thetaL=1e-4, thetaU=1e-1,
                     random_start=100)
gp.fit(X, y)

x = np.atleast_2d(np.linspace(0, 10, 1000)).T
y_pred = gp.predict(x, batch_size=10)

from sklearn import pipeline
steps = [('gp', gp)]
p = pipeline.Pipeline(steps)
# How to pass the batch_size here?
p.predict(x)

You can solve it by subclassing Pipeline and overriding its predict method so that it accepts keyword arguments (**predict_params) and forwards them to the predict method of the final estimator.

from sklearn.pipeline import Pipeline

class Pipeline(Pipeline):
    def predict(self, X, **predict_params):
        """Applies transforms to the data, and the predict method of the
        final estimator. Valid only if the final estimator implements
        predict."""
        Xt = X
        for name, transform in self.steps[:-1]:
            Xt = transform.transform(Xt)
        return self.steps[-1][-1].predict(Xt, **predict_params)
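To see the forwarding logic of that override in isolation, here is a minimal sketch that runs without sklearn. TinyPipeline, Doubler and BatchedModel are hypothetical stand-ins invented for this illustration; only the body of predict mirrors the override above.

```python
class Doubler:
    """Hypothetical transformer: doubles every value."""
    def transform(self, X):
        return [2 * v for v in X]


class BatchedModel:
    """Hypothetical final estimator whose predict accepts batch_size."""
    def predict(self, X, batch_size=None):
        size = batch_size or len(X) or 1
        self.n_batches_ = 0  # how many batches were processed
        out = []
        for start in range(0, len(X), size):
            batch = X[start:start + size]
            out.extend(v + 1 for v in batch)
            self.n_batches_ += 1
        return out


class TinyPipeline:
    """Stand-in with the same predict logic as the override, no sklearn."""
    def __init__(self, steps):
        self.steps = steps

    def predict(self, X, **predict_params):
        Xt = X
        # Earlier steps only transform; they never see predict_params.
        for _name, transform in self.steps[:-1]:
            Xt = transform.transform(Xt)
        # Only the final estimator receives the keyword arguments.
        return self.steps[-1][-1].predict(Xt, **predict_params)


model = BatchedModel()
pipe = TinyPipeline([("double", Doubler()), ("model", model)])
preds = pipe.predict([1, 2, 3, 4, 5], batch_size=2)
print(preds)             # [3, 5, 7, 9, 11]
print(model.n_batches_)  # 3
```

With the real subclass, the call would look the same: p.predict(x, batch_size=10).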

While it is possible to pass fit parameters to the fit and fit_transform methods of the pipeline, this is not possible for predict, at least as of version 0.15 (see the Pipeline source code for that release).

You may be able to monkeypatch it using

from functools import partial
gp.predict = partial(gp.predict, batch_size=10)

or, if that doesn't work, then

# p is the Pipeline instance from the question (pipeline is the module there)
p.steps[-1][-1].predict = partial(p.steps[-1][-1].predict, batch_size=10)
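Why the monkeypatch can work: assigning to predict on the instance shadows the class method, so any caller that only ever runs estimator.predict(X) still ends up passing batch_size. The sketch below uses a hypothetical BatchedModel and a hypothetical call_like_a_pipeline helper instead of sklearn, so treat it as an illustration of the mechanism rather than a tested recipe for GaussianProcess.

```python
from functools import partial


class BatchedModel:
    """Hypothetical estimator; records the batch_size it was called with."""
    def predict(self, X, batch_size=None):
        self.last_batch_size_ = batch_size
        return [v + 1 for v in X]


def call_like_a_pipeline(estimator, X):
    """Stand-in for a caller that cannot pass extra keyword arguments."""
    return estimator.predict(X)


model = BatchedModel()
# Bind batch_size once; the instance attribute shadows the class method.
model.predict = partial(model.predict, batch_size=10)

result = call_like_a_pipeline(model, [1, 2, 3])
print(result, model.last_batch_size_)  # [2, 3, 4] 10
```

The patch lives on that one instance only. Anything that builds a fresh copy of the estimator from its constructor parameters, as sklearn's clone does during cross-validation or grid search, will not carry it over.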

Was looking for this -- found the answer somewhere else, but wanted to share here as it's the first related question I found.

In the current version of sklearn, you can pass keyword arguments to the pipeline's predict method, and they are passed on to the predictor (i.e. the last element in the pipeline):

from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline

class Predictor(BaseEstimator, ClassifierMixin):
    def fit(self, *_):
        return self

    def predict(self, X, additional_arg):
        return f'ok: {X}, {additional_arg}'

pipe = Pipeline([
    ('passthrough', 'passthrough'),  # anything here would *not* see the keyword arg
    ('p', Predictor())
])

print(Predictor().predict('one', 'two'))
print(pipe.predict('three', additional_arg='four'))  # must be passed as keyword argument

# Do NOT do either of the following (left commented out so the script runs):
# pipe.predict('three')          # raises TypeError: additional_arg is missing
# pipe.predict('three', 'four')  # raises TypeError: extra arguments must be passed by keyword
