How to write a custom estimator in sklearn and use cross-validation on it?

Question

I would like to check the prediction error of a new method through cross-validation. Can I pass my method to sklearn's cross-validation function, and if so, how?

I would like something like sklearn.cross_validation(cv=10).mymethod.

I also need to know how to define mymethod: should it be a function, and what should its inputs and outputs be?

For example, mymethod could be an implementation of the least squares estimator (not the ones built into sklearn, of course).

I found this tutorial link, but it is not very clear to me.

In the documentation they use:

>>> import numpy as np
>>> from sklearn import cross_validation
>>> from sklearn import datasets
>>> from sklearn import svm

>>> iris = datasets.load_iris()
>>> iris.data.shape, iris.target.shape
((150, 4), (150,))

>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_validation.cross_val_score(
...    clf, iris.data, iris.target, cv=5)
...
>>> scores

But the problem is that the estimator they use, clf, is created by a function built into sklearn. How should I define my own estimator so that I can pass it to the cross_validation.cross_val_score function?

So, for example, suppose a simple estimator that uses a linear model $y = x\beta$, where beta is estimated as X[1,:]+alpha and alpha is a parameter. How should I complete the code?

class my_estimator():
      def fit(X,y):
          beta=X[1,:]+alpha #where can I pass alpha to the function?
          return beta
      def scorer(estimator, X, y) #what should the scorer function compute?
          return ?????

With the following code I received an error:

class my_estimator():
    def fit(X, y, **kwargs):
        #alpha = kwargs['alpha']
        beta=X[1,:]#+alpha 
        return beta

>>> cv=cross_validation.cross_val_score(my_estimator,x,y,scoring="mean_squared_error")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\externals\joblib\parallel.py", line 516, in __call__
    for function, args, kwargs in iterable:
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in <genexpr>
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\base.py", line 43, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object '<class __main__.my_estimator at 0x05ACACA8>' (type <type 'classobj'>): it does not seem to be a scikit-learn estimator a it does not implement a 'get_params' methods.
>>>

Answer 1

The answer also lies in sklearn's documentation.

You need to define two things:

an estimator that implements the fit(X, y) function, X being the matrix of inputs and y being the vector of outputs
a scorer function, or callable object, that can be called as scorer(estimator, X, y) and returns the score of the given model

Referring to your example: first of all, scorer shouldn't be a method of the estimator; it's a different notion. Just create a callable:

def scorer(estimator, X, y):
    return ?????  # compute whatever you want; it's up to you to define
                  # what it means for the given estimator to be "good" or "bad"
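As a concrete illustration of such a callable, here is a minimal sketch that returns the negative mean squared error, so that higher values mean a better model (the convention scikit-learn scorers follow). The names neg_mse_scorer and ConstantModel are hypothetical and exist only for this example; the scorer assumes the estimator exposes a predict(X) method.

```python
import numpy as np


def neg_mse_scorer(estimator, X, y):
    """Score a fitted estimator by its negative mean squared error on (X, y)."""
    y_pred = np.asarray(estimator.predict(X), dtype=float).ravel()
    y_true = np.asarray(y, dtype=float).ravel()
    return -np.mean((y_true - y_pred) ** 2)


class ConstantModel:
    """Hypothetical stand-in model that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


X = np.zeros((4, 2))
y = np.array([0.0, 1.0, 1.0, 0.0])
print(neg_mse_scorer(ConstantModel(0.5), X, y))  # prints -0.25
```

A callable with this signature can be passed as the scoring argument of cross_val_score.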

Or, even simpler: you can pass a string such as 'mean_squared_error' or 'accuracy' (the full list is available in this part of the documentation) to the cross_val_score function to use a predefined scorer.

Another possibility is to use the make_scorer factory function.

As for the second thing, you can pass parameters to your model through the fit_params dict parameter of the cross_val_score function (as mentioned in the documentation). These parameters will be passed to the fit function.
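A minimal sketch of the make_scorer route, assuming scikit-learn 0.18 or later (where cross_val_score lives in sklearn.model_selection). MeanRegressor is a hypothetical toy model invented for this example, and the data are made up.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy model: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# greater_is_better=False tells make_scorer to negate the loss,
# so that a higher score still means a better model.
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

X = np.zeros((4, 1))                 # illustrative data only
y = np.array([0.0, 1.0, 1.0, 0.0])
scores = cross_val_score(MeanRegressor(), X, y, cv=2, scoring=mse_scorer)
print(scores)                        # one (negated) MSE value per fold
```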

class my_estimator():
    def fit(self, X, y, **kwargs):
        alpha = kwargs['alpha']      # supplied through fit_params
        self.beta = X[1, :] + alpha  # keep the fitted coefficients on the instance
        return self
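An alternative to fit_params, sketched here under the assumption of scikit-learn 0.18 or later, is to make alpha a constructor argument and inherit from BaseEstimator, which supplies get_params and set_params. The class name MyEstimator, the predict rule (X times beta, following the linear model in the question) and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score


class MyEstimator(RegressorMixin, BaseEstimator):
    """Toy linear model from the question: beta = X[1, :] + alpha."""

    def __init__(self, alpha=0.0):
        # Only store constructor arguments here; BaseEstimator.get_params reads them back.
        self.alpha = alpha

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.beta_ = X[1, :] + self.alpha  # needs at least two training rows
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float).dot(self.beta_)


rng = np.random.RandomState(0)          # synthetic data, for illustration only
X = rng.rand(20, 3)
y = X.dot(np.array([1.0, 2.0, 3.0]))

scores = cross_val_score(MyEstimator(alpha=0.1), X, y, cv=5,
                         scoring='neg_mean_squared_error')
print(scores)                           # five negated MSE values, one per fold
```

Keeping alpha in the constructor also means the same estimator can be cloned and tuned by scikit-learn utilities without any extra plumbing.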

After reading all the error messages, which give quite a clear idea of what's missing, here is a simple example:

import numpy as np
# Written against the scikit-learn 0.14 API used in the question. From 0.18 on,
# cross_val_score lives in sklearn.model_selection and the scoring string is
# 'neg_mean_squared_error'.
from sklearn.cross_validation import cross_val_score

class RegularizedRegressor:
    def __init__(self, l=0.01):
        self.l = l  # regularization strength

    def predict(self, X):
        # Linear prediction: one value per row of X
        return np.asarray(X, dtype=float).dot(self.weights)

    def classify(self, X):
        return np.sign(self.predict(X))

    def fit(self, X, y, **kwargs):
        # 'l' can be overridden per call through fit_params
        self.l = kwargs.get('l', self.l)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).ravel()
        # Ridge closed form: W = (X^T X + l*I)^-1 X^T y
        A = X.T.dot(X) + self.l * np.eye(X.shape[1])
        self.weights = np.linalg.solve(A, X.T.dot(y))
        return self

    def get_params(self, deep=False):
        # Required by sklearn's clone(), which cross_val_score calls
        return {'l': self.l}

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([0, 1, 1, 0])

print(cross_val_score(RegularizedRegressor(),
                      X,
                      y,
                      fit_params={'l': 0.1},
                      scoring='mean_squared_error'))
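To see why get_params matters, it can help to emulate by hand what cross-validation does with the estimator. The following is only a rough sketch using NumPy alone: for each fold it rebuilds an unfitted copy from get_params (approximately what sklearn's clone does), fits it on the training rows and scores it on the held-out rows. RidgeSketch and manual_cv_neg_mse are hypothetical names, and the fold splitting is a simplification of sklearn's KFold.

```python
import numpy as np


class RidgeSketch:
    """Hypothetical ridge-style regressor, used only for illustration."""

    def __init__(self, l=0.01):
        self.l = l

    def get_params(self, deep=False):
        return {'l': self.l}

    def fit(self, X, y):
        # Closed form: w = (X^T X + l*I)^-1 X^T y
        A = X.T.dot(X) + self.l * np.eye(X.shape[1])
        self.weights_ = np.linalg.solve(A, X.T.dot(y))
        return self

    def predict(self, X):
        return X.dot(self.weights_)


def manual_cv_neg_mse(estimator, X, y, n_folds=2):
    """Emulate k-fold cross-validation by hand; returns one score per fold."""
    all_idx = np.arange(len(X))
    scores = []
    for test_idx in np.array_split(all_idx, n_folds):
        train_idx = np.setdiff1d(all_idx, test_idx)
        # Rebuild an unfitted copy from the constructor parameters,
        # which is approximately what sklearn's clone() does.
        model = type(estimator)(**estimator.get_params())
        model.fit(X[train_idx], y[train_idx])
        residual = y[test_idx] - model.predict(X[test_idx])
        scores.append(-np.mean(residual ** 2))
    return np.array(scores)


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
scores = manual_cv_neg_mse(RidgeSketch(l=0.1), X, y)
print(scores)
```

An object without get_params cannot be copied this way, which is what the TypeError in the question reports.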

Solution 1
24 (accepted) 2013-12-03 13:07:01
