How to write a custom estimator in sklearn and use cross-validation on it?
I would like to check the prediction error of a new method through cross-validation. Can I pass my method to sklearn's cross-validation function, and if so, how?
I would like something like sklearn.cross_validation(cv=10).mymethod.
I also need to know how to define mymethod: should it be a function, and what should its inputs and outputs be?
For example, consider as mymethod an implementation of the least squares estimator (not the ones already in sklearn, of course).
I found this tutorial link, but it is not very clear to me.
In the documentation they use:
>>> import numpy as np
>>> from sklearn import cross_validation
>>> from sklearn import datasets
>>> from sklearn import svm
>>> iris = datasets.load_iris()
>>> iris.data.shape, iris.target.shape
((150, 4), (150,))
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_validation.cross_val_score(
... clf, iris.data, iris.target, cv=5)
...
>>> scores
The problem is that the estimator they use, clf, is built by a function that ships with sklearn. How should I define my own estimator so that I can pass it to the cross_validation.cross_val_score function?
For example, suppose a simple estimator that uses a linear model $y = x\beta$, where beta is estimated as X[1,:] + alpha and alpha is a parameter. How should I complete the code?
class my_estimator():
    def fit(X,y):
        beta=X[1,:]+alpha #where can I pass alpha to the function?
        return beta
    def scorer(estimator, X, y) #what should the scorer function compute?
        return ?????
With the following code I received an error:
class my_estimator():
    def fit(X, y, **kwargs):
        #alpha = kwargs['alpha']
        beta=X[1,:]#+alpha
        return beta
>>> cv=cross_validation.cross_val_score(my_estimator,x,y,scoring="mean_squared_error")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in cross_val_score
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\externals\joblib\parallel.py", line 516, in __call__
    for function, args, kwargs in iterable:
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\cross_validation.py", line 1152, in <genexpr>
    for train, test in cv)
  File "C:\Python27\lib\site-packages\scikit_learn-0.14.1-py2.7-win32.egg\sklearn\base.py", line 43, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object '<class __main__.my_estimator at 0x05ACACA8>' (type <type 'classobj'>): it does not seem to be a scikit-learn estimator a it does not implement a 'get_params' methods.
>>>
The answer lies in sklearn's documentation.
You need to define two things:
1. an estimator that implements the fit(X, y) function, X being the matrix of inputs and y being the vector of outputs;
2. a scorer function, or callable object, that can be called as scorer(estimator, X, y) and returns the score of the given model.
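As a rough illustration of the first item, here is a minimal sketch of a least squares estimator that cross-validation can clone and fit. It assumes a newer scikit-learn release, where cross_val_score lives in sklearn.model_selection rather than sklearn.cross_validation. The class name LeastSquaresEstimator, the attribute beta_ and the synthetic data are hypothetical, chosen only for this example. Inheriting from BaseEstimator supplies the get_params method that the traceback in the question complains about.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score


class LeastSquaresEstimator(RegressorMixin, BaseEstimator):
    """Hypothetical ordinary least squares regressor (no intercept)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Solve min ||X beta - y||^2; the trailing underscore marks fitted state.
        self.beta_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.beta_


# Synthetic, noise-free data (an assumption for the demo): y = 2*x0 - x1
rng = np.random.RandomState(0)
X = rng.rand(40, 2)
y = X @ np.array([2.0, -1.0])

# With no `scoring` argument, cross_val_score falls back to the estimator's
# own score method (R^2, inherited from RegressorMixin).
scores = cross_val_score(LeastSquaresEstimator(), X, y, cv=5)
print(scores)
```

Note that an instance, LeastSquaresEstimator(), is passed rather than the class itself; the call in the question passed the bare class.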
Referring to your example: first of all, scorer shouldn't be a method of the estimator; it is a separate notion. Just create a callable:
def scorer(estimator, X, y):
    return ...  # compute whatever you want, it's up to you to define
                # what it means for the given estimator to be "good" or "bad"
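One possible way to fill in that placeholder is a negated mean squared error. The sketch below is only an illustration: the name neg_mse_scorer and the synthetic data are made up for this example, LinearRegression stands in for whatever estimator you are evaluating, and the import path assumes a newer scikit-learn release (sklearn.model_selection). The error is negated because scikit-learn treats larger scores as better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def neg_mse_scorer(estimator, X, y):
    """Return the negated mean squared error of `estimator` on (X, y)."""
    residuals = np.asarray(y) - estimator.predict(X)
    return -float(np.mean(residuals ** 2))


# Synthetic data, assumed only for the demonstration.
rng = np.random.RandomState(0)
X = rng.rand(30, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.randn(30)

# The callable is invoked once per fold with the fitted estimator
# and that fold's held-out data.
scores = cross_val_score(LinearRegression(), X, y, scoring=neg_mse_scorer, cv=3)
print(scores)
```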
An even simpler solution: you can pass a string such as 'mean_squared_error' or 'accuracy' (the full list is available in this part of the documentation) to the cross_val_score function to use a predefined scorer.
Another possibility is to use the make_scorer factory function.
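To make the last two options concrete, here is a small sketch comparing a make_scorer-built scorer with a predefined scoring string. It assumes a newer scikit-learn release, in which the string 'mean_squared_error' has been renamed 'neg_mean_squared_error' and cross_val_score is imported from sklearn.model_selection. The data and the variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

# Synthetic data, assumed only for the demonstration.
rng = np.random.RandomState(0)
X = rng.rand(30, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.randn(30)

# greater_is_better=False tells scikit-learn the metric is a loss,
# so the resulting scorer returns the negated value.
mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)

by_factory = cross_val_score(LinearRegression(), X, y, scoring=mse_scorer, cv=3)
by_name = cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=3)
print(by_factory)
print(by_name)
```

Both calls should report the same per-fold values, since the string is shorthand for an equivalent scorer.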
As for the second thing, you can pass parameters to your model through the fit_params dict parameter of the cross_val_score function (as mentioned in the documentation). These parameters will be passed to the fit function.
class my_estimator():
    def fit(self, X, y, **kwargs):
        alpha = kwargs['alpha']  # supplied through fit_params
        self.beta = X[1,:] + alpha  # keep the estimate on the instance
        return self
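An alternative to fit_params, not taken from the answer above, is to make alpha a constructor argument. The sketch below assumes a newer scikit-learn release and uses hypothetical names (MyEstimator, beta_) plus random data. BaseEstimator derives get_params from the __init__ signature, so the cloned copies made during cross-validation carry alpha with them.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score


class MyEstimator(RegressorMixin, BaseEstimator):
    def __init__(self, alpha=0.0):
        # Store constructor arguments unchanged so get_params and clone work.
        self.alpha = alpha

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # The toy rule from the question: second row of X plus alpha.
        self.beta_ = X[1, :] + self.alpha
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.beta_


# Random data, assumed only for the demonstration.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20)

est = MyEstimator(alpha=0.5)
print(est.get_params())
scores = cross_val_score(est, X, y, scoring="neg_mean_squared_error", cv=4)
print(scores)
```

A side benefit of this design is that alpha becomes visible to tools that rely on get_params and set_params, such as grid search.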
After reading all the error messages, which give a quite clear idea of what's missing, here is a simple example:
import numpy as np
# In scikit-learn 0.18 and later this import lives in sklearn.model_selection.
from sklearn.cross_validation import cross_val_score

class RegularizedRegressor:
    def __init__(self, l=0.01):
        self.l = l

    def combine(self, inputs):
        # Weighted sum of one sample's features.
        return sum([i * w for (i, w) in zip(inputs, self.weights)])

    def predict(self, X):
        return [self.combine(x) for x in np.asarray(X)]

    def classify(self, inputs):
        return np.sign(self.predict(inputs))

    def fit(self, X, y, **kwargs):
        self.l = kwargs.get('l', self.l)
        X = np.matrix(X)
        y = np.matrix(y).reshape(-1, 1)
        # Plain least squares solution; note that self.l is stored but not used here.
        W = (X.transpose() * X).getI() * X.transpose() * y
        self.weights = [w[0] for w in W.tolist()]
        return self

    def get_params(self, deep=False):
        # Required by sklearn's clone(), as the error message in the question says.
        return {'l': self.l}

X = np.matrix([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.matrix([0, 1, 1, 0]).transpose()

print(cross_val_score(RegularizedRegressor(),
                      X,
                      y,
                      fit_params={'l': 0.1},
                      scoring='mean_squared_error'))