Reusing model fitted by cross_val_score in sklearn using joblib
I created the following function in Python:
def cross_validate(algorithms, data, labels, cv=4, n_jobs=-1):
    print "Cross validation using: "
    for alg, predictors in algorithms:
        print alg
        print
        # Compute the accuracy score for all the cross validation folds.
        scores = cross_val_score(alg, data, labels, cv=cv, n_jobs=n_jobs)
        # Take the mean of the scores (because we have one for each fold)
        print scores
        print("Cross validation mean score = " + str(scores.mean()))
        name = re.split('\(', str(alg))
        filename = str('%0.5f' %scores.mean()) + "_" + name[0] + ".pkl"
        # We might use this another time
        joblib.dump(alg, filename, compress=1, cache_size=1e9)
        filenameL.append(filename)
        try:
            move(filename, "pkl")
        except:
            os.remove(filename)
        print
    return
I thought that in order to do cross-validation, sklearn had to fit your model.
However, when I try to use it later (f is the pkl file I saved above with joblib.dump(alg, filename, compress=1, cache_size=1e9)):
alg = joblib.load(f)
predictions = alg.predict_proba(train_data[predictors]).astype(float)
I get no error on the first line (so it looks like the load is working), but the following line raises NotFittedError: Estimator not fitted, call fit before exploiting the model.
What am I doing wrong? Can't I reuse the model that was fitted to calculate the cross-validation? I looked at "Keep the fitted parameters when using a cross_val_score in scikits learn", but either I don't understand the answer or it is not what I am looking for. What I want is to save the whole model with joblib so that I can then use it later without re-fitting.
It's not quite correct that cross-validation has to fit your model; rather, a k-fold cross-validation fits your model k times on partial data sets. If you want the model itself, you need to fit the model again on the whole dataset; this isn't part of the cross-validation process. So it wouldn't be redundant to call
alg.fit(data, labels)
to fit your model after your cross-validation.
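As a rough sketch of that workflow (score with cross-validation, fit once on everything, then persist), something like the following should work on a recent scikit-learn with the standalone joblib package. The iris data, the LogisticRegression estimator, and the temporary file path are illustrative assumptions, not taken from the question.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data and estimator (assumptions); substitute your own.
X, y = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Cross-validation only produces scores; `alg` itself is still unfitted here.
scores = cross_val_score(alg, X, y, cv=4)

# Fit once on the full dataset, then persist the fitted estimator.
alg.fit(X, y)
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")  # hypothetical path
joblib.dump(alg, model_path, compress=1)

# Later: load the fitted estimator and predict without re-fitting.
loaded = joblib.load(model_path)
probabilities = loaded.predict_proba(X)
```

The only change relative to the function in the question is the explicit fit call before joblib.dump.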
Another approach: rather than using the specialized function cross_val_score, you could think of this as a special case of a cross-validated grid search (with a single point in the parameter space). In this case GridSearchCV will by default refit the model over the entire dataset (it has a parameter refit=True), and it also has predict and predict_proba methods in its API.
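A minimal sketch of the single-point grid search idea, assuming a recent scikit-learn (where GridSearchCV lives in sklearn.model_selection). The dataset, the estimator, and the choice of C as the one grid parameter are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative data (an assumption); substitute your own.
X, y = load_iris(return_X_y=True)

# A grid with a single point: the search reduces to plain cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [1.0]},
    cv=4,
    refit=True,  # the default: refit on all of X, y once the search is done
)
search.fit(X, y)

mean_cv_score = search.best_score_       # mean score over the 4 folds
fitted_model = search.best_estimator_    # estimator refit on the full data
probabilities = search.predict_proba(X)  # delegates to best_estimator_
```

Either search or search.best_estimator_ could then be passed to joblib.dump.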
The real reason your model is not fitted is that the function cross_val_score first copies your model before fitting the copy (Source link). So your original model has not been fitted.
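The copying behaviour can be checked directly. This is a small sketch assuming a recent scikit-learn; the data and estimator are illustrative, and sklearn.base.clone is used here only to show what such a copy looks like.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data and estimator (assumptions).
X, y = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# The folds are fitted on copies of `alg`, not on `alg` itself.
cross_val_score(alg, X, y, cv=4)

try:
    alg.predict_proba(X)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

# clone() copies hyperparameters only; fitting the copy leaves `alg` untouched.
copy = clone(alg)
copy.fit(X, y)
copy_is_fitted = hasattr(copy, "coef_")
original_has_coef = hasattr(alg, "coef_")
```

After this runs, the original estimator still raises NotFittedError, which matches the error reported in the question.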
cross_val_score does not keep the fitted model. cross_val_predict does not hand it back either, but it does return the out-of-fold predictions. There is no cross_val_predict_proba, but you can do this: predict_proba for a cross-validated model
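In recent scikit-learn releases there are two built-in routes worth knowing, sketched below under the assumption that your installed version supports them: cross_val_predict accepts method="predict_proba", and cross_validate accepts return_estimator=True to hand back the per-fold fitted models. The data and estimator are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_validate

# Illustrative data and estimator (assumptions).
X, y = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities: each row is predicted by a model that never saw it.
oof_proba = cross_val_predict(alg, X, y, cv=4, method="predict_proba")

# Keep the estimators fitted on each training fold, alongside the scores.
results = cross_validate(alg, X, y, cv=4, return_estimator=True)
fold_models = results["estimator"]   # one fitted estimator per fold
fold_scores = results["test_score"]
```

Note that each fold model was trained on only part of the data, so for a single deployable model a final fit on the whole dataset is still the usual choice.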