Using joblib to reuse a model fitted by cross_val_score in sklearn

Question

I created the following function in Python:

import os
import re
from shutil import move

import joblib
from sklearn.model_selection import cross_val_score

# Not defined in the original snippet; assumed to be a module-level list.
filenameL = []

def cross_validate(algorithms, data, labels, cv=4, n_jobs=-1):
    print("Cross validation using: ")
    for alg, predictors in algorithms:
        print(alg)
        print()
        # Compute the accuracy score for all the cross validation folds.
        scores = cross_val_score(alg, data, labels, cv=cv, n_jobs=n_jobs)
        # Take the mean of the scores (because we have one for each fold)
        print(scores)
        print("Cross validation mean score = " + str(scores.mean()))

        name = re.split(r'\(', str(alg))
        filename = str('%0.5f' % scores.mean()) + "_" + name[0] + ".pkl"
        # We might use this another time
        joblib.dump(alg, filename, compress=1, cache_size=1e9)
        filenameL.append(filename)
        try:
            move(filename, "pkl")
        except Exception:
            os.remove(filename)

        print()
    return

I thought that in order to do cross-validation, sklearn had to fit your model.

However, when I try to use it later (f is the pkl file I saved above with joblib.dump(alg, filename, compress=1, cache_size=1e9)):

alg = joblib.load(f)  
predictions = alg.predict_proba(train_data[predictors]).astype(float)

I get no error on the first line (so it looks like the load is working), but on the following line it tells me NotFittedError: Estimator not fitted, call fit before exploiting the model.

What am I doing wrong? Can't I reuse the model fitted to calculate the cross-validation? I looked at "Keep the fitted parameters when using a cross_val_score in scikits learn", but either I don't understand the answer or it is not what I am looking for. What I want is to save the whole model with joblib so that I can then use it later without re-fitting.

Answer 1

It's not quite correct that cross-validation has to fit your model; rather, a k-fold cross-validation fits your model k times on partial data sets. If you want the model itself, you need to fit the model again on the whole dataset; this isn't part of the cross-validation process. So it wouldn't be redundant to call

alg.fit(data, labels)

to fit your model after your cross-validation.
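
As a minimal sketch of that workflow (not the asker's actual pipeline): the synthetic data, the LogisticRegression estimator, and the temporary directory are assumptions chosen only to make the example self-contained. Cross-validation is used for the score, and a separate fit on the whole dataset produces the model that gets saved.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for the asker's data and labels (assumption).
data, labels = make_classification(n_samples=200, n_features=5, random_state=0)

alg = LogisticRegression()

# Step 1: cross-validation only estimates performance; alg itself stays unfitted.
scores = cross_val_score(alg, data, labels, cv=4)

# Step 2: fit on the whole dataset, then persist the fitted estimator.
alg.fit(data, labels)
filename = os.path.join(
    tempfile.mkdtemp(), "%0.5f_LogisticRegression.pkl" % scores.mean()
)
joblib.dump(alg, filename, compress=1)

# Later: load the file and predict without re-fitting.
loaded = joblib.load(filename)
predictions = loaded.predict_proba(data)
```

The loaded estimator no longer raises NotFittedError, because the object that was dumped had already been fitted.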

Another approach would be, rather than using the specialized function cross_val_score, to think of this as a special case of a cross-validated grid search (with a single point in the parameter space). In this case GridSearchCV will by default refit the model over the entire dataset (it has a parameter refit=True), and it also has predict and predict_proba methods in its API.
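
A rough sketch of the single-point grid search idea follows. The estimator, the synthetic data, and the choice of C=1.0 as the only grid point are assumptions for illustration; any single parameter setting would do.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for the asker's data and labels (assumption).
data, labels = make_classification(n_samples=200, n_features=5, random_state=0)

# A grid with a single point: the search reduces to plain cross-validation.
search = GridSearchCV(LogisticRegression(), param_grid={"C": [1.0]}, cv=4)
search.fit(data, labels)

# Mean score over the 4 folds for the single candidate.
mean_cv_score = search.best_score_

# With refit=True (the default), best_estimator_ is fitted on the whole
# dataset, and the search object forwards predict and predict_proba to it.
probabilities = search.predict_proba(data)
```

The fitted search object (or search.best_estimator_) can then be passed to joblib.dump in the same way as any other fitted estimator.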

Answer 2

The real reason your model is not fitted is that the function cross_val_score first copies your model before fitting the copy: Source link

So your original model has not been fitted.
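
The effect can be checked with a small sketch like the one below. The data and estimator are assumptions for illustration, and the explicit clone() call only mimics the kind of copy described above rather than reproducing the library's internals.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for the asker's data and labels (assumption).
data, labels = make_classification(n_samples=200, n_features=5, random_state=0)

alg = LogisticRegression()
cross_val_score(alg, data, labels, cv=4)

# The estimator that was passed in has not been fitted: only copies were.
try:
    alg.predict_proba(data)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

# clone() builds a new unfitted estimator with the same parameters,
# so fitting the copy leaves the original untouched.
copy = clone(alg)
copy.fit(data, labels)
copy_is_fitted = hasattr(copy, "coef_")
original_has_coef = hasattr(alg, "coef_")
```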

Answer 3

cross_val_score does not keep the fitted model; cross_val_predict does. There is no cross_val_predict_proba, but you can do this:

predict_proba for a cross-validated model
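
The content of the linked post is not reproduced here. One way to get cross-validated probabilities, offered as an assumption about what is meant, is the method parameter of cross_val_predict. The data and estimator below are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic data stands in for the asker's data and labels (assumption).
data, labels = make_classification(n_samples=200, n_features=5, random_state=0)

alg = LogisticRegression()

# Each row is predicted by a model that was fitted on the other folds.
probabilities = cross_val_predict(
    alg, data, labels, cv=4, method="predict_proba"
)
```

Note that this call returns an array of out-of-fold predictions, not an estimator object, so there is nothing here to pass to joblib.dump for later reuse.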

Answer 1: score 9, 2016-07-24 23:03:04
Answer 2: score 4, accepted, 2018-04-11 09:38:07
Answer 3: score -1, 2016-07-24 22:50:27
