Reusing model fitted by cross_val_score in sklearn using joblib

I created the following function in Python:

import os
import re
from shutil import move

import joblib
from sklearn.model_selection import cross_val_score

filenameL = []  # filenames of the saved models

def cross_validate(algorithms, data, labels, cv=4, n_jobs=-1):
    print("Cross validation using: ")
    for alg, predictors in algorithms:
        print(alg)
        print()
        # Compute the accuracy score for all the cross validation folds.
        scores = cross_val_score(alg, data, labels, cv=cv, n_jobs=n_jobs)
        # Take the mean of the scores (because we have one for each fold)
        print(scores)
        print("Cross validation mean score = " + str(scores.mean()))

        name = re.split(r'\(', str(alg))
        filename = '%0.5f' % scores.mean() + "_" + name[0] + ".pkl"
        # We might use this another time
        joblib.dump(alg, filename, compress=1)
        filenameL.append(filename)
        try:
            move(filename, "pkl")
        except OSError:
            os.remove(filename)

        print()
    return

I thought that in order to do cross-validation, sklearn had to fit my model.

However, when I try to use it later (f is the .pkl file saved by the joblib.dump call above):

alg = joblib.load(f)  
predictions = alg.predict_proba(train_data[predictors]).astype(float)

I get no error on the first line (so the load seems to be working), but on the following line it tells me NotFittedError: Estimator not fitted, call fit before exploiting the model.

What am I doing wrong? Can't I reuse the model that was fitted to compute the cross-validation? I looked at "Keep the fitted parameters when using a cross_val_score in scikits learn", but either I don't understand the answer or it is not what I am looking for. What I want is to save the whole model with joblib so that I can use it later without re-fitting.

It's not quite correct that cross-validation has to fit your model; rather, k-fold cross-validation fits your model k times on partial datasets. If you want the model itself, you need to fit it again on the whole dataset; this is not part of the cross-validation process. So it wouldn't be redundant to call

alg.fit(data, labels)

to fit your model after cross-validation.
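A minimal sketch of that workflow, assuming the iris dataset and LogisticRegression as stand-ins for the question's data and estimator, and a hypothetical temporary filename "model.pkl": cross-validate for the score, fit once on everything, then save and reload with joblib.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data standing in for the question's `data` and `labels` (assumption).
data, labels = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Cross-validation only estimates performance; it fits copies, not `alg`.
scores = cross_val_score(alg, data, labels, cv=4)
print("Cross validation mean score = %0.5f" % scores.mean())

# Fit the model itself on the whole dataset before saving it.
alg.fit(data, labels)

# Save and reload; "model.pkl" in a temp directory is a hypothetical path.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(alg, path, compress=1)
loaded = joblib.load(path)

# The reloaded estimator is fitted, so predict_proba works without refitting.
probabilities = loaded.predict_proba(data)
print(probabilities.shape)
```

The cross-validation score is still the honest estimate of how this final model generalizes; the final fit simply uses all the data.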

Another approach: rather than using the specialized function cross_val_score, you could treat this as a special case of a cross-validated grid search (with a single point in the parameter space). In this case GridSearchCV will by default refit the model on the entire dataset (it has a parameter refit=True), and it also has predict and predict_proba methods in its API.
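A rough sketch of the single-point grid search, again assuming iris and LogisticRegression; the grid value C=1.0 is only an illustrative choice (it happens to be the default), not something from the question.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

data, labels = load_iris(return_X_y=True)

# A "grid" with a single point: C=1.0 is an illustrative value (assumption).
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [1.0]},
    cv=4,
    refit=True,  # the default; refits best_estimator_ on the full dataset
)
search.fit(data, labels)

# Mean cross-validation score for the single parameter setting.
print("Cross validation mean score = %0.5f" % search.best_score_)

# The search object delegates predict_proba to the refitted model.
probabilities = search.predict_proba(data)
```

After fitting, either `search` or `search.best_estimator_` can be passed to joblib.dump, and the reloaded object will already be fitted.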

The real reason your model is not fitted is that cross_val_score first copies (clones) your model and then fits the copy, as the scikit-learn source shows.

So your original model has not been fitted.
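To make the copying concrete, here is a simplified re-implementation of what cross_val_score does for a classifier with cv=4, assuming iris and LogisticRegression as stand-ins. It uses sklearn.base.clone and StratifiedKFold (the splitter scikit-learn picks for classifiers when cv is an integer); it should reproduce the library's per-fold scores, and it shows that the original object is never fitted.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.utils.validation import check_is_fitted

data, labels = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# One fresh clone per fold, fitted only on that fold's training part.
fold_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=4).split(data, labels):
    fold_model = clone(alg)  # unfitted copy with the same hyperparameters
    fold_model.fit(data[train_idx], labels[train_idx])
    fold_scores.append(fold_model.score(data[test_idx], labels[test_idx]))

library_scores = cross_val_score(alg, data, labels, cv=4)

# Neither the loop nor cross_val_score ever fitted `alg` itself.
try:
    check_is_fitted(alg)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

print(fold_scores)
print(list(library_scores))
print("original fitted:", original_is_fitted)
```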

cross_val_score does not keep the fitted model. cross_val_predict does not return the fitted models either, but it does return out-of-fold predictions for every sample. There is no cross_val_predict_proba function, but cross_val_predict accepts method='predict_proba'; see "predict_proba for a cross-validated model".
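A short sketch of both options, assuming scikit-learn 0.20 or later (needed for return_estimator) and the same iris/LogisticRegression stand-ins: cross_val_predict with method="predict_proba" gives out-of-fold probabilities, and cross_validate with return_estimator=True hands back the k fitted fold models.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_validate

data, labels = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Out-of-fold class probabilities: each row comes from the fold model
# that did not see that sample during fitting.
oof_proba = cross_val_predict(alg, data, labels, cv=4, method="predict_proba")

# cross_validate can return the k fitted fold models themselves.
results = cross_validate(alg, data, labels, cv=4, return_estimator=True)
fold_models = results["estimator"]
print(results["test_score"])

# Each fold model was fitted on about 3/4 of the data; none of them is
# one model trained on everything.
first_fold_proba = fold_models[0].predict_proba(data)
```

The fold models could be dumped with joblib, but for a single reusable model the refit on the full dataset is usually what you want.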
