
Error in scikit.learn cross_val_score

Please refer to the notebook at the following address:

LogisticRegression

This portion of code,

scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
print scores
print scores.mean()

generates the following error on a Windows 7 64-bit machine:

---------------------------------------------------------------------------
 IndexError                                Traceback (most recent call last)
 <ipython-input-37-4a10affe67c7> in <module>()
 1 # evaluate the model using 10-fold cross-validation
 ----> 2 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
  3 print scores
  4 print scores.mean()

 C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in    cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
  1140                         allow_nans=True, allow_nd=True)
  1141 
  -> 1142     cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
  1143     scorer = check_scoring(estimator, score_func=score_func, scoring=scoring)
  1144     # We clone the estimator to make sure that all the folds are

  C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
  1366         if classifier:
  1367             if type_of_target(y) in ['binary', 'multiclass']:
  -> 1368                 cv = StratifiedKFold(y, cv, indices=needs_indices)
  1369             else:
  1370                 cv = KFold(_num_samples(y), cv, indices=needs_indices)

  C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
  428         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
  429             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430                 label_test_folds = test_folds[y == label]
 431                 # the test split can be too big because we used
 432                 # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array 

I am using scikit.learn 0.15.2. It is suggested here that this may be a problem specific to Windows 7 64-bit machines.

==============update==============

I found that the following code actually works:

 from sklearn.cross_validation import KFold
 cv = KFold(X.shape[0], 10, shuffle=True, random_state=33)
 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
 print scores
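
For readers on a newer scikit-learn, here is a minimal sketch of the same workaround. It assumes a release where `sklearn.cross_validation` has been replaced by `sklearn.model_selection`, in which `KFold` takes `n_splits` instead of the number of samples. The data is synthetic and only stands in for the `X` and `y` from the notebook. Presumably the workaround helps because `KFold` splits by position and never indexes with `y == label`, the line that fails in the traceback.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the notebook's X and y (an assumption, not the original data).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1-D binary labels

# Explicit, non-stratified 10-fold splitter, mirroring the workaround above.
cv = KFold(n_splits=10, shuffle=True, random_state=33)
scores = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=cv)
print(scores)
print(scores.mean())
```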

==============update 2=============

It seems that, due to some package update, I can no longer reproduce this error on my machine. If you are facing the same issue on a Windows 7 64-bit machine, please let me know.

I had the same error and was looking for answers when I found this question.

I used the same sklearn.cross_validation.cross_val_score (with a different algorithm) on the same kind of machine, Windows 7 64-bit.

I tried your solution from above and it "worked", but it gave me the following warning:

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\cross_validation.py:1531: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  estimator.fit(X_train, y_train, **fit_params)

After reading the warning, I figured that the problem had something to do with the shape of 'y' (my label column). The keyword to try from the warning is "ravel()". So, I tried the following:

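# label is the single-column DataFrame holding the target
# (originally pd.DataFrame.as_matrix(label), which was removed in pandas 1.0)
y_arr = label.values
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method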

which gave me:

  [[1]
   [0]
   [1]
   .., 
   [0]
   [0]
   [1]]

  (87939, 1)

When I added 'ravel()':

y_arr = label.values.ravel()  # flatten (n_samples, 1) to (n_samples,)
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method

it gave me:

[1 0 1 ..., 0 0 1]

(87939,)

The shape of 'y_arr' has to be (87939,), not (87939, 1). After that, my original cross_val_score call worked without adding the KFold code.
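
To see why the shape matters, here is a small NumPy-only sketch of what the failing line in the traceback, `test_folds[y == label]`, appears to do. The arrays are hypothetical stand-ins, not scikit-learn internals. When `y` is a column vector, the comparison produces a 2-D boolean mask, and using it to index a 1-D array raises an IndexError.

```python
import numpy as np

# Hypothetical stand-ins for the arrays named in the traceback.
y_col = np.array([[1], [0], [1], [0]])  # shape (4, 1): a column vector
test_folds = np.zeros(4, dtype=int)     # shape (4,): one entry per sample

try:
    test_folds[y_col == 1]              # 2-D boolean mask on a 1-D array
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

y_flat = y_col.ravel()                  # shape (4,)
selected = test_folds[y_flat == 1]      # 1-D mask: indexing succeeds
print(y_col.shape, y_flat.shape, selected.shape)
```

The exact error message may differ between NumPy versions, but the cause is the same: the mask has one more dimension than the array being indexed.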

Hope this helps.

I know this answer is late, but it might help other people struggling with the same error. I had the same issue with Python 3.6. After switching from 3.6 to 3.5, I was able to use the function.
Below is the sample I ran:

accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

First, create a conda env with Python 3.5:

conda create -n py35 python=3.5  
source activate py35  

Hope this helps you move ahead.

Import it from this module and it should work:

from sklearn.model_selection import cross_val_score
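
As a rough end-to-end sketch of that import, assuming a scikit-learn release that ships `sklearn.model_selection`: the data below is synthetic (generated with `make_classification`), and `y_col` is a hypothetical column-vector label that is flattened with `ravel()` before the call.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the real X and y (an assumption).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

y_col = y.reshape(-1, 1)  # mimic a label stored as a column vector, shape (300, 1)

# Flatten the labels to shape (300,) before cross-validating.
scores = cross_val_score(
    LogisticRegression(), X, y_col.ravel(), scoring="accuracy", cv=10
)
print(scores)
print(scores.mean())
```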

