Error in scikit.learn cross_val_score

Question

Please refer to the notebook at the following address.

This portion of code,

scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
print scores
print scores.mean()

generates the following error on a Windows 7 64-bit machine:

---------------------------------------------------------------------------
 IndexError                                Traceback (most recent call last)
 <ipython-input-37-4a10affe67c7> in <module>()
 1 # evaluate the model using 10-fold cross-validation
 ----> 2 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
  3 print scores
  4 print scores.mean()

 C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in    cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
  1140                         allow_nans=True, allow_nd=True)
  1141 
  -> 1142     cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
  1143     scorer = check_scoring(estimator, score_func=score_func, scoring=scoring)
  1144     # We clone the estimator to make sure that all the folds are

  C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
  1366         if classifier:
  1367             if type_of_target(y) in ['binary', 'multiclass']:
  -> 1368                 cv = StratifiedKFold(y, cv, indices=needs_indices)
  1369             else:
  1370                 cv = KFold(_num_samples(y), cv, indices=needs_indices)

  C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
  428         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
  429             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430                 label_test_folds = test_folds[y == label]
 431                 # the test split can be too big because we used
 432                 # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array

I am using scikit.learn 0.15.2. It is suggested here that this may be a problem specific to Windows 7 64-bit machines.

==============update==============

I found that the following code actually works:

 from sklearn.cross_validation import KFold
 cv = KFold(X.shape[0], 10, shuffle=True, random_state=33)
 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
 print scores

==============update 2=============

It seems that, due to some package update, I can no longer reproduce this error on my machine. If you are facing the same issue on a Windows 7 64-bit machine, please let me know.

Answer 1

I had the same error you got and was looking for answers when I found this question.

I used the same sklearn.cross_validation.cross_val_score (with a different algorithm) on the same kind of machine, Windows 7 64-bit.

I tried your solution from above and it "worked", but it gave me the following warning:

C:\\Users\\E245713\\AppData\\Local\\Continuum\\Anaconda3\\lib\\site-packages\\sklearn\\cross_validation.py:1531: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  estimator.fit(X_train, y_train, **fit_params)

After reading the warning, I figured that the problem had something to do with the shape of 'y' (my label column). The keyword to try from the warning is "ravel()". So I tried the following:

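# as_matrix() was removed in pandas 1.0; label.values is the equivalent on newer versions
y_arr = pd.DataFrame.as_matrix(label)
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method

which gave me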

  [[1]
   [0]
   [1]
   .., 
   [0]
   [0]
   [1]]

  (87939, 1)

When I added 'ravel()':

y_arr = pd.DataFrame.as_matrix(label).ravel()
print(y_arr)
print(y_arr.shape)  # shape is an attribute, not a method

it gave me:

[1 0 1 ..., 0 0 1]

(87939,)

The shape of 'y_arr' has to be (87939,), not (87939,1). After that, my original cross_val_score call worked without adding the KFold code.

Hope this helps.
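To see why a column-vector label array can trigger this particular IndexError, here is a minimal NumPy-only sketch. It imitates the failing traceback line `test_folds[y == label]` with toy data; the arrays `y_col` and `test_folds` are made-up stand-ins, not the asker's data or scikit-learn's actual internals.

```python
import numpy as np

# Hypothetical toy labels standing in for the real label column.
y_col = np.array([[1], [0], [1], [0], [0], [1]])  # column vector, shape (6, 1)
test_folds = np.zeros(len(y_col), dtype=int)       # 1-D array, shape (6,)

# A 2-D boolean mask cannot index a 1-D array.
mask_2d = (y_col == 1)
try:
    test_folds[mask_2d]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# After ravel() the mask is 1-D and the same indexing works.
y_flat = y_col.ravel()
selected = test_folds[y_flat == 1]
print(y_flat.shape, selected.shape)
```

If the labels come from a single-column DataFrame, flattening them once before calling cross_val_score should avoid both the warning and the error.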

Answer 2

I know this answer is late, but it might help other people struggling with the same error. I had the same issue with Python 3.6. After changing from 3.6 to 3.5, I was able to use the function.
Below is the sample which I ran:

accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)

First create a conda environment with Python 3.5:

conda create -n py35 python=3.5  
source activate py35

Hope this helps you move ahead.

Answer 3

Import this module and it should work:

from sklearn.model_selection import cross_val_score
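For readers on newer scikit-learn releases, where sklearn.cross_validation no longer exists (it was removed in 0.20), the following is a rough sketch of the question's call using sklearn.model_selection. The data here is synthetic and purely illustrative; `X`, `y_col` and the random seed are assumptions, not the notebook's dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption: the real X and y come from the notebook).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_col = (X[:, 0] + X[:, 1] > 0).astype(int).reshape(-1, 1)  # column vector (200, 1)

# Flatten the labels to shape (n_samples,) before cross-validation.
y = y_col.ravel()

# 10-fold cross-validation, as in the question.
scores = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=10)
print(scores)
print(scores.mean())

# The KFold workaround from the question's update, in the newer API:
# the number of samples is no longer passed to the constructor.
cv = KFold(n_splits=10, shuffle=True, random_state=33)
scores_kfold = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=cv)
print(scores_kfold)
```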

Answer 1
2 2016-07-20 21:28:49

Answer 2
1 2019-01-22 09:08:44

Answer 3
0 2020-09-03 04:37:15
