scikit-learn RFECV array with 0 samples

I was trying to follow the scikit-learn tutorial on recursive feature elimination with cross-validation (RFECV) using my own data, but I keep getting a puzzling error:

ValueError: Found array with 0 sample(s) (shape=(0, 9)) while a minimum of 1 is required.

The code I'm using is as follows:

import pandas as pd
import numpy as np

from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV

data = pd.read_csv('data.csv', index_col = 0)

# training on the first 50 rows
training = data.iloc[:50]
training_y = np.asarray(training.C1, dtype="|S6")
training_x = training.drop('C1', axis=1)

print(training_y.shape)
print(training_x.shape)


# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator = svc, step = 1, cv = StratifiedKFold(training_y, 3),
              scoring = 'accuracy')

rfecv.fit(training_x, training_y)

For reference, the two print statements output:

(50,)

(50, 9)

Thanks!

I created dummy data and it worked for me:

import pandas as pd
import numpy as np

from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV

data = np.random.randn(50,9)

# random binary labels for the 50 dummy rows
training_y = np.random.random(50).round()
training_x = data

print(training_y.shape)
print(training_x.shape)


# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator = svc, step = 1, cv = StratifiedKFold(training_y, 3),
              scoring = 'accuracy')

rfecv.fit(training_x, training_y)

The output is:

RFECV(cv=sklearn.cross_validation.StratifiedKFold(labels=[ 1.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  0.  1.  0.  1.
  1.  0.  1.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  1.  1.  0.  1.  0.
  1.  1.  0.  1.  1.  0.  0.  0.  1.  0.  0.  0.  1.  0.], n_folds=3, shuffle=False, random_state=None),
   estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
   estimator_params=None, scoring='accuracy', step=1, verbose=0)
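
Note that the `sklearn.cross_validation` module used above was removed in scikit-learn 0.20. On a newer release, a rough equivalent of the same dummy-data check might look like the sketch below. It assumes the `sklearn.model_selection` API, where `StratifiedKFold` takes `n_splits` instead of the labels; the fixed seed and the alternating labels are my own choices to keep the run repeatable, not part of the original code.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV

# Dummy data with the same shape as in the question: 50 samples, 9 features.
rng = np.random.RandomState(0)
training_x = rng.randn(50, 9)
training_y = np.array([0, 1] * 25)  # balanced binary labels (an assumption)

# In the newer API the splitter receives the labels later, via fit().
cv = StratifiedKFold(n_splits=3)
rfecv = RFECV(estimator=SVC(kernel="linear"), step=1, cv=cv,
              scoring="accuracy")
rfecv.fit(training_x, training_y)

print(rfecv.n_features_)  # number of features RFECV decided to keep
print(rfecv.support_)     # boolean mask over the 9 input features
```

If this runs on your installation but your own data still fails, the difference is most likely in the data rather than in the RFECV call.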

It would help if you could share your data, or a small sample of it that reproduces the error.
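
In the meantime, here is a small diagnostic sketch you could adapt. It converts the features and labels to plain NumPy arrays, prints the class counts, and reports the train/test size of every fold, so an empty split would show up before RFECV is involved. I don't know whether any of this is the cause of your error; it is just a way to rule things out. The helper name `describe_folds` and the stand-in frame (column `C1` plus hypothetical columns `F1` to `F9`) are made up for illustration, and the code assumes the newer `sklearn.model_selection` API.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def describe_folds(X, y, n_splits=3):
    """Return a (n_train, n_test) pair for each stratified fold."""
    X = np.asarray(X)
    y = np.asarray(y)
    splitter = StratifiedKFold(n_splits=n_splits)
    return [(len(train_idx), len(test_idx))
            for train_idx, test_idx in splitter.split(X, y)]


# Stand-in for data.csv (hypothetical): label column C1 plus 9 feature columns.
rng = np.random.RandomState(0)
frame = pd.DataFrame(rng.randn(50, 9),
                     columns=["F%d" % i for i in range(1, 10)])
frame["C1"] = ["a", "b"] * 25

training = frame.iloc[:50]
training_y = np.asarray(training["C1"])
training_x = np.asarray(training.drop("C1", axis=1))

print(pd.Series(training_y).value_counts())    # samples per class
print(describe_folds(training_x, training_y))  # (train, test) size per fold
```

With your real data, replace the stand-in frame with the result of `pd.read_csv` and check that no class has fewer samples than the number of folds and that no fold comes back empty.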
