I am trying to follow the tutorial given here on recursive feature elimination with cross-validation (RFECV) in scikit-learn, using my own data, and I keep getting a puzzling error:
ValueError: Found array with 0 sample(s) (shape=(0, 9)) while a minimum of 1 is required.
The code I'm using is as follows:
import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV
data = pd.read_csv('data.csv', index_col = 0)
# training on the first 50 rows
training = data.iloc[:50]
training_y = np.asarray(training.C1, dtype="|S6")
training_x = training.drop('C1', axis=1)
print(training_y.shape)
print(training_x.shape)
# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(training_y, 3),
              scoring='accuracy')
rfecv.fit(training_x, training_y)
Just for reference, the outputs to the two print statements are:
(50,)
(50, 9)
Thanks!
I created dummy data and it worked for me:
import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.cross_validation import StratifiedKFold
from sklearn.feature_selection import RFECV
# dummy features: 50 samples, 9 columns, matching the shapes in the question
data = np.random.randn(50, 9)
# dummy binary labels (0.0 or 1.0)
training_y = np.random.random(50).round()
training_x = data
print(training_y.shape)
print(training_x.shape)
# Create the RFE object and compute a cross-validated score.
svc = svm.SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(training_y, 3),
              scoring='accuracy')
rfecv.fit(training_x, training_y)
The output is:
RFECV(cv=sklearn.cross_validation.StratifiedKFold(labels=[ 1. 1. 1. 0. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 1. 0. 1.
1. 0. 1. 0. 1. 1. 1. 0. 0. 0. 0. 1. 0. 1. 1. 0. 1. 0.
1. 1. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0.], n_folds=3, shuffle=False, random_state=None),
estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False),
estimator_params=None, scoring='accuracy', step=1, verbose=0)
It would help if you could share your data, or a small sample of it that reproduces the error.
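In the meantime, one difference between the two runs may be worth ruling out: the dummy data is a plain NumPy array, while the question passes a pandas DataFrame read with index_col=0. Whether that is the actual cause is only a guess without the data. Below is a minimal sketch, assuming a recent scikit-learn (where sklearn.cross_validation has been replaced by sklearn.model_selection, and StratifiedKFold takes n_splits instead of the labels). The DataFrame is a made-up stand-in for data.csv, with hypothetical feature names and labels. It converts both inputs to arrays and checks the label counts before fitting:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Hypothetical stand-in for data.csv: label column C1 plus 9 feature columns.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.randn(50, 9), columns=["F%d" % i for i in range(9)])
data.insert(0, "C1", np.tile(["a", "b"], 25))
# Non-default row labels, as read_csv(..., index_col=0) might produce.
data.index = ["row%d" % i for i in range(50)]

training = data.iloc[:50]

# Convert to plain NumPy arrays so the splitter indexes rows by position.
training_y = training["C1"].to_numpy()
training_x = training.drop("C1", axis=1).to_numpy()

# Sanity checks: same number of rows, and every class has at least
# as many samples as there are folds.
n_folds = 3
labels, counts = np.unique(training_y, return_counts=True)
assert training_x.shape[0] == training_y.shape[0] > 0
assert counts.min() >= n_folds

rfecv = RFECV(estimator=SVC(kernel="linear"), step=1,
              cv=StratifiedKFold(n_splits=n_folds), scoring="accuracy")
rfecv.fit(training_x, training_y)

print(rfecv.n_features_)  # number of features kept
print(rfecv.support_)     # boolean mask over the 9 columns
```

If this runs on your real data once it is converted to arrays, the DataFrame handling in your scikit-learn version is the likely culprit. If the label-count assertion fails instead, look at the dtype="|S6" conversion, which silently truncates labels longer than six bytes.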