I am working with the KNeighborsClassifier algorithm from the scikit-learn library in Python. I followed the basic instructions, e.g. I split my data and labels into training and test sets, then trained my model on the training data. Now I am trying to compute the accuracy on the test data, but I get an error. Here is my code:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import accuracy_score
data_train, data_test, label_train, label_test = train_test_split(df, labels,
                                                                  test_size=0.2,
                                                                  random_state=7)
mod = KNeighborsClassifier(n_neighbors=4)
mod.fit(data_train, label_train)
predictions = mod.predict(data_test)
print(accuracy_score(label_train, predictions))
The error I get:
ValueError: Found arrays with inconsistent numbers of samples: [140 558]
With test_size=0.2, 140 is the size of the test portion and 558 the size of the training portion (my data set has 698 samples). I verified that the data and the labels both have 698 samples. Even so, I get this error, which suggests that something is comparing the test set against the training set.
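A quick way to confirm which split gets which size is to run train_test_split on dummy arrays of the same length. This is a sketch under the assumption that the real df and labels have 698 rows; the hypothetical data and labels arrays below only stand in for them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the asker's df and labels: 698 samples each.
data = np.arange(698 * 2).reshape(698, 2)
labels = np.arange(698) % 2

data_train, data_test, label_train, label_test = train_test_split(
    data, labels, test_size=0.2, random_state=7)

# test_size=0.2 puts ceil(0.2 * 698) = 140 samples in the test set,
# leaving the remaining 558 samples for training.
print(len(label_train), len(label_test))  # 558 140
```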
Does anyone know what is wrong here? Which labels should I train my model on, and which should I use to compute the score?
Thanks!
You should calculate the accuracy_score with label_test, not label_train, i.e. accuracy_score(label_test, predictions). You want to compare the actual labels of the test set (label_test) with your model's predictions for the test set (predictions).
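Putting the fix together, a minimal end-to-end sketch might look like the following. It assumes synthetic data from make_classification in place of the asker's df and labels (698 samples, 5 features, both chosen only for illustration) and uses sklearn.model_selection, the module that replaced sklearn.cross_validation.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's data (an assumption): 698 samples.
df, labels = make_classification(n_samples=698, n_features=5, random_state=0)

data_train, data_test, label_train, label_test = train_test_split(
    df, labels, test_size=0.2, random_state=7)

mod = KNeighborsClassifier(n_neighbors=4)
mod.fit(data_train, label_train)      # train on the training split only
predictions = mod.predict(data_test)  # one prediction per test sample

# Compare the true test labels with the test predictions (140 entries each).
acc = accuracy_score(label_test, predictions)
print(acc)

# The classifier's own score() method computes the same mean accuracy.
print(mod.score(data_test, label_test))
```

Passing label_train (558 entries) together with predictions (140 entries) to accuracy_score would raise the same kind of ValueError as in the question, because the two arrays must have the same length.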
Have you tried the solutions in the following question?
sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()