
Python ValueError: Unknown label type: 'continuous'

I'm a beginner, and I am struggling to understand another Stack Overflow post that asks the same question I have: Logistic Regression: Unknown label type: 'continuous'

This is my machine learning code below, and the shell output is giving me ValueError: Unknown label type: 'continuous'

I think I understand that I am "passing floats to a classifier which expects categorical values as the target vector. If you convert it to int it will be accepted as input (although it will be questionable if that's the right way to do it). It would be better to convert your training scores by using scikit's labelEncoder function"

Can someone give me a tip on how to incorporate scikit-learn's LabelEncoder into my code? Should it be applied before defining X and y for the classifier? Whatever I try, I seem to be doing something wrong. Thank you.

import numpy as np
from sklearn import preprocessing, cross_validation, neighbors, utils
import pandas as pd

df = pd.read_csv('C:\\Users\\bbartling\\Documents\\Python\\WB Data\\WB_RTU6data.csv', index_col='Date', parse_dates=True)

print(df.head())
print(df.tail())
print(df.shape)
print(df.columns)
print(df.info())
print(df.describe())


df.dropna(inplace=True)  # drop incomplete rows first so X and y stay the same length
X = np.array(df.drop(['VAV6znt'], axis=1))

y = np.array(df['VAV6znt'])


accuracies = []

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X,y,test_size=0.50)

clf = neighbors.KNeighborsClassifier(n_neighbors=50)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(accuracy)


Your VAV6znt column is a float, which means you are trying to estimate a numerical value from the data. That makes this a regression problem, but KNeighborsClassifier is a classification estimator, so it rejects the continuous target.
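To see how scikit-learn categorizes a target array, one option is to call type_of_target from sklearn.utils.multiclass directly. The sketch below uses made-up values as a stand-in for the VAV6znt column (the real data is not shown in the question), so treat the arrays as assumptions:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for the VAV6znt column: float readings.
y_float = np.array([70.2, 71.5, 69.8, 72.1])

# Hypothetical categorical target, for comparison.
y_labels = np.array([0, 1, 1, 0])

# type_of_target reports the kind of target scikit-learn infers from the values.
float_kind = type_of_target(y_float)
label_kind = type_of_target(y_labels)

print(float_kind)
print(label_kind)
```

A target reported as 'continuous' is what a classifier's fit method refuses with the ValueError from the question.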

Try using KNeighborsRegressor, or any other estimator that has Regressor in its name.
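A minimal sketch of that swap is shown here. It runs on synthetic data because the CSV from the question is not available, so the feature matrix, the coefficients, and n_neighbors=5 are all assumptions made for illustration. It also imports train_test_split from sklearn.model_selection, since the cross_validation module was removed in scikit-learn 0.20.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the question's data: 3 features and a float target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 70.0 + X @ np.array([1.5, -0.5, 0.25]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# A regressor accepts a continuous target, so fit() no longer raises.
reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X_train, y_train)

# For regressors, score() returns R^2, not classification accuracy.
r2 = reg.score(X_test, y_test)
predictions = reg.predict(X_test)
print(r2)
```

With the real data, X and y would come from the DataFrame as in the question; only the estimator and the import change.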

Converting the values to int, as the quoted advice suggests, will run, but it will not give good results: every unique integer in the column becomes its own class, which is clearly not what the data represents.
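The effect of the int conversion can be checked by counting the distinct values it produces. The sketch below uses randomly generated readings as a hypothetical stand-in for VAV6znt. The bin edges in the second half are also an assumption, included only to show what deliberately chosen categories could look like if classification really were the goal.

```python
import numpy as np

# Hypothetical float readings standing in for the VAV6znt column.
rng = np.random.default_rng(0)
y = rng.uniform(65.0, 80.0, size=1000)

# Truncating to int turns every distinct integer into a separate class.
y_int = y.astype(int)
n_int_classes = np.unique(y_int).size
print(n_int_classes)

# If categories are genuinely wanted, defining them explicitly is one option.
# These edges are illustrative assumptions, not values from the question.
bin_edges = [68.0, 74.0]
y_binned = np.digitize(y, bin_edges)  # 0: below 68, 1: 68 to 74, 2: 74 and above
n_binned_classes = np.unique(y_binned).size
print(n_binned_classes)
```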

