
sklearn.LogisticRegression ValueError: Unknown label type: 'continuous'

I get this error:

ValueError: Unknown label type: 'continuous'

Here is my code:

data=data.dropna()
array = data.values
X = array[:,0:]
y = array[:,-1]
X_train, X_validation, y_train, y_validation = train_test_split(X, Y, test_size=0.20, random_state=1)

models = []
models.append(('LR', LogisticRegression(solver='liblinear', multi_class='ovr')))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
# Evaluate each model in turn
results = []
names = []
for name, model in models:
    # TimeSeries Cross validation
    tscv = TimeSeriesSplit(n_splits=10)
    cv_results = cross_val_score(model, X_train, y_train, cv=tscv, scoring='r2')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
    
# Compare Algorithms
plt.boxplot(results, labels=names)
plt.title('Algorithm Comparison')
plt.show()

I found another post with a similar problem, but when I try to fix it with:

from sklearn import utils

lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(y_train)

LogisticRegression and KNeighborsClassifier work, but LinearDiscriminantAnalysis returns NaNs and the error:

ValueError: The number of samples must be more than the number of classes.

At this point I do not really understand what I am doing, and the documentation doesn't help me much.

Could someone explain these errors to me?

You have used classification models on a continuous output.

For the models you have used, the output must consist of discrete class labels: 0/1 for binary classification, or 0/1/2/3 for multiclass.
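As a rough illustration of the point above, the sketch below uses scikit-learn's `type_of_target` helper to show how different targets are categorised, and what `LabelEncoder` does to a real-valued target. The arrays are made-up values, not the asker's data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Made-up targets for illustration only.
y_continuous = np.array([0.37, 1.52, 2.08, 3.91])  # real-valued, like a price
y_binary = np.array([0, 1, 1, 0])                  # two discrete classes
y_multiclass = np.array([0, 1, 2, 3])              # more than two classes

kind_continuous = type_of_target(y_continuous)
kind_binary = type_of_target(y_binary)
kind_multiclass = type_of_target(y_multiclass)
print(kind_continuous, kind_binary, kind_multiclass)

# LabelEncoder turns every distinct float into its own class, so a
# continuous target ends up with (almost) as many classes as samples.
encoded = LabelEncoder().fit_transform(y_continuous)
n_classes = len(np.unique(encoded))
print(n_classes, "classes for", len(y_continuous), "samples")
```

This would also be consistent with the second error in the question: after label-encoding a continuous target, nearly every sample is its own class, and LinearDiscriminantAnalysis needs more samples than classes.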

For a continuous target, use a regression model instead: linear, polynomial, ridge, or lasso regression can all be applied to time series data.
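A minimal sketch of what the evaluation loop from the question might look like with regressors swapped in. The data here is synthetic, and the `alpha` values and `n_splits=5` are arbitrary assumptions rather than recommendations.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic stand-in for the asker's data: 3 features, continuous target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Regressors instead of classifiers; alpha values are illustrative only.
models = [
    ('LinReg', LinearRegression()),
    ('Ridge', Ridge(alpha=1.0)),
    ('Lasso', Lasso(alpha=0.01)),
]

# Time-ordered splits, scored with R^2 (a regression metric).
tscv = TimeSeriesSplit(n_splits=5)
scores = {}
for name, model in models:
    cv_results = cross_val_score(model, X, y, cv=tscv, scoring='r2')
    scores[name] = cv_results
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
```

Note that `scoring='r2'` in the original code is already a regression metric, which fits regressors but not classifiers.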

Also, you have used X = array[:,0:]. This includes the output column along with the features, so the target leaks into X.

Use X = array[:,0:-1] instead. This excludes the last column when you select the features.
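The difference between the two slices can be checked on a small made-up array, where the last column plays the role of the target:

```python
import numpy as np

# Toy 4x5 array: 4 feature columns followed by 1 target column.
array = np.arange(20, dtype=float).reshape(4, 5)

X_wrong = array[:, 0:]   # every column, including the target
X = array[:, 0:-1]       # every column except the last
y = array[:, -1]         # the last column only

print(X_wrong.shape, X.shape, y.shape)
```

With `X_wrong`, the last feature column is identical to `y`, so any model would be trained on the answer it is supposed to predict.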
