LogisticRegression: Unknown label type: 'continuous' using sklearn in python
I have the following code to test some of the most popular ML algorithms in the sklearn Python library:
import numpy as np
from sklearn import metrics, svm
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))
clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))
clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))
clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))
clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))
clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))
clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))
clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))
The first two work OK, but I get the following error in the LogisticRegression call:
root@ubupc1:/home/ouhma# python stack.py
LinearRegression
[ 15.72023529 6.46666667]
SVR
[ 3.95570063 4.23426243]
Traceback (most recent call last):
File "stack.py", line 28, in <module>
clf.fit(trainingData, trainingScores)
File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
check_classification_targets(y)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
The input data is the same as in the previous calls, so what is going on here?
And by the way, why is there such a huge difference between the first predictions of the LinearRegression() and SVR() algorithms (15.72 vs 3.95)?
You are passing floats to a classifier, which expects categorical values as the target vector. If you convert the targets to int they will be accepted as input (although it is questionable whether that is the right way to do it).
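As a quick check, casting the targets to int is indeed enough to silence the error (a sketch using the question's toy data; note that astype(int) truncates the scores, so whether the resulting labels are meaningful depends on your data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])

clf = LogisticRegression()
# astype(int) truncates: 3.4 -> 3, 7.5 -> 7, 4.5 -> 4, 1.6 -> 1
clf.fit(trainingData, trainingScores.astype(int))
pred = clf.predict([[2.5, 2.4, 2.7]])  # one of the int labels 1, 3, 4, 7
print(pred)
```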
It would be better to convert your training scores using scikit-learn's LabelEncoder.
The same is true for your DecisionTree and KNeighbors classifiers.
from sklearn import preprocessing
from sklearn import utils
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)  # array([1, 3, 2, 0], dtype=int64)
print(utils.multiclass.type_of_target(trainingScores))  # continuous
print(utils.multiclass.type_of_target(trainingScores.astype('int')))  # multiclass
print(utils.multiclass.type_of_target(encoded))  # multiclass
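Putting the pieces together, a minimal sketch (with the toy data from the question) of fitting LogisticRegression on the encoded labels and mapping the predictions back to the original score values:

```python
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
trainingScores = np.array([3.4, 7.5, 4.5, 1.6])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

# Encode the continuous scores as class indices (0..3)
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)  # array([1, 3, 2, 0])

clf = LogisticRegression()
clf.fit(trainingData, encoded)          # no ValueError: targets are now multiclass
pred = clf.predict(predictionData)      # predicted class indices
print(lab_enc.inverse_transform(pred))  # map back to the original score values
```

Note that this treats each distinct score as its own class, which only makes sense if the scores really are category labels rather than measurements.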
I struggled with the same issue when trying to feed floats to the classifiers. I wanted to keep floats, not integers, for accuracy. Try using regressor algorithms instead. For example:
import numpy as np
from sklearn import linear_model
from sklearn import svm
classifiers = [
svm.SVR(),
linear_model.SGDRegressor(),
linear_model.BayesianRidge(),
linear_model.LassoLars(),
linear_model.ARDRegression(),
linear_model.PassiveAggressiveRegressor(),
linear_model.TheilSenRegressor(),
linear_model.LinearRegression()]
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
for item in classifiers:
    print(item)
    clf = item
    clf.fit(trainingData, trainingScores)
    print(clf.predict(predictionData), '\n')
LogisticRegression is not for regression but for classification! The Y variable must be the classification class (for example 0 or 1), not a continuous variable; that would be a regression problem.
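For illustration, here is a hypothetical relabeling of the same data with binary classes (the 0/1 labels are invented for the example, not derived from the original scores), which LogisticRegression accepts without complaint:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

trainingData = np.array([[2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0]])
# Hypothetical class labels (e.g. 0 = "low", 1 = "high") replacing the continuous scores
trainingLabels = np.array([0, 1, 1, 0])
predictionData = np.array([[2.5, 2.4, 2.7], [2.7, 3.2, 1.2]])

clf = LogisticRegression()
clf.fit(trainingData, trainingLabels)  # discrete targets: no "Unknown label type" error
pred = clf.predict(predictionData)     # each prediction is 0 or 1
print(pred)
```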