Unknown label type: 'continuous'
My fellow team, I'm having an issue.
----------------------
   Avg. Session Length  Time on App  Time on Website  Length of Membership  Yearly Amount Spent
0            34.497268    12.655651        39.577668              4.082621           587.951054
1            31.926272    11.109461        37.268959              2.664034           392.204933
2            33.000915    11.330278        37.110597              4.104543           487.547505
3            34.305557    13.717514        36.721283              3.120179           581.852344
4            33.330673    12.795189        37.536653              4.446308           599.406092
5            33.871038    12.026925        34.476878              5.493507           637.102448
6            32.021596    11.366348        36.683776              4.685017           521.572175
I want to apply KNN:
X = df[['Avg. Session Length', 'Time on App','Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=42)
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)
ValueError: Unknown label type: 'continuous'
The values in the Yearly Amount Spent column are real numbers, so they cannot serve as labels for a classification problem (see here):
When doing classification in scikit-learn, y is a vector of integers or strings.
Hence you get the error. If you want to build a classification model, you need to decide how to transform the values into a finite set of labels.
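To see how scikit-learn tells these cases apart, here is a minimal sketch using the helper `sklearn.utils.multiclass.type_of_target`. That this helper is what drives the check inside `KNeighborsClassifier.fit` is my assumption, not something stated in the question. The sample values are taken from the table above, and the integer and string labels are made up for illustration.

```python
from sklearn.utils.multiclass import type_of_target

# Real-valued targets (three values from the question's table) are
# reported as 'continuous', the same word that appears in the error.
continuous_kind = type_of_target([587.951054, 392.204933, 487.547505])

# Illustrative integer and string labels (not from the question) are
# reported as discrete class labels instead.
integer_kind = type_of_target([0, 1, 2, 1])
string_kind = type_of_target(["low", "high", "mid", "low"])

print(continuous_kind, integer_kind, string_kind)
```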
Note that if you just want to avoid the error, you could do
import numpy as np
y = np.asarray(df['Yearly Amount Spent'], dtype="|S6")
This will transform the values in y into strings of the required format. Yet practically every label will appear in only one sample, so you cannot really build a meaningful model with such a set of labels.
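A quick way to check the "one sample per label" problem is to count how often each string label occurs. The sketch below uses only the seven values shown in the question and applies the cast with `astype` on a NumPy array rather than on the DataFrame column, which I assume behaves the same way for this purpose.

```python
import numpy as np

# The seven 'Yearly Amount Spent' values shown in the question.
amounts = np.array([587.951054, 392.204933, 487.547505, 581.852344,
                    599.406092, 637.102448, 521.572175])

# Same idea as the workaround: cast the numbers to fixed-width byte strings.
labels = amounts.astype("|S6")

# Count how many samples share each label.
unique_labels, counts = np.unique(labels, return_counts=True)
print(len(unique_labels), "distinct labels; largest class size:", counts.max())
```

With as many classes as samples, a classifier has nothing to generalize from, which is why the workaround only silences the error.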
I think you are actually trying to do a regression rather than a classification, since your code looks like you want to predict the yearly amount spent as a number. In this case, use
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=1)
instead. If you really have a classification task, for example you want to classify into classes like ('yearly amount spent is low', 'yearly amount spent is high', ...), you should discretize the labels and convert them into strings or integers (as explained by @Miriam Farber), according to thresholds that you need to set manually in this case.
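Both routes can be sketched on the seven rows from the question. This is only an illustration: the thresholds 450 and 550 and the class names "low", "medium" and "high" are arbitrary assumptions of mine, and seven rows are far too few to train a useful model.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# The seven rows shown in the question.
df = pd.DataFrame({
    "Avg. Session Length": [34.497268, 31.926272, 33.000915, 34.305557,
                            33.330673, 33.871038, 32.021596],
    "Time on App": [12.655651, 11.109461, 11.330278, 13.717514,
                    12.795189, 12.026925, 11.366348],
    "Time on Website": [39.577668, 37.268959, 37.110597, 36.721283,
                        37.536653, 34.476878, 36.683776],
    "Length of Membership": [4.082621, 2.664034, 4.104543, 3.120179,
                             4.446308, 5.493507, 4.685017],
    "Yearly Amount Spent": [587.951054, 392.204933, 487.547505, 581.852344,
                            599.406092, 637.102448, 521.572175],
})
X = df[["Avg. Session Length", "Time on App",
        "Time on Website", "Length of Membership"]]
y = df["Yearly Amount Spent"]

# Route 1: regression, which predicts the amount as a number.
reg = KNeighborsRegressor(n_neighbors=1).fit(X, y)
reg_pred = reg.predict(X.iloc[[0]])

# Route 2: classification on discretized labels.
# ASSUMPTION: 450 and 550 are arbitrary cut points chosen for this sketch.
class_names = np.array(["low", "medium", "high"])
y_class = class_names[np.digitize(y.to_numpy(), bins=[450, 550])]
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_class)
clf_pred = clf.predict(X.iloc[[0]])

print(reg_pred[0], clf_pred[0])
```

Predicting on a training row with `n_neighbors=1` simply returns that row's own target, so this only demonstrates that both estimators accept their respective kinds of `y`. A real evaluation would use the held-out split from the question.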