Unknown label type: 'continuous'

My fellow team, I'm having an issue
----------------------

   Avg. Session Length  Time on App  Time on Website  Length of Membership  Yearly Amount Spent
0            34.497268    12.655651        39.577668              4.082621           587.951054
1            31.926272    11.109461        37.268959              2.664034           392.204933
2            33.000915    11.330278        37.110597              4.104543           487.547505
3            34.305557    13.717514        36.721283              3.120179           581.852344
4            33.330673    12.795189        37.536653              4.446308           599.406092
5            33.871038    12.026925        34.476878              5.493507           637.102448
6            32.021596    11.366348        36.683776              4.685017           521.572175

I want to apply KNN:

X = df[['Avg. Session Length', 'Time on App','Time on Website', 'Length of Membership']] 
y = df['Yearly Amount Spent'] 

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

from sklearn.neighbors import KNeighborsClassifier 
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)

ValueError: Unknown label type: 'continuous'

The values in the Yearly Amount Spent column are real numbers, so they cannot serve as labels for a classification problem (see here):

When doing classification in scikit-learn, y is a vector of integers or strings.

Hence you get the error. If you want to build a classification model, you need to decide how to transform the values into a finite set of labels.
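As a rough sketch of what such a transformation could look like, the snippet below bins the seven Yearly Amount Spent values from the question into three classes with pandas. The thresholds (500 and 600) and the class names are hypothetical and chosen only for illustration; type_of_target is the scikit-learn helper that reports how a target vector is interpreted.

```python
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# The seven 'Yearly Amount Spent' values shown in the question.
y = pd.Series(
    [587.951054, 392.204933, 487.547505, 581.852344,
     599.406092, 637.102448, 521.572175],
    name="Yearly Amount Spent",
)

# Raw float targets are reported as 'continuous', which classifiers reject.
raw_type = type_of_target(y.to_numpy())
print(raw_type)

# Hypothetical thresholds: up to 500 is "low", up to 600 is "medium", above is "high".
bins = [0, 500, 600, float("inf")]
y_binned = pd.cut(y, bins=bins, labels=["low", "medium", "high"])

# Convert the categorical result to a plain array of strings for a classifier.
y_labels = y_binned.astype(str).to_numpy()
binned_type = type_of_target(y_labels)
print(binned_type)
print(list(y_labels))
```

With labels like these, y_labels could be passed to KNeighborsClassifier.fit in place of the original column. Whether three classes and these cut points make sense depends entirely on the problem.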

Note that if you just want to avoid the error, you could do

import numpy as np
y = np.asarray(df['Yearly Amount Spent'], dtype="|S6")

This will transform the values in y into strings of the required format. However, every label will then appear in only one sample, so you cannot build a meaningful model with such a set of labels.
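To see why the string trick does not help, here is a small check on the seven values from the question (a sketch using only NumPy, with the same "|S6" dtype as above). It counts how often each resulting label occurs.

```python
import numpy as np

# The seven 'Yearly Amount Spent' values shown in the question.
y = np.array([587.951054, 392.204933, 487.547505, 581.852344,
              599.406092, 637.102448, 521.572175])

# Cast each float to a byte string of at most 6 characters.
labels = y.astype("|S6")

# Count how many samples share each label.
values, counts = np.unique(labels, return_counts=True)
print(labels)
print(counts)
```

Every count is 1 here, so a classifier would see as many classes as samples and would have nothing to generalize from.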

I think you are actually trying to do regression rather than classification, since your code looks like you want to predict the yearly amount spent as a number. In this case, use

from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=1)

instead. If you really do have a classification task, for example you want to classify into classes like ('yearly amount spent is low', 'yearly amount spent is high', ...), you should discretize the labels and convert them into strings or integers (as explained by @Miriam Farber), using thresholds that you need to set manually in this case.
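For the regression route, a minimal end-to-end sketch might look like the following. It rebuilds df from the seven rows shown in the question (the real dataset is presumably larger) and otherwise keeps the question's split settings, so the predictions are only illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# The seven sample rows from the question, used here as a stand-in for the full data.
df = pd.DataFrame({
    "Avg. Session Length": [34.497268, 31.926272, 33.000915, 34.305557,
                            33.330673, 33.871038, 32.021596],
    "Time on App": [12.655651, 11.109461, 11.330278, 13.717514,
                    12.795189, 12.026925, 11.366348],
    "Time on Website": [39.577668, 37.268959, 37.110597, 36.721283,
                        37.536653, 34.476878, 36.683776],
    "Length of Membership": [4.082621, 2.664034, 4.104543, 3.120179,
                             4.446308, 5.493507, 4.685017],
    "Yearly Amount Spent": [587.951054, 392.204933, 487.547505, 581.852344,
                            599.406092, 637.102448, 521.572175],
})

X = df[["Avg. Session Length", "Time on App",
        "Time on Website", "Length of Membership"]]
y = df["Yearly Amount Spent"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# A regressor accepts continuous targets, so fit() no longer raises ValueError.
knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(X_train, y_train)

# With n_neighbors=1, each prediction is the target of the nearest training row.
predictions = knn.predict(X_test)
print(predictions)
```

On real data, n_neighbors=1 tends to overfit, so it is worth trying several values and comparing them on held-out data.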
