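Data Preprocessing for KNN in Python
Preprocessing is taking me a lot of time as I try to understand tuples, lists, floats, and array structures. I have data that looks like this: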
<bound method NDFrame.head of X Y
0 [1.9902, 1.9902, 1.9902, 1.9902, 1.9902, 0.034... [0.097, 0.097, 0.097, 0.094]
1 [1.9902, 0.034, 0.034, 0.034, 0.034, 0.034, 0.... [0.094, 0.094, 0.094, 0.094]
2 [0.034, 0.034, 0.097, 0.097, 0.097, 0.097, 0.0... [1.0882, 1.0882, 1.0882, 1.0882]
3 [0.097, 0.097, 0.097, 0.094, 0.094, 0.094, 0.0... [1.0882, 1.2382, 1.2382, 1.2382]
4 [0.094, 0.094, 0.094, 0.094, 1.0882, 1.0882, 1... [1.2382, 1.2382, 1.2182, 1.2182]
... ... ...
3395 [0.136, 0.286, 0.286, 0.286, 0.286, 0.286, 0.2... [0.1276, 0.1276, 0.1276, 0.1276]
3396 [0.286, 0.286, 0.266, 0.266, 0.266, 0.266, 0.2... [1.1423, 1.2923, 1.2723, 3.672]
3397 [0.266, 0.266, 0.266, 0.1276, 0.1276, 0.1276, ... [3.672, 3.672, 3.772, 3.772]
3398 [0.1276, 0.1276, 0.1276, 0.1276, 1.1423, 1.292... [3.772, 3.802, 3.802, 3.802]
3399 [1.1423, 1.2923, 1.2723, 3.672, 3.672, 3.672, ... [1.021, 1.021, 1.021, 1.021]
I am using:
x=csv_data['X']
y=csv_data['Y']
#print(x)
x_train, x_test, y_train, y_test = train_test_split(x,y)
Fitting the KNN model:
K = []
training = []
test = []
scores = {}
for k in range(2, 21):
    clf = KNeighborsClassifier(n_neighbors = k)
    clf.fit(x_train, y_train)
    training_score = clf.score(x_train, y_train)
    test_score = clf.score(x_test, y_test)
    K.append(k)
    training.append(training_score)
    test.append(test_score)
    scores[k] = [training_score, test_score]
I get this error:
TypeError Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'list'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-93-906aa771beda> in <module>()
6 for k in range(2, 21):
7 clf = KNeighborsClassifier(n_neighbors = k)
----> 8 clf.fit(x_train, y_train)
9
10 training_score = clf.score(x_train, y_train)
7 frames
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
ValueError: setting an array element with a sequence.
I have tried a few things, such as preprocessing or StandardScaler, but none of them worked for me. Please help me get KNN running. Thanks.
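The problem is that your y has shape (n, 4), whereas the KNN.fit method expects your y to have shape (n, 1). In short, you can only predict one value per sample from y. So you either run KNN 4 times, once for each column of y, or you don't use KNN.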
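The traceback message `float() argument must be a string or a number, not 'list'` indicates that each cell of X and Y holds a Python list, so the columns have to be expanded into plain numeric arrays before any column of y can be selected. A minimal sketch, assuming every list within a column has the same length and that the cells are real lists rather than strings; the small `csv_data` frame below is a made-up stand-in for the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for csv_data: each cell holds a list of floats.
csv_data = pd.DataFrame({
    "X": [[1.9902, 1.9902, 0.034],
          [0.034, 0.034, 0.097],
          [0.097, 0.094, 0.094]],
    "Y": [[0.097, 0.097, 0.097, 0.094],
          [1.0882, 1.0882, 1.0882, 1.0882],
          [1.2382, 1.2382, 1.2182, 1.2182]],
})

# Stack the per-row lists into 2D float arrays of shape (n_rows, list_length).
x = np.array(csv_data["X"].tolist(), dtype=float)
y = np.array(csv_data["Y"].tolist(), dtype=float)

print(x.shape, y.shape)
```

After this step x and y are ordinary 2D arrays, so a single target column can be taken with `y[:, 0]`. If the lists were stored as text in the CSV, they would first need to be parsed (for example with `ast.literal_eval`) before stacking.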
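The code would look like this:
# Import KNN for regression
from sklearn.neighbors import KNeighborsRegressor

k = 5  # number of neighbours; example value only, tune as needed

# Split out the four target columns (assumes y is a DataFrame of shape (n, 4))
y1 = y.iloc[:, 0]
y2 = y.iloc[:, 1]
y3 = y.iloc[:, 2]
y4 = y.iloc[:, 3]

# Fit one regressor per target column
regressor1 = KNeighborsRegressor(n_neighbors=k).fit(x, y1)
regressor2 = KNeighborsRegressor(n_neighbors=k).fit(x, y2)
regressor3 = KNeighborsRegressor(n_neighbors=k).fit(x, y3)
regressor4 = KNeighborsRegressor(n_neighbors=k).fit(x, y4)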
Oh my!! Now I see that you are using KNN for classification, while your problem is actually regression. Your fundamentals are really weak.
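To make the classification-versus-regression point concrete, here is a rough sketch of the question's loop over k rewritten with `KNeighborsRegressor`, fitting one regressor per target column as suggested above. The random arrays are hypothetical stand-ins for the real data (already converted to 2D numeric arrays), and averaging the per-column scores is my own choice for summarising them, not something from the original posts:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (hypothetical): 200 rows, 8 features, 4 targets.
rng = np.random.default_rng(0)
x = rng.random((200, 8))
y = rng.random((200, 4))

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

scores = {}
for k in range(2, 21):
    train_r2, test_r2 = [], []
    # One regressor per target column.
    for col in range(y.shape[1]):
        reg = KNeighborsRegressor(n_neighbors=k)
        reg.fit(x_train, y_train[:, col])
        # For regressors, score() returns the R^2 coefficient, not accuracy.
        train_r2.append(reg.score(x_train, y_train[:, col]))
        test_r2.append(reg.score(x_test, y_test[:, col]))
    # Average R^2 over the four targets for this k.
    scores[k] = [float(np.mean(train_r2)), float(np.mean(test_r2))]

# Pick the k with the highest mean test R^2.
best_k = max(scores, key=lambda k: scores[k][1])
print(best_k, scores[best_k])
```

On random data like this the scores carry no meaning; the point is only the shape of the loop. As far as I know, `KNeighborsRegressor.fit` also accepts a 2D y directly, so the inner per-column loop is kept here mainly to mirror the answer.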
Also, I would simply not use KNN here. You won't get good results from it, and it is computationally expensive as well.