Data Preprocessing for KNN in Python
Preprocessing takes a lot of time, and I find the tuple, list, float, and array structures hard to understand. I have data that looks like this:
<bound method NDFrame.head of X Y
0 [1.9902, 1.9902, 1.9902, 1.9902, 1.9902, 0.034... [0.097, 0.097, 0.097, 0.094]
1 [1.9902, 0.034, 0.034, 0.034, 0.034, 0.034, 0.... [0.094, 0.094, 0.094, 0.094]
2 [0.034, 0.034, 0.097, 0.097, 0.097, 0.097, 0.0... [1.0882, 1.0882, 1.0882, 1.0882]
3 [0.097, 0.097, 0.097, 0.094, 0.094, 0.094, 0.0... [1.0882, 1.2382, 1.2382, 1.2382]
4 [0.094, 0.094, 0.094, 0.094, 1.0882, 1.0882, 1... [1.2382, 1.2382, 1.2182, 1.2182]
... ... ...
3395 [0.136, 0.286, 0.286, 0.286, 0.286, 0.286, 0.2... [0.1276, 0.1276, 0.1276, 0.1276]
3396 [0.286, 0.286, 0.266, 0.266, 0.266, 0.266, 0.2... [1.1423, 1.2923, 1.2723, 3.672]
3397 [0.266, 0.266, 0.266, 0.1276, 0.1276, 0.1276, ... [3.672, 3.672, 3.772, 3.772]
3398 [0.1276, 0.1276, 0.1276, 0.1276, 1.1423, 1.292... [3.772, 3.802, 3.802, 3.802]
3399 [1.1423, 1.2923, 1.2723, 3.672, 3.672, 3.672, ... [1.021, 1.021, 1.021, 1.021]
I am splitting the data using:
x=csv_data['X']
y=csv_data['Y']
#print(x)
x_train, x_test, y_train, y_test = train_test_split(x,y)
Fitting the KNN model:
K = []
training = []
test = []
scores = {}
for k in range(2, 21):
    clf = KNeighborsClassifier(n_neighbors = k)
    clf.fit(x_train, y_train)
    training_score = clf.score(x_train, y_train)
    test_score = clf.score(x_test, y_test)
    K.append(k)
    training.append(training_score)
    test.append(test_score)
    scores[k] = [training_score, test_score]
I get this error:
TypeError Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'list'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-93-906aa771beda> in <module>()
6 for k in range(2, 21):
7 clf = KNeighborsClassifier(n_neighbors = k)
----> 8 clf.fit(x_train, y_train)
9
10 training_score = clf.score(x_train, y_train)
7 frames
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
ValueError: setting an array element with a sequence.
I have tried a few methods, such as preprocessing and StandardScaler, but they didn't work for me. Kindly help me run KNN. Thanks.
The problem is that, while using KNN, your y is of shape (n, 4), while the KNN.fit method wants your y to be of shape (n, 1). In short, you can only predict one value from y, so you either use KNN four times, once for each column in y, or you don't use KNN.
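The TypeError in the traceback (float() argument must be a string or a number, not 'list') suggests that each cell of X and Y holds a Python list, so the columns would need to be expanded into 2-D numeric arrays before their shapes can be checked at all. A minimal sketch, assuming every list within a column has the same length; the tiny csv_data frame below is made up for illustration and only stands in for the real one:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for csv_data: each cell holds a list, as in the question.
csv_data = pd.DataFrame({
    "X": [[1.9902, 1.9902, 0.034], [1.9902, 0.034, 0.034], [0.034, 0.097, 0.097]],
    "Y": [[0.097, 0.097, 0.097, 0.094], [0.094, 0.094, 0.094, 0.094],
          [1.0882, 1.0882, 1.0882, 1.0882]],
})

# Stack the per-row lists into 2-D float arrays (assumes equal list lengths).
X = np.array(csv_data["X"].tolist(), dtype=float)
Y = np.array(csv_data["Y"].tolist(), dtype=float)

print(X.shape)  # (rows, features per row)
print(Y.shape)  # (rows, targets per row)
```

If the lists were read from a CSV file they may be stored as strings, in which case they would need parsing first (for example with ast.literal_eval) before this step.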
For the per-column approach, the code would look like this:
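# Import KNN for regression
from sklearn.neighbors import KNeighborsRegressor

# Assumes x is a 2-D numeric table, y is a DataFrame with 4 numeric columns,
# and k (the number of neighbors) is already defined.
# Split y into one target column per model
y1 = y.iloc[:, 0]
y2 = y.iloc[:, 1]
y3 = y.iloc[:, 2]
y4 = y.iloc[:, 3]
# Fit one regressor per target column
regressor1 = KNeighborsRegressor(n_neighbors=k).fit(x, y1)
regressor2 = KNeighborsRegressor(n_neighbors=k).fit(x, y2)
regressor3 = KNeighborsRegressor(n_neighbors=k).fit(x, y3)
regressor4 = KNeighborsRegressor(n_neighbors=k).fit(x, y4)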
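To compare values of k the way the question's loop does, the same per-column idea can be combined with a train/test split. A rough sketch, assuming X and Y are already 2-D numeric arrays; the random data and the names X, Y, and scores are placeholders, not the asker's dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Placeholder data: 200 rows, 8 features, 4 targets (not the real dataset).
rng = np.random.default_rng(0)
X = rng.random((200, 8))
Y = rng.random((200, 4))

x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=0)

scores = {}  # k -> list of test scores, one per target column
for k in range(2, 21):
    per_column = []
    for col in range(Y.shape[1]):
        # One regressor per target column, as in the answer above.
        reg = KNeighborsRegressor(n_neighbors=k)
        reg.fit(x_train, y_train[:, col])
        per_column.append(reg.score(x_test, y_test[:, col]))
    scores[k] = per_column
```

Note that score on a regressor returns the coefficient of determination (R^2) rather than accuracy, so on random placeholder data like this the values can be near zero or negative.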
OMG!! I now see that you were using KNN for classification when your problem is in fact regression. Your fundamentals are really, really poor.
Also, just don't use KNN here. You won't get any good results from it, and it's also computationally expensive.