
Data Preprocessing for KNN in Python

Preprocessing takes a lot of time: I am struggling to understand the tuple, list, float, and array structures involved. I have data that looks like this:

<bound method NDFrame.head of                                                       X                                 Y
0     [1.9902, 1.9902, 1.9902, 1.9902, 1.9902, 0.034...      [0.097, 0.097, 0.097, 0.094]
1     [1.9902, 0.034, 0.034, 0.034, 0.034, 0.034, 0....      [0.094, 0.094, 0.094, 0.094]
2     [0.034, 0.034, 0.097, 0.097, 0.097, 0.097, 0.0...  [1.0882, 1.0882, 1.0882, 1.0882]
3     [0.097, 0.097, 0.097, 0.094, 0.094, 0.094, 0.0...  [1.0882, 1.2382, 1.2382, 1.2382]
4     [0.094, 0.094, 0.094, 0.094, 1.0882, 1.0882, 1...  [1.2382, 1.2382, 1.2182, 1.2182]
...                                                 ...                               ...
3395  [0.136, 0.286, 0.286, 0.286, 0.286, 0.286, 0.2...  [0.1276, 0.1276, 0.1276, 0.1276]
3396  [0.286, 0.286, 0.266, 0.266, 0.266, 0.266, 0.2...   [1.1423, 1.2923, 1.2723, 3.672]
3397  [0.266, 0.266, 0.266, 0.1276, 0.1276, 0.1276, ...      [3.672, 3.672, 3.772, 3.772]
3398  [0.1276, 0.1276, 0.1276, 0.1276, 1.1423, 1.292...      [3.772, 3.802, 3.802, 3.802]
3399  [1.1423, 1.2923, 1.2723, 3.672, 3.672, 3.672, ...      [1.021, 1.021, 1.021, 1.021]

I am splitting the data using:

x=csv_data['X']
y=csv_data['Y']
#print(x)
x_train, x_test, y_train, y_test = train_test_split(x,y)

Fitting the KNN model:

K = []
training = []
test = []
scores = {}
  
for k in range(2, 21):
    clf = KNeighborsClassifier(n_neighbors = k)
    clf.fit(x_train, y_train)
  
    training_score = clf.score(x_train, y_train)
    test_score = clf.score(x_test, y_test)
    K.append(k)
  
    training.append(training_score)
    test.append(test_score)
    scores[k] = [training_score, test_score]

I get this error:

TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a number, not 'list'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-93-906aa771beda> in <module>()
      6 for k in range(2, 21):
      7     clf = KNeighborsClassifier(n_neighbors = k)
----> 8     clf.fit(x_train, y_train)
      9 
     10     training_score = clf.score(x_train, y_train)

7 frames
/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: setting an array element with a sequence.

I have tried a few methods, such as preprocessing and StandardScaler, but they didn't work for me. Kindly help me get KNN running. Thanks.

The problem is that each row of your y holds four values, so once the lists are expanded y has shape (n, 4), whereas this approach needs a single target per model, i.e. a y of shape (n,). In short, you can only predict one value of y per model, so either fit KNN four times, once for each column of y, or don't use KNN.
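Before any of this can work, the list-valued cells have to become real numeric arrays, since that is what the "setting an array element with a sequence" error in the traceback is complaining about. Here is a minimal sketch, assuming every list within a column has the same length. The tiny DataFrame and the names `X_arr` and `Y_arr` are made up for illustration; if the CSV stores the lists as strings, they would need to be parsed first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for csv_data: each cell holds a Python list.
csv_data = pd.DataFrame({
    "X": [[1.99, 0.034, 0.097], [0.034, 0.097, 0.094], [0.097, 0.094, 1.0882]],
    "Y": [[0.097, 0.094], [1.0882, 1.0882], [1.2382, 1.2182]],
})

# A Series of lists is an object column, so converting it to floats fails.
# This is the same conversion scikit-learn attempts inside fit().
try:
    np.asarray(csv_data["X"], dtype=float)
    conversion_failed = False
except (TypeError, ValueError):
    conversion_failed = True

# Stack the lists into proper 2-D numeric arrays (needs equal-length lists).
X_arr = np.array(csv_data["X"].tolist(), dtype=float)
Y_arr = np.array(csv_data["Y"].tolist(), dtype=float)

print(conversion_failed)
print(X_arr.shape, Y_arr.shape)
```

With arrays like these, `train_test_split` and `fit` receive one row per sample and one column per list element.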

The code will look like this:

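# Import KNN for regression
from sklearn.neighbors import KNeighborsRegressor

# Assumes x and y are already 2-D DataFrames (one column per list element,
# not list-valued cells) and that k has been chosen, e.g. k = 5.
y1 = y.iloc[:, 0]
y2 = y.iloc[:, 1]
y3 = y.iloc[:, 2]
y4 = y.iloc[:, 3]

# Fit one regressor per target column
regressor1 = KNeighborsRegressor(n_neighbors=k).fit(x, y1)
regressor2 = KNeighborsRegressor(n_neighbors=k).fit(x, y2)
regressor3 = KNeighborsRegressor(n_neighbors=k).fit(x, y3)
regressor4 = KNeighborsRegressor(n_neighbors=k).fit(x, y4)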
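The same per-column idea can be written as a loop and checked end to end. This is only a sketch on random stand-in data: `X_arr`, `Y_arr`, and the value of `k` are assumptions for illustration, not the asker's real dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_arr = rng.random((40, 6))   # hypothetical features, shape (n, 6)
Y_arr = rng.random((40, 4))   # hypothetical targets, shape (n, 4)
k = 5                         # assumed number of neighbours

# One regressor per target column.
regressors = [
    KNeighborsRegressor(n_neighbors=k).fit(X_arr, Y_arr[:, j])
    for j in range(Y_arr.shape[1])
]

# Stack the per-column predictions back into an (n, 4) array.
Y_pred = np.column_stack([reg.predict(X_arr) for reg in regressors])
print(Y_pred.shape)
```

As an alternative worth trying, the scikit-learn documentation describes `KNeighborsRegressor.fit` as accepting a y of shape (n_samples, n_outputs), so a single model fitted on the 2-D `Y_arr` may replace the four separate ones.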

OMG!! I now see that you were using KNN for classification (KNeighborsClassifier) when your problem is in fact regression: the targets are continuous values. Your fundamentals need work.
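For reference, the loop over k from the question could be rewritten for regression roughly as follows. This is a sketch on synthetic data, with `X_arr` and `y_col` as hypothetical stand-ins for the expanded features and a single target column.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_arr = rng.random((200, 6))                  # hypothetical numeric features
y_col = X_arr[:, 0] + 0.1 * rng.random(200)   # hypothetical continuous target

x_train, x_test, y_train, y_test = train_test_split(
    X_arr, y_col, random_state=0
)

scores = {}
for k in range(2, 21):
    reg = KNeighborsRegressor(n_neighbors=k).fit(x_train, y_train)
    # For regressors, score() returns R^2 rather than accuracy.
    scores[k] = [reg.score(x_train, y_train), reg.score(x_test, y_test)]

# Pick the k with the highest test R^2.
best_k = max(scores, key=lambda k: scores[k][1])
print(best_k, scores[best_k])
```

Note that the scores here are R² values, so they are not directly comparable to the accuracy figures a classifier would report.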

Also, I would not use KNN here at all. You are unlikely to get good results from it, and it is also computationally expensive.

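Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.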

 