How to pass a single feature of a data set to sklearn KNeighborsClassifier for training and prediction?
So I read a CSV dataset, store it in a pandas DataFrame, and then split the data into training and testing sets. What I am trying to accomplish is to train and measure prediction accuracy using only one feature at a time, so that I can later see which of the 4 features is the best predictor. I'm new to Python and machine learning, so please bear with me; this is actually the first time I've tried either. I get an error on this line:
my_knn_for_cs4661.fit(X_train[col], y_train)
The error says something about array.reshape(-1,1). I have tried X_train[col].reshape(-1,1), but then I get other errors. I am using Python 3 in a Jupyter notebook, with sklearn, NumPy, and pandas.
Below is my code and the error:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_df = pd.read_csv('https://raw.githubusercontent.com/mpourhoma/CS4661/master/iris.csv')
feature_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
X = iris_df[feature_cols]
y = iris_df['species']
predictions = {}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=6)
k = 3
my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
for col in feature_cols:
    # X_train[col] is a 1-D pandas Series; this is the line that raises the error below
    my_knn_for_cs4661.fit(X_train[col], y_train)
    y_predict = my_knn_for_cs4661.predict(X_test)
    predictions[col] = y_predict
My error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-41-933eb8b496d8> in <module>()
13 for col in feature_cols:
14
---> 15 my_knn_for_cs4661.fit(X_train[col], y_train)
16 y_predict = my_knn_for_cs4661.predict(X_test)
17 predictions[col] = y_predict
~\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
763 """
764 if not isinstance(X, (KDTree, BallTree)):
--> 765 X, y = check_X_y(X, y, "csr", multi_output=True)
766
767 if y.ndim == 1 or y.ndim == 2 and y.shape[1] == 1:
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
571 X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
572 ensure_2d, allow_nd, ensure_min_samples,
--> 573 ensure_min_features, warn_on_dtype, estimator)
574 if multi_output:
575 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
439 "Reshape your data either using array.reshape(-1, 1) if "
440 "your data has a single feature or array.reshape(1, -1) "
--> 441 "if it contains a single sample.".format(array))
442 array = np.atleast_2d(array)
443 # To ensure that array flags are maintained
ValueError: Expected 2D array, got 1D array instead:
array=[6. 5. 5.7 6.3 5.6 5.6 4.6 5.8 5.8 4.7 5.5 5.4 5.8 6.4 6.5 6.7 6.1 6.9
7.2 6.2 5.1 4.9 6.5 6.8 5.1 4.6 5.7 7.9 6.1 6.3 6.8 5.5 6.3 6.7 5.5 5.
7.3 4.4 5.3 4.8 4.5 4.6 5. 5.8 6.9 4.8 7.7 5.8 5.4 6.7 5.5 6.7 5.9 5.6
5. 6. 5.9 7. 5.4 4.9 5. 5.2 6. 5.1 6.1 6.2 5.6 6.7 6.8 5.8 6.7 5.7
7.2 5.4 7.4 4.4 6.2 6.5 5. 6.7 6.6 4.9 5. 6. 5.5 6.2 5.7 7.2 4.9 6. ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
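The two reshape options in the message differ only in which axis gets the data. A minimal NumPy sketch, using a few made-up values in place of one column of the table:

```python
import numpy as np

# A 1-D array of five measurements, like a single column pulled out of a table
values = np.array([6.0, 5.0, 5.7, 6.3, 5.6])
print(values.shape)      # (5,)

# reshape(-1, 1): many rows, one column -> "your data has a single feature"
as_feature = values.reshape(-1, 1)
print(as_feature.shape)  # (5, 1)

# reshape(1, -1): one row, many columns -> "it contains a single sample"
as_sample = values.reshape(1, -1)
print(as_sample.shape)   # (1, 5)
```

For a training set with one feature, reshape(-1, 1) is the one you want: each row stays one sample.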
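Expected 2D array, got 1D array instead
means that when you use KNeighborsClassifier, your training data must be two-dimensional (samples × features), even when it holds only one feature. Selecting columns with a list of names keeps a 2-D DataFrame, for example
X_train[['sepal_length', 'sepal_width']] for two features, or X_train[['sepal_length']] for a single feature.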
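A minimal sketch of the difference between single and double brackets. As an assumption, it uses scikit-learn's bundled copy of the iris data (load_iris, which needs no network access) instead of the CSV from the question, with the columns renamed to the question's feature names; its labels are the integers 0, 1, 2 rather than species strings.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Assumption: sklearn's bundled iris data stands in for the question's CSV
iris = load_iris(as_frame=True)
X = iris.data.copy()
X.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
y = iris.target

print(X['sepal_length'].shape)    # (150,)   -> Series, 1-D
print(X[['sepal_length']].shape)  # (150, 1) -> DataFrame, 2-D

knn = KNeighborsClassifier(n_neighbors=3)
# One feature, but passed as a 2-D one-column DataFrame, so fit() accepts it
knn.fit(X[['sepal_length']], y)
```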
I found a solution, although it seems hacky; I don't know if this is the Pythonic way.
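import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_df = pd.read_csv('https://raw.githubusercontent.com/mpourhoma/CS4661/master/iris.csv')
feature_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
X = iris_df[feature_cols]
y = iris_df['species']
predictions = {}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=6)
k = 3
my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
for col in feature_cols:
    # .values gives a 1-D NumPy array; reshape(-1, 1) turns it into a one-column 2-D array
    my_knn_for_cs4661.fit(X_train[col].values.reshape(-1, 1), y_train)
    # The test data must have the same single column, reshaped the same way
    y_predict = my_knn_for_cs4661.predict(X_test[col].values.reshape(-1, 1))
    # Store the accuracy obtained with this feature alone
    predictions[col] = accuracy_score(y_test, y_predict)
print(predictions)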
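A less hacky variant of the same loop, offered as a sketch rather than the poster's code: select each column with double brackets so it stays 2-D, create a fresh classifier per feature, and use the classifier's score() method, which returns mean accuracy. As an assumption, scikit-learn's bundled iris data replaces the CSV (columns renamed, integer labels), so the exact accuracies may differ from the CSV version.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: sklearn's bundled iris data stands in for the question's CSV
feature_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
iris = load_iris(as_frame=True)
X = iris.data.copy()
X.columns = feature_cols
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=6)

accuracies = {}
for col in feature_cols:
    # A new model per feature, so no state carries over between iterations
    knn = KNeighborsClassifier(n_neighbors=3)
    # X_train[[col]] (double brackets) is a 2-D, one-column DataFrame
    knn.fit(X_train[[col]], y_train)
    # score() returns the mean accuracy on the given test data
    accuracies[col] = knn.score(X_test[[col]], y_test)

best = max(accuracies, key=accuracies.get)
print(accuracies)
print('best single feature:', best)
```

Refitting the same estimator object, as the original loop does, also works because fit() discards the previous fit; a fresh instance just makes the intent clearer.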
We can use
series.values.reshape(-1, 1)
.values converts the Series into a 1-D NumPy array, on which reshape(-1, 1) is used to convert it into a 2-D array with a single column.
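To check that the reshape trick and double-bracket selection produce the same data, here is a small sketch on a hypothetical three-row toy frame standing in for X_train. It uses to_numpy(), which pandas recommends over .values in recent versions:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for X_train
df = pd.DataFrame({'sepal_length': [6.0, 5.0, 5.7],
                   'sepal_width': [2.2, 3.4, 2.8]})

col = 'sepal_length'
arr_1d = df[col].values           # 1-D NumPy array, shape (3,)
arr_2d = arr_1d.reshape(-1, 1)    # 2-D NumPy array, shape (3, 1)

# Selecting with a one-element list gives the same 2-D data directly
same = df[[col]].to_numpy()
print(arr_2d.shape, same.shape)      # (3, 1) (3, 1)
print(np.array_equal(arr_2d, same))  # True
```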