
Why is cross_val_predict so much slower than fit for KNeighborsClassifier?

Running locally in a Jupyter notebook on the MNIST dataset (28k entries, 28x28 pixels per image), the following takes 27 seconds:

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier(n_jobs=1)
knn_clf.fit(pixels, labels)

However, the following takes 1722 seconds, in other words ~64 times longer:

from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(knn_clf, pixels, labels, cv=3, n_jobs=1)

My naive understanding is that cross_val_predict with cv=3 does 3-fold cross-validation, so I'd expect it to fit the model 3 times and therefore take at least ~3 times longer, but I don't see why it would take 64x!

To check whether it was something specific to my environment, I ran the same code in a Colab notebook. The difference was less extreme (15x), but still way above the ~3x I expected.

What am I missing? Why is cross_val_predict so much slower than just fitting the model?

In case it matters, I'm running scikit-learn 0.20.2.

KNN is also called a lazy algorithm because fitting does little more than store the input data (at most it builds an index over it); no model is actually learned at that stage.

The actual distance calculations happen during predict, for each test data point. So when you use cross_val_predict, KNN has to predict on the validation data points, and that is what makes the computation time so much higher! With cv=3, every sample lands in a validation fold exactly once, so all 28k samples get predicted, each against the other two thirds of the data.
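To see this on your own machine, you could time fit and predict separately. The following is only a rough sketch: it uses random data as a stand-in for the MNIST pixels (the array sizes and the names X, y, fit_seconds and predict_seconds are my own assumptions, not from the question), and the exact timings will depend on the scikit-learn version and the neighbor search algorithm it picks.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the MNIST data (assumption: 784 features, 10 classes).
rng = np.random.RandomState(0)
X = rng.rand(2000, 784)
y = rng.randint(0, 10, size=2000)

knn = KNeighborsClassifier(n_jobs=1)

# fit: stores the data (and possibly builds an index structure).
t0 = time.perf_counter()
knn.fit(X, y)
fit_seconds = time.perf_counter() - t0

# predict: computes distances from each query point to the stored points.
t0 = time.perf_counter()
pred = knn.predict(X[:500])
predict_seconds = time.perf_counter() - t0

print("fit:     %.3f s" % fit_seconds)
print("predict: %.3f s" % predict_seconds)
```

Comparing the two printed numbers for your own data should show how much of the cross_val_predict time comes from the predict step rather than from fit.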

cross_val_predict does both fitting and predicting, so it may well take longer than fit alone, but I didn't expect it to take 64 times longer.
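For illustration, the per-fold work can be written out by hand. This is a minimal sketch on small random data, not the library's actual implementation; it assumes that an integer cv on a classifier corresponds to an unshuffled StratifiedKFold, and the names X, y, manual_pred and library_pred are made up for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic dataset so the example runs quickly.
rng = np.random.RandomState(0)
X = rng.rand(300, 20)
y = rng.randint(0, 3, size=300)

knn = KNeighborsClassifier(n_jobs=1)

# Hand-written 3-fold loop: one fit AND one predict per fold.
manual_pred = np.empty_like(y)
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    model = clone(knn)                     # fresh, unfitted copy
    model.fit(X[train_idx], y[train_idx])  # cheap for KNN
    manual_pred[test_idx] = model.predict(X[test_idx])  # the expensive part

library_pred = cross_val_predict(knn, X, y, cv=3, n_jobs=1)
same = np.array_equal(manual_pred, library_pred)
print("manual loop matches cross_val_predict:", same)
```

Each fold fits on two thirds of the data and predicts on the remaining third, so over the three folds every sample is predicted once. That predict cost has no counterpart in a plain fit call.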
