
Why is scikit-learn SVM.SVC() extremely slow?

I tried to use an SVM classifier to train on a dataset with about 100k samples, but I found it extremely slow: even after two hours there was no response. When the dataset has around 1k samples, I get the result immediately. I also tried SGDClassifier and naïve Bayes, which are quite fast; I got results within a couple of minutes. Could you explain this phenomenon?

General remarks about SVM learning

SVM training with nonlinear kernels, which is the default in sklearn's SVC, is approximately O(n_samples^2 * n_features) in complexity (an approximation given by one of sklearn's devs in an answer to a related question). This applies to the SMO algorithm used within libsvm, which is the core solver in sklearn for this type of problem.

This changes considerably when no kernel is used and one switches to sklearn.svm.LinearSVC (based on liblinear) or sklearn.linear_model.SGDClassifier.
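As a rough illustration (a sketch on made-up synthetic data, not a benchmark; the sizes are chosen only so it runs quickly), fitting a kernelized SVC and a LinearSVC on the same problem already shows the gap, which widens as n_samples grows:

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification problem (sizes chosen for illustration)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

t0 = time.perf_counter()
SVC(kernel="rbf").fit(X, y)   # kernelized SMO: roughly O(n_samples^2)
t_rbf = time.perf_counter() - t0

t0 = time.perf_counter()
LinearSVC().fit(X, y)         # liblinear: roughly linear in n_samples
t_lin = time.perf_counter() - t0

print(f"rbf SVC: {t_rbf:.3f}s, LinearSVC: {t_lin:.3f}s")
```

The absolute times depend entirely on your machine; the point is how each one grows as you scale n_samples toward 100k.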

So we can do some math to approximate the time difference between 1k and 100k samples:

1k   = 1,000^2   = 1,000,000 steps          = Time X
100k = 100,000^2 = 10,000,000,000 steps     = Time X * 10,000 !!!
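The arithmetic above, spelled out in Python:

```python
# Quadratic scaling: the step count grows with n_samples squared,
# so the slowdown factor is the ratio of the squared sample counts.
small, large = 1_000, 100_000
ratio = (large ** 2) / (small ** 2)
print(int(ratio))  # 10000
```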

This is only an approximation, and the real difference can be somewhat better or worse (e.g. setting the cache size, trading off memory for speed gains)!

Scikit-learn specific remarks

The situation can also be much more complex because of all the nice things scikit-learn does for us behind the scenes. The above is valid for the classic two-class SVM. If you are trying to learn from multi-class data, scikit-learn will automatically use OneVsRest or OneVsAll approaches to do this (as the core SVM algorithm does not support multi-class natively). Read scikit-learn's docs to understand this part.

The same warning applies to generating probabilities: SVMs do not naturally produce probabilities for final predictions. So to use these (activated by a parameter), scikit-learn runs a heavy cross-validation procedure called Platt scaling, which also takes a lot of time!
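A minimal sketch of what that looks like (synthetic data; the cross-validated Platt scaling runs internally whenever probability=True, making fit() considerably more expensive than a plain SVC):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# probability=True triggers internal cross-validation (Platt scaling)
# during fit(); random_state controls that internal procedure.
clf = SVC(probability=True, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:3])
print(proba.shape)  # (3, 2): one calibrated probability per class
```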

Scikit-learn documentation

Because sklearn has some of the best docs around, there is often a good section within them explaining exactly this (link):

[screenshot of the relevant scikit-learn documentation]

If you are using an Intel CPU, then Intel has provided a solution for this. Intel Extension for Scikit-learn offers a way to accelerate existing scikit-learn code. The acceleration is achieved through patching: replacing the stock scikit-learn algorithms with optimized versions provided by the extension. Follow these steps:

First, install the intelex package for sklearn:

pip install scikit-learn-intelex

Now just add the following lines at the top of the program:

from sklearnex import patch_sklearn

patch_sklearn()

Now run the program; it will be much faster than before.

You can read more about it here: https://intel.github.io/scikit-learn-intelex/
