CPU usage/speed in Jupyter Notebooks for machine learning tasks

I've just built a brand new, powerful desktop PC in order to speed up my scikit-learn computations (specs here).

I run my code in a Jupyter Notebook, and I noticed that if I run the same computation on my old, dying laptop and on my super-PC, the time difference is often small, although on some very demanding cells the new PC can be up to twice as fast… But my new PC is supposed to be at least 5 times more powerful than my old laptop!

Demanding code example:

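import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# X_train, y_train: digit images and their 0-9 labels, assumed loaded earlier in the notebook
y_train_large = (y_train >= 7)           # first label: digit is 7, 8 or 9
y_train_odd = (y_train % 2 == 1)         # second label: digit is odd
y_multilabel = np.c_[y_train_large, y_train_odd]
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)
# 3-fold cross-validated predictions (each fold fits its own clone of knn_clf)
y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)
f1_score(y_multilabel, y_train_knn_pred, average="macro")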

Also, when I check the CPU usage while training a classifier, for instance, it's very low on both computers (around 5% on the new one and 15-20% on the old one).
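One plausible reading of these percentages, offered as an assumption rather than a diagnosis: scikit-learn's n_jobs defaults to None (one job unless a joblib context says otherwise), so this code may keep roughly one logical processor busy, while most system monitors (Windows Task Manager, for instance) report usage averaged over all logical processors. The sketch below does that arithmetic, assuming 24 logical processors for the Ryzen 9 3900X (12 cores with SMT) and 8 for the i7-4750HQ (4 cores with Hyper-Threading). Some scikit-learn routines also use multithreaded BLAS or OpenMP, so treat it as a first approximation; the function name expected_usage_percent is hypothetical.

```python
import os


def expected_usage_percent(busy_workers: int, logical_cpus: int) -> float:
    """Overall CPU usage (%) a monitor would show if `busy_workers` threads each
    keep one logical processor fully busy and every other processor is idle."""
    return 100.0 * min(busy_workers, logical_cpus) / logical_cpus


# Assumed logical-processor counts: Ryzen 9 3900X = 12 cores / 24 threads,
# i7-4750HQ = 4 cores / 8 threads.
for name, logical in [("Ryzen 9 3900X", 24), ("i7-4750HQ", 8)]:
    usage = expected_usage_percent(1, logical)
    print(f"{name}: one busy worker ~ {usage:.1f}% overall usage")

# Logical processors visible to the Python process running this cell
print("logical CPUs here:", os.cpu_count())
```

One fully busy worker works out to about 4.2% on 24 logical processors and 12.5% on 8, close to the 5% and 15-20% observed.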

I realise this may be a big noob question, but why is that? I read here that Jupyter Notebooks run on the host machine, not mine. How can I use my own hardware instead? I'm probably searching the wrong way, but I can't find much information on the subject. What should I search for?
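On the "host machine" point: a Jupyter kernel runs on whichever machine started the Jupyter server. If you launch jupyter notebook (or JupyterLab) on the new PC and open it in a local browser, the code already runs on your own hardware; hosted services such as Google Colab run it remotely. As a quick check, this stdlib-only sketch (the helper name kernel_host_info is hypothetical) reports which machine the kernel is on when run in a notebook cell:

```python
import os
import platform


def kernel_host_info() -> dict:
    """Describe the machine this Python kernel is actually running on."""
    return {
        "hostname": platform.node(),
        "system": platform.system(),
        "processor": platform.processor(),  # may be an empty string on some platforms
        "logical_cpus": os.cpu_count(),     # None if the count cannot be determined
    }


for key, value in kernel_host_info().items():
    print(f"{key:>12}: {value}")
```

If the hostname and CPU count match the new PC, the notebook is already using its hardware, and the low usage has another cause.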

Thanks!

Timing report for the code above, with one small change: setting n_jobs=4 in cross_val_predict():

Computing time for AMD Ryzen 9 3900X (12 cores), 32 GB RAM: approx. 12 min 45 s, average CPU usage 15%

Computing time for Intel i7-4750HQ @ 2.00 GHz, 16 GB RAM: approx. 19 min 50 s, average CPU usage 62%
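The two timings can be turned into a speedup figure with a little arithmetic. This sketch assumes the 12'45'' notation means minutes and seconds, and the helper to_seconds is hypothetical. It also records a documented detail of cross_val_predict: its n_jobs parallelizes training and prediction over the cross-validation splits, so with cv=3 no more than three fold jobs run at once, even when n_jobs=4 is requested.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes'seconds'' reading such as 12'45'' to seconds."""
    return 60 * minutes + seconds


ryzen_s = to_seconds(12, 45)  # AMD Ryzen 9 3900X
i7_s = to_seconds(19, 50)     # Intel i7-4750HQ
speedup = i7_s / ryzen_s
print(f"{ryzen_s} s vs {i7_s} s -> speedup {speedup:.2f}x")

# cross_val_predict parallelizes over CV splits, so with cv=3 at most
# three fold jobs can run at the same time even though n_jobs=4 was set.
n_jobs, n_splits = 4, 3
print("concurrent fold jobs:", min(n_jobs, n_splits))
```

On these numbers the Ryzen is about 1.56 times faster, well below the expected 5x; with at most three fold jobs busy, most of its 24 logical processors presumably sit idle.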

OK, so for this particular piece of code, increasing the n_jobs parameter of cross_val_predict() to n_jobs=4 gives a good improvement, but it's still unclear to me:

  • How should I proceed for other machine learning tasks?
  • Are there other parameters to tweak to get even better results than this?
  • How far can we go with n_jobs? How do we find the best n_jobs for a given task, and how do we know when we have gone too far and the CPU is at risk?
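For the n_jobs questions, a practical approach is to measure: time the same workload for a few n_jobs settings and stop increasing it once wall-clock time stops improving. The sketch below assumes scikit-learn's bundled load_digits data as a small stand-in for the question's X_train/y_train (so absolute times are far smaller), and the helper time_cv_predict is hypothetical. n_jobs exists both on cross_val_predict (parallel over CV splits, so cv=3 caps it at three useful jobs) and on KNeighborsClassifier itself (parallel neighbor search); n_jobs=-1 means all processors.

```python
import os
import time

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the small bundled digits set stands in for the question's data
X, y = load_digits(return_X_y=True)
y_multilabel = np.c_[y >= 7, y % 2 == 1]  # same two labels as the question's code


def time_cv_predict(cv_jobs, knn_jobs, cv=3):
    """Hypothetical helper: wall-clock seconds for one cross_val_predict run."""
    knn_clf = KNeighborsClassifier(n_jobs=knn_jobs)
    start = time.perf_counter()
    cross_val_predict(knn_clf, X, y_multilabel, cv=cv, n_jobs=cv_jobs)
    return time.perf_counter() - start


print("logical CPUs:", os.cpu_count())
# (cross_val_predict n_jobs, KNeighborsClassifier n_jobs) settings to compare
for cv_jobs, knn_jobs in [(None, None), (3, None), (None, -1), (3, 4)]:
    elapsed = time_cv_predict(cv_jobs, knn_jobs)
    print(f"cv n_jobs={cv_jobs}, knn n_jobs={knn_jobs}: {elapsed:.2f} s")
```

If timings stop improving, or get worse, as n_jobs grows, the extra workers are no longer helping. Setting n_jobs at both levels can also oversubscribe the CPU (3 fold jobs times 4 neighbor-search jobs is 12 workers), so compare the combinations rather than assuming more is faster.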

Any expert on these matters is still welcome to answer :)

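Disclaimer: technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.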

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM