
/usr/bin/time CPU utilization versus top while using Spark

I ran an SVM algorithm using the MLlib library in Spark on a dataset of 8 GB and 7 million rows. I am running Spark in standalone mode on a single node.

I used /usr/bin/time -v to capture data about the job. Among other things, I got the peak memory utilization and the % CPU time. The CPU utilization it reported was a mere 6%. I was also monitoring top for some time while the program was running, and I could see more than 100% being used almost consistently. Now I am confused: why did /usr/bin/time show only 6%?

Some more details: my machine has 16 GB of memory, and the program I was running consumed 13.88 GB. The program ran for 2.1 hours.

Any insights, anyone?

I figured out the problem. What /usr/bin/time showed (6%) was a percentage of the total CPU available (8 threads in this case), while top was showing 100% for a single thread.
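To make the two scales concrete, here is a minimal sketch of the conversion, assuming (as this answer does) that one figure is relative to a single thread and the other to all 8 threads. The function names are hypothetical and the inputs are just the numbers quoted in the post.

```python
def per_core_to_machine_percent(per_core_percent: float, n_threads: int) -> float:
    """Convert a per-thread percentage (100% = one busy thread) into a
    share of the whole machine (100% = all threads busy)."""
    if n_threads <= 0:
        raise ValueError("n_threads must be positive")
    return per_core_percent / n_threads


def machine_to_per_core_percent(machine_percent: float, n_threads: int) -> float:
    """Inverse conversion: whole-machine share back to a per-thread percentage."""
    if n_threads <= 0:
        raise ValueError("n_threads must be positive")
    return machine_percent * n_threads


if __name__ == "__main__":
    # One fully busy thread on an 8-thread machine.
    print(per_core_to_machine_percent(100.0, 8))  # 12.5
    # The 6% from the post, expressed on the per-thread scale.
    print(machine_to_per_core_percent(6.0, 8))    # 48.0
```

Under this reading, a single saturated thread can account for at most 12.5% of an 8-thread machine, so a low whole-machine figure and a figure of about 100% on the per-thread scale describe the same order of magnitude of work.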

Btw, if it helps anyone, the reason only 1 thread was being used instead of all 8 was that I had specified "local" and not "local[*]" in my SparkContext (sc = SparkContext("local", ...)). Read more about it here.
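The difference between the two master strings can be sketched without starting Spark. The helper below is hypothetical (it is not part of PySpark); it only mirrors the documented meaning of the simple local master URLs: "local" means one worker thread, "local[K]" means K worker threads, and "local[*]" means as many worker threads as logical cores.

```python
import os
import re


def local_worker_threads(master: str) -> int:
    """Hypothetical helper: number of worker threads implied by a simple
    local master URL ("local", "local[K]" or "local[*]")."""
    if master == "local":
        return 1  # a bare "local" means a single worker thread
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"not a simple local master URL: {master!r}")
    if match.group(1) == "*":
        # "*" stands for the number of logical cores on this machine
        return os.cpu_count() or 1
    return int(match.group(1))


if __name__ == "__main__":
    for master in ("local", "local[4]", "local[*]"):
        print(master, "->", local_worker_threads(master))

    # With PySpark installed, the fix from the post would look like this
    # (remaining arguments as in the original call):
    #   from pyspark import SparkContext
    #   sc = SparkContext("local[*]", ...)
```

On the 8-thread machine described in the question, "local[*]" would therefore let Spark use 8 worker threads where "local" uses 1.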
