

/usr/bin/time CPU utilization against TOP while using SPARK

Posted 2016-04-30 02:33:51. Tags: linux, hadoop, apache-spark, linux-kernel, apache-spark-mllib

I ran an SVM algorithm using the MLlib library in Spark on a data set of 8 GB and 7 million rows. I am running Spark in standalone mode on a single node.

I used /usr/bin/time -v to capture data about the job. I got the peak memory utilization and the % CPU time, among other things. The % CPU utilization I got was a mere 6%. I was also monitoring top for some time while the program was running, and I could see more than 100% being used almost consistently. I am now confused: why did /usr/bin/time show only 6%?

Some more details: my machine has 16 GB of RAM, and the program I was running was consuming 13.88 GB. The program executed in 2.1 hours.

Any insights, anyone?

1 Answer

I figured out the problem. What /usr/bin/time showed (6%) was a percentage of the total CPU available (8 threads in this case), while top was showing 100% for a single thread.
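To make the two scales concrete, here is a small sketch. It assumes, as this answer does, that one figure is expressed per hardware thread and the other as a share of all 8 threads. The function name is made up for illustration and is not part of any tool.

```python
def per_thread_to_machine_pct(per_thread_pct: float, logical_cpus: int) -> float:
    """Convert a per-thread CPU percentage into a share of the whole machine.

    Assumption: 100% on the per-thread scale means one hardware thread is
    fully busy, and the machine-wide scale divides that by the thread count.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return per_thread_pct / logical_cpus


# One fully busy thread on an 8-thread machine.
print(per_thread_to_machine_pct(100.0, 8))  # 12.5
```

Under that assumption, one fully busy thread out of 8 corresponds to 12.5% of the machine. An average below that over the whole run would mean the thread was not on the CPU for the entire wall-clock time.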

Btw, if it helps anyone, the reason only 1 thread was being used instead of all 8 was that I had specified "local" and not "local[*]" in my SparkContext (sc = SparkContext("local", ...)). Read more about it HERE.
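For anyone hitting the same thing, here is a rough sketch of what the master string implies. Spark's documentation describes "local" as one worker thread, "local[K]" as K threads, and "local[*]" as one thread per logical core. The helper below is hypothetical (it is not a Spark API) and only handles those three simple forms. The app name in the trailing comment is a placeholder.

```python
import os
import re


def local_thread_count(master: str, available_cores: int) -> int:
    """Hypothetical helper: worker threads implied by a simple local master URL."""
    if master == "local":
        return 1  # plain "local" means a single worker thread
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"not a simple local master URL: {master!r}")
    value = match.group(1)
    # "*" means one worker thread per logical core on the machine.
    return available_cores if value == "*" else int(value)


cores = os.cpu_count() or 1
for master in ("local", "local[4]", "local[*]"):
    print(master, "->", local_thread_count(master, cores), "thread(s)")

# With PySpark installed, the change described above would look like:
#   from pyspark import SparkContext
#   sc = SparkContext("local[*]", "svm-job")  # "svm-job" is a placeholder name
```

On my 8-thread machine, "local" gives 1 thread and "local[*]" gives 8, which matches what I saw before and after the change.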