
clojure pmap vs map

I tested the Clojure functions map and pmap in the Clojure REPL, as shown below. It confuses me: why is the parallel pmap slower than map?

user=> (def lg (range 1 10000000))
user=> (time (def rs (doall (pmap #(* % %) lg))))

"Elapsed time: **125739.056** msecs"

# -------------------------------------------------------
user=> (def lg (range 1 10000000))
user=> (time (def rs (doall (map #(* % %) lg))))

"Elapsed time: **5804.485** msecs"

**PS: the machine has 8 cores**

With every parallel processing task, there is some amount of overhead due to task coordination. pmap applies the mapping function to each element individually in a different thread. As the lazy sequence returned by pmap is consumed, the consumer thread must coordinate with the producer threads. The way pmap is defined, this overhead occurs for each and every element produced.

Considering this, when you use pmap to compute a simple function (such as squaring a number, as in your example), the time it takes for the threads to coordinate their activities swamps the time it takes to actually compute the value. As the docstring says, pmap is "only useful for computationally intensive functions where the time of f dominates the coordination overhead" (emphasis added). When f is this cheap, pmap will take longer than map regardless of how many cores you have.

To actually see a benefit from pmap, you must choose a "harder" problem. In some cases, this may be as simple as partitioning the input sequence into chunks. Then the sequence of chunks can be processed with pmap and then run through concat to get the final output.

For example:

(defn chunked-pmap [f partition-size coll]
  (->> coll                           ; Start with original collection.

       (partition-all partition-size) ; Partition it into chunks.

       (pmap (comp doall              ; Map f over each chunk,
                   (partial map f)))  ; and use doall to force it to be
                                      ; realized in the worker thread.

       (apply concat)))               ; Concatenate the chunked results
                                      ; to form the return value.

However, there is also an overhead for partitioning the sequence and concatenating the chunks at the end. For example, at least on my machine, chunked-pmap still under-performed map by a significant amount for your example. Still, it may be effective for some functions.
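A quick sanity check can confirm that chunking changes only how the work is scheduled, not the result or its order. The sketch below repeats the chunked-pmap definition from above so that it is self-contained; the chunk size of 1000 and the input range are arbitrary choices for illustration, not tuned values.

```clojure
;; Same definition as above, repeated so this snippet runs on its own.
(defn chunked-pmap [f partition-size coll]
  (->> coll
       (partition-all partition-size)
       (pmap (comp doall (partial map f)))
       (apply concat)))

;; Arbitrary illustration values: 10000 inputs, chunks of 1000.
(def inputs (range 1 10001))

(def chunked-squares (doall (chunked-pmap #(* % %) 1000 inputs)))
(def plain-squares   (doall (map #(* % %) inputs)))

;; Both sequences hold the same values in the same order.
(= chunked-squares plain-squares)
```

The best chunk size depends on the cost of f and on the machine, so it is worth timing a few sizes rather than assuming one.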

Another way to improve the effectiveness of pmap is to partition the work at a different place in the overall algorithm. For example, suppose we were interested in calculating the Euclidean distance between pairs of points. While parallelizing the square function has proven to be ineffective, we might have some luck parallelizing the entire distance function. Realistically, we would want to partition the task at an even higher level, but that is the gist of it.
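As a rough sketch of that idea, the unit of parallel work below is one whole distance computation rather than one squaring. The function name euclidean-distance and the sample points are made up for illustration. With points this small, each task is still far too cheap for pmap to pay off, so this shows only where the partition boundary moves, not a speedup.

```clojure
(defn euclidean-distance
  "Distance between two points given as equal-length vectors of numbers.
  (Illustrative helper, not part of the original answer.)"
  [p q]
  (Math/sqrt
   (double (reduce + (map (fn [a b]
                            (let [d (- a b)]
                              (* d d)))
                          p q)))))

;; Made-up sample data: each element is a pair of 2-D points.
(def point-pairs
  [[[0 0] [3 4]]
   [[1 1] [4 5]]
   [[2 0] [2 7]]])

;; One parallel task per pair, instead of one per squared coordinate.
(def distances
  (doall (pmap (fn [[p q]] (euclidean-distance p q)) point-pairs)))
;; distances => (5.0 5.0 7.0)
```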

In short, the performance of parallel algorithms is sensitive to the manner in which the task is partitioned, and you have chosen a level that is too granular for your test.

Rörd is correct: there is significant overhead in using pmap. Consider using reducers instead:

(require '[clojure.core.reducers :as r])

(def l (range 10000000))

(time (def a (doall (pmap #(* % %) l))))
"Elapsed time: 14674.415781 msecs"

(time (def a (doall (map #(* % %) l))))
"Elapsed time: 1119.107447 msecs"

(time (def a (doall (into [] (r/map #(* % %) l)))))
"Elapsed time: 1049.754652 msecs"
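One caveat, as far as I understand the reducers library: into over r/map performs an ordinary sequential reduce, so any gain in the timing above comes from skipping the intermediate lazy sequence rather than from parallelism. To get parallel execution, r/fold (or r/foldcat) has to be applied to a foldable collection such as a vector. A minimal sketch, with an arbitrary input size:

```clojure
(require '[clojure.core.reducers :as r])

;; A vector is foldable; a plain lazy seq would fall back to a serial reduce.
(def v (vec (range 1 1000001)))

;; Parallel sum of squares: + serves as both the reducing and combining fn.
(def parallel-sum (r/fold + (r/map #(* % %) v)))

;; Parallel map that keeps the elements: foldcat collects the pieces in order.
(def parallel-squares (into [] (r/foldcat (r/map #(* % %) v))))

;; Sequential versions for comparison.
(def serial-sum     (reduce + (map #(* % %) v)))
(def serial-squares (mapv #(* % %) v))

(and (= parallel-sum serial-sum)
     (= parallel-squares serial-squares))
```

Whether r/fold is actually faster for a function as cheap as squaring depends on the machine and should be measured with time, as in the examples above.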

There is some overhead for creating the threads, splitting the workload between them, and reassembling the results. You will need a function that runs significantly longer than #(* % %) to see a speed improvement from pmap (and it does, of course, also depend on the number of cores your CPU has, which you didn't specify in your question).
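To see the effect of a longer-running function, here is a small sketch in which slow-square is a hypothetical stand-in for an expensive computation. It simply sleeps for about 50 ms before squaring, which simulates latency rather than real CPU work, so actual CPU-bound functions may scale differently.

```clojure
(defn slow-square
  "Hypothetical stand-in for an expensive function: waits ~50 ms, then squares n."
  [n]
  (Thread/sleep 50)
  (* n n))

(def inputs (range 1 17))

;; Sequential: the waits add up, one per element.
(time (def serial-result (doall (map slow-square inputs))))

;; Parallel: several calls are in flight at once, so the waits overlap.
(time (def parallel-result (doall (pmap slow-square inputs))))

;; Same values, same order.
(= serial-result parallel-result)
```

Here the per-element cost dwarfs the coordination overhead, which is the situation the pmap docstring describes.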
