
Haskell - What's up with parMap?

I've run some tests:

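import Control.Parallel.Strategies
import qualified Data.Vector as V
import Data.Vector (Vector)

-- Apply f to every element, sparking each application with rpar.
parMapVec :: (a -> b) -> Vector a -> Vector b
parMapVec f v = runEval $ evalTraversable rpar $ V.map f v

-- Inclusive range from x to y, built one element at a time
-- (ascending if x <= y, descending otherwise).
range :: Integer -> Integer -> Vector Integer
range x y
  | x == y    = x `V.cons` V.empty
  | x < y     = x `V.cons` (range (x + 1) y)
  | otherwise = (range x (y + 1)) `V.snoc` y   -- x > y

-- Naive recursive factorial; deliberately expensive for large n.
fac :: Integer -> Integer
fac n
  | n < 2     = 1
  | otherwise = n * (fac $ n - 1)

main :: IO ()
main = do
  let result = runEval $ do
        let calc = parMapVec fac $ 80000 `range` 80007
        rseq calc
        return calc
  putStrLn $ show result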

I also tried the following modification to main, to make sure that my parMapVec function wasn't what was wrong:

main = do
  let result = runEval $ do
        let calc = parMap rpar fac [80000..80007]
        rseq calc
        return calc
  putStrLn $ show result

I compiled with ghc --make parVectorTest.hs -threaded -rtsopts and ran both versions with ./parVectorTest -s.

Here's what I got for the version with vectors:

56,529,547,832 bytes allocated in the heap
10,647,896,984 bytes copied during GC
    7,281,792 bytes maximum residency (16608 sample(s))
    3,285,392 bytes maximum slop
            21 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
Gen  0     82708 colls,     0 par    0.828s   0.802s     0.0000s    0.0016s
Gen  1     16608 colls,     0 par   15.006s  14.991s     0.0009s    0.0084s

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

SPARKS: 8 (7 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)

INIT    time    0.001s  (  0.001s elapsed)
MUT     time    5.368s  (  5.369s elapsed)
GC      time   15.834s  ( 15.793s elapsed)
EXIT    time    0.001s  (  0.000s elapsed)
Total   time   21.206s  ( 21.163s elapsed)

Alloc rate    10,530,987,847 bytes per MUT second

Productivity  25.3% of total user, 25.4% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

So that's good, except that I watched the process execute in my system monitor, and only one core was working at a time. Every time one of the results was printed out, the process would switch to a different core. So I thought something was wrong with my parMapVec function. But then I did the same thing with the list version:

56,529,535,488 bytes allocated in the heap
12,483,967,024 bytes copied during GC
    6,246,872 bytes maximum residency (19843 sample(s))
    2,919,544 bytes maximum slop
            20 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
Gen  0     79459 colls,     0 par    0.818s   0.786s     0.0000s    0.0009s
Gen  1     19843 colls,     0 par   17.725s  17.709s     0.0009s    0.0087s

TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

SPARKS: 16 (14 converted, 0 overflowed, 0 dud, 1 GC'd, 1 fizzled)

INIT    time    0.001s  (  0.001s elapsed)
MUT     time    5.394s  (  5.400s elapsed)
GC      time   18.543s  ( 18.495s elapsed)
EXIT    time    0.000s  (  0.000s elapsed)
Total   time   23.940s  ( 23.896s elapsed)

Alloc rate    10,479,915,927 bytes per MUT second

Productivity  22.5% of total user, 22.6% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

So there was more garbage collection, which makes sense. There were also more sparks, which I don't know how to explain. This program showed the same behavior when I watched it execute in my system monitor.

I also ran both tests with ./parVectorTest -s -C0.01 because of the answer to a related question, and got basically the same results. I'm on an 8-core Lenovo Ideapad running Ubuntu Linux 17.04. At the time of the tests, the only apps I had open were VS Code and my system monitor, although there were other processes taking up a very small portion of the processing power. Does a processor have to be completely idle to take a spark?

By default, GHC runs a program with a single capability, i.e. only one OS thread executes Haskell code at a time, even when the program is compiled with -threaded. Note the text "using -N1" in your output: it indicates that the program is being run with one capability.
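As a quick check, here is a minimal sketch that asks the runtime directly. It assumes only GHC's base library (getNumCapabilities from Control.Concurrent and getNumProcessors from GHC.Conc); the helper report is hypothetical and exists only for formatting. Compiled with -threaded -rtsopts and started without +RTS -N, it should report 1 capability no matter how many processors the OS sees.

```haskell
import Control.Concurrent (getNumCapabilities)
import GHC.Conc (getNumProcessors)

-- Hypothetical formatting helper for the two counts.
report :: Int -> Int -> String
report caps procs =
  "capabilities: " ++ show caps ++ ", processors: " ++ show procs

main :: IO ()
main = do
  caps  <- getNumCapabilities  -- Haskell execution contexts (the -N value)
  procs <- getNumProcessors    -- processors reported by the OS
  putStrLn (report caps procs)
```

Running the same binary with +RTS -N8 should then report 8 capabilities.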

In short: pass e.g. +RTS -N8 to your program. For documentation of this flag, see the section on RTS options in the GHC User's Guide.
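If you would rather not pass +RTS -N8 on every run, GHC can bake the option in at link time with -with-rtsopts=-N, or the program can raise its own capability count. The sketch below takes the second route. It assumes the parallel package (for parMap and rdeepseq, as in the question) and reuses the question's fac; it is one possible approach, not the only one.

```haskell
import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import Control.Parallel.Strategies (parMap, rdeepseq)
import GHC.Conc (getNumProcessors)

-- Naive factorial from the question; expensive for large n.
fac :: Integer -> Integer
fac n
  | n < 2     = 1
  | otherwise = n * fac (n - 1)

main :: IO ()
main = do
  -- Use one capability per processor, roughly what +RTS -N does.
  -- This is only meaningful in a binary linked with -threaded.
  procs <- getNumProcessors
  setNumCapabilities procs
  caps <- getNumCapabilities
  putStrLn ("running with " ++ show caps ++ " capabilities")
  -- Sparks created here can now be picked up by idle capabilities.
  let results = parMap rdeepseq fac [80000 .. 80007]
  -- Print a small residue of each result instead of the huge numbers.
  print (map (`mod` 1000007) results)
```

Compile it with ghc -threaded as before; no +RTS flag is needed at run time.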


In a broad sense, this is due to the distinction between parallelism and concurrency. There are several SO questions that try to explain the difference, which can be summarized as:

  • parallelism: a task subdivided into similar chunks that run simultaneously on separate cores/CPUs at some point in time; used to increase speed

  • concurrency: several tasks executed conceptually independently such that their execution times overlap, whether on the same thread through time slicing or on separate cores/CPUs; usually used to utilize shared resources more efficiently

However, these definitions are somewhat contentious: sometimes the two terms are given opposite meanings, and sometimes they are used interchangeably. Still, for the purpose of understanding this problem (why you must pass another flag in addition to -threaded to make a 'parallel' program actually run in parallel), I believe they are useful definitions.
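To make the distinction concrete, here is a small sketch assuming the parallel package plus base's Control.Concurrent. The names parallelSum and concurrentTasks are hypothetical, chosen for illustration: the first splits one computation into sparks (parallelism, which only gets faster with more than one capability), the second overlaps two independent IO tasks with forkIO (concurrency, which works even with -N1).

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Parallelism: one task (a sum of squares) split into chunks that are
-- evaluated as sparks. The result is deterministic; only the speed
-- depends on how many capabilities (+RTS -N) are available.
parallelSum :: [[Integer]] -> Integer
parallelSum chunks = sum (parMap rdeepseq (sum . map (^ 2)) chunks)

-- Concurrency: two independent IO tasks whose execution overlaps in time.
-- Both delays run at once, so this takes about 0.1 s rather than 0.2 s,
-- even on a single capability.
concurrentTasks :: IO (String, String)
concurrentTasks = do
  a <- newEmptyMVar
  b <- newEmptyMVar
  _ <- forkIO (threadDelay 100000 >> putMVar a "task A done")
  _ <- forkIO (threadDelay 100000 >> putMVar b "task B done")
  ra <- takeMVar a
  rb <- takeMVar b
  return (ra, rb)

main :: IO ()
main = do
  print (parallelSum [[1 .. 1000], [1001 .. 2000], [2001 .. 3000]])
  concurrentTasks >>= print
```

Run with +RTS -N1, the concurrent part still overlaps, while the sparks in parallelSum are simply evaluated by the main thread one after another.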

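Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.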
