简体   繁体   中英

How to write nested loop problem using parallel strategies in Haskell

I'm playing with parallel strategies and wondering if I'm doing the following the right way. Java code:

    double x = 0.0;
    double[] arr = new double[2000];

    for (int i = 0; i < arr.length; i++) 
        arr[i] = i;

    for (int i = 0; i < arr.length; i++) {
        x += arr[i] * 5;

        for (int j = i + 1; j < arr.length; j++)
            x -= arr[j] * 3;
    }

Haskell program which uses parallel strategies to compute the result:

    n = 2000
    ns = [0..n-1]

    segments = chunk 100 ns

    chunk n [] = []
    chunk n xs = ys : chunk n zs
      where (ys,zs) = splitAt n xs

    parCompute = foldl' (+) 0 (map (\ts -> compute ts) segments `using` parList rdeepseq)

    compute ts = foldl' addfunc 0 ts
        where
            addfunc acc i = (acc + x) - (foldl' minusfunc 0 [(i+1)..(n-1)])
                where
                    x = (ns!!i) * 5
                    minusfunc acc' j = (acc' + x')
                        where
                            x' = (ns!!j) * 3

    main = print parCompute

My questions are:

  • is it right to use foldl' here? I thought since all computations need to be done to get the result, I should force evaluate.

  • is there a better way to use segments? what common patterns are present in this problem that i can exploit?

  • what other strategies could be applied to this problem? Also, any possibility of parallelising just using par and seq primitives.

Here is how I would translate your Java program into a parallel Haskell program:

parCompute ts = sum (computes `using` parListChunk 100 rseq)
  where 
    computes  = zipWith f ts (tail (tails ts))
    f t tls   = 5 * t - 3 * sum tls

First off - yes, introducing strictness is a good idea here. On the other hand, GHC is smart enough to spot this as well! In fact, whether you use foldl , foldl' or simply sum , the generated code is exactly the same.

For evaluating the list in segments, you can simply use the chunking strategy as indicated above. The amount of work each chunk represents might vary wildly, however, so you could try to even it out by making bigger chunks for the end of the list. Apart from that, I don't think there is much room for improvement here.

Ok, let's use REPA (REgular Parallel Arrays) this time and compare it with the parListChunk method (since the java example uses an array not a list) :

module Main where

import Control.Parallel.Strategies
import Data.List (tails)
import System.Environment (getArgs)
import qualified Data.Array.Repa as R
import qualified Data.Array.Repa.Shape as RS

chunksize = 100

parListCompute :: [Int] -> [Int]
parListCompute ts = (computes `using` parListChunk chunksize rseq)
  where
    computes = zipWith f ts (tail (tails ts))
    f t tls  = 5 * t - 3 * sum tls

parRepaCompute :: R.Array R.DIM1 Int -> R.Array R.DIM1 Int
parRepaCompute arr = R.force $ computes
  where
    computes    = R.map f arr
    f x         = 5*x - 3*(sumRest (x+1) 0)
    sumRest x acc | x > (RS.size . R.extent $ arr) = acc
                  | otherwise                      = sumRest (x+1) (acc+x)

main = do
  (s:_) <- getArgs
  case s of
    "1" -> putStrLn . show .sum $ parListCompute l
    "2" -> putStrLn . show . R.sum $ parRepaCompute r
  where l = [1..70000]
        r = R.fromList (R.Z R.:. (length l)) l

And here is the result:

~/haskell$ ghc --make nestloop.hs -O2 -rtsopts -threaded 
[1 of 1] Compiling Main             ( nestloop.hs, nestloop.o )
Linking nestloop ...
haskell$ time ./nestloop 1 +RTS -N4
-342987749755000

real    0m5.115s
user    0m19.870s
sys     0m0.170s
~/haskell$ time ./nestloop 2 +RTS -N4
[-342987749755000]

real    0m1.658s
user    0m3.670s
sys     0m0.070s

I hope you'll like this comparison.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM