
How do I write a parallel reduction using strategies in Haskell?

In high-performance computing, sums, products, etc. are often calculated using a "parallel reduction" that takes n elements and completes in O(log n) time (given enough parallelism). In Haskell, we usually use a fold for this kind of calculation, but evaluation time is always linear in the length of the list.

Data Parallel Haskell has some of this built in, but what about in the common framework of a list? Can we do it with Control.Parallel.Strategies?

So, assuming f is associative, how do we write

parFold :: (a -> a -> a) -> [a] -> a

so that parFold f xs only needs time logarithmic in length xs?
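The associativity assumption is what makes a logarithmic depth possible at all: it lets the left-nested chain built by foldl1 be regrouped into a balanced tree whose two halves do not depend on each other. The sequential sketch below only illustrates that regrouping; treeFold is an illustrative name, not a library function, and it introduces no parallelism by itself.

```haskell
-- Sequential sketch: regroup a fold into a balanced tree.
-- `treeFold` is an illustrative name, not a library function.
treeFold :: (a -> a -> a) -> [a] -> a
treeFold _ []  = error "treeFold: empty list"
treeFold _ [x] = x
treeFold f xs  = f (treeFold f left) (treeFold f right)
  where
    -- both halves are non-empty because xs has at least two elements
    (left, right) = splitAt (length xs `div` 2) xs

-- For an associative operator both groupings agree:
--   foldl1 (+) [1..8]  ==  treeFold (+) [1..8]
-- For a non-associative operator such as (-) they generally differ:
--   foldl1 (-) [1..4]    is  ((1-2)-3)-4
--   treeFold (-) [1..4]  is  (1-2)-(3-4)
```

Note that length and splitAt are themselves linear on a list, so this shape alone does not give logarithmic time; it only shows which applications of f could run independently.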

I don't think a list is the right data type for this. Because it's just a linked list, the data will necessarily be accessed sequentially. Although you can evaluate the items in parallel, you won't gain much in the reduction step. If you really need a list, I think the best function would be just

parFold f = foldl1' f . withStrategy (parList rseq)

or maybe

parFold f = foldl1' f . withStrategy (parBuffer 5 rseq)

If the reduction step is complex, you might get a gain by subdividing the list like this:

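import Control.Parallel.Strategies (parList, parMap, rseq, withStrategy)
import Data.List (foldl')

parReduce f = foldl' f mempty . reducedList . chunkList . withStrategy (parList rseq)
 where
  chunkList [] = []   -- stop at the end of the list; without this case chunkList never terminates
  chunkList list = let (l,ls) = splitAt 1000 list in l : chunkList ls
  reducedList = parMap rseq (foldl' f mempty)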

I've taken the liberty of assuming your data is a Monoid so that mempty is available. If this isn't possible, you can either replace mempty with your own empty (identity) value or, in the worst case, use foldl1'.

There are two operators from Control.Parallel.Strategies in use here. parList evaluates all items of the list in parallel. After that, chunkList divides the list into chunks of 1000 elements. Each of those chunks is then reduced in parallel by parMap.
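As a rough, self-contained illustration of the chunk-then-reduce pipeline described here, the sketch below makes the chunk size a parameter and uses the Monoid operation (<>) directly instead of a separate f. The names chunksOf', parReduceChunked and exampleTotal are made up for this example, and it assumes a base library recent enough to export (<>) from the Prelude.

```haskell
import Control.Parallel.Strategies (parMap, rseq)
import Data.List (foldl')
import Data.Monoid (Sum (..))

-- Split a list into chunks of at most n elements (n > 0 is assumed).
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' n xs = let (c, rest) = splitAt n xs in c : chunksOf' n rest

-- Reduce each chunk in its own spark, then combine the partial results.
-- `chunkSize` is an illustrative parameter; 1000 is the value used above.
parReduceChunked :: Monoid a => Int -> [a] -> a
parReduceChunked chunkSize =
  foldl' (<>) mempty . parMap rseq (foldl' (<>) mempty) . chunksOf' chunkSize

-- Example: sum the numbers 1..10000 through the Sum monoid.
exampleTotal :: Int
exampleTotal = getSum (parReduceChunked 1000 (map Sum [1 .. 10000]))
```

Because rseq only evaluates to weak head normal form, this works best when the reduced value is something flat like a number; for structured results rdeepseq may be the better choice.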

You might also try

parReduce2 f = foldl' f mempty . reducedList . chunkList
 where
  chunkList [] = []   -- stop at the end of the list; without this case chunkList never terminates
  chunkList list = let (l,ls) = splitAt 1000 list in l : chunkList ls
  reducedList = parMap rseq (foldl' f mempty)

Depending on exactly how the work is distributed, one of these may be more efficient than the other.

If you can use a data structure that has good support for indexing, though (Array, Vector, Map, etc.), then you can do binary subdivisions for the reduction step, which will probably be better overall.
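One way such a binary subdivision might look, sketched here over a Data.Array with the Eval monad from Control.Parallel.Strategies. The function name parFoldArray and the threshold parameter are assumptions made for this example; the threshold is the range size below which the code stops sparking and folds sequentially, and the array is assumed to be non-empty.

```haskell
import Control.Parallel.Strategies (rpar, rseq, runEval)
import Data.Array (Array, bounds, listArray, (!))
import Data.List (foldl')

-- Divide-and-conquer reduction over the index range of an array.
-- `threshold` (illustrative) is the range size below which we stop
-- sparking and fold sequentially. The array is assumed non-empty.
parFoldArray :: Int -> (a -> a -> a) -> Array Int a -> a
parFoldArray threshold f arr = go lo hi
  where
    (lo, hi) = bounds arr
    go l h
      | l > h = error "parFoldArray: empty range"
      | h - l < max 1 threshold =
          foldl' f (arr ! l) [arr ! i | i <- [l + 1 .. h]]
      | otherwise = runEval $ do
          let mid = (l + h) `div` 2
          left  <- rpar (go l mid)        -- spark the left half
          right <- rseq (go (mid + 1) h)  -- evaluate the right half here
          _     <- rseq left              -- wait for the left half
          return (f left right)
```

A call might look like parFoldArray 256 (+) (listArray (0, 9999) [1 .. 10000]). Splitting by index is constant time, unlike splitAt on a list, which is the advantage the paragraph above points at. To see any speedup the program has to be compiled with -threaded and run with +RTS -N.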

This seems like a good start:

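import Control.Parallel.Strategies (parList, rseq, using)

parFold :: (a -> a -> a) -> [a] -> a
parFold f = go
  where
  strategy = parList rseq

  go []  = error "parFold: empty list"   -- without this case, go [] loops forever
  go [x] = x
  go xs = go (reduce xs `using` strategy)

  -- combine adjacent pairs, halving the list on each pass
  reduce (x:y:xs) = f x y : reduce xs
  reduce list     = list   -- empty or singleton list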

It works, but the parallelism is not so great. Replacing parList with something like parListChunk 1000 helps a bit, but the speedup is still limited to under 1.5x on an 8-core machine.
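For reference, the chunked variant mentioned here could be sketched as follows. The chunked list strategy exported by Control.Parallel.Strategies is named parListChunk; the function name parFoldChunked and the chunkSize parameter are made up for this example. A plausible reason the plain parList version scales poorly is that each spark performs only a single application of f, so chunking makes each spark do more work, but every pass still walks the list sequentially.

```haskell
import Control.Parallel.Strategies (parListChunk, rseq, using)

-- Pairwise reduction where each pass is evaluated in chunks.
-- `chunkSize` is an illustrative parameter (the text mentions 1000).
parFoldChunked :: Int -> (a -> a -> a) -> [a] -> a
parFoldChunked chunkSize f = go
  where
    go []  = error "parFoldChunked: empty list"
    go [x] = x
    go xs  = go (reduce xs `using` parListChunk chunkSize rseq)

    -- Combine neighbours: [a,b,c,d,e] becomes [f a b, f c d, e].
    reduce (x:y:rest) = f x y : reduce rest
    reduce rest       = rest
```

The best chunk size depends on how expensive f is, so it is worth measuring a few values rather than relying on 1000.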

I'm not sure what your parFold function is supposed to do. If it is intended to be a parallel version of foldr or foldl, I think its definition is wrong.

parFold :: (a -> a -> a) -> [a] -> a

-- fold right in Haskell (takes 3 arguments)
foldr :: (a -> b -> b) -> b -> [a] -> b

Fold applies the same function to each element of the list and accumulates the result of each application. Coming up with a parallel version of it, I guess, would require that the function applications to the elements be done in parallel, a bit like what parList does.

    par_foldr :: (NFData a, NFData b) => (a -> b -> b) -> b -> [a] -> b
    par_foldr f z [] = z
    par_foldr f z (x:xs) = res `using` \ _ -> rseq x' `par` rdeepseq res
                       where x' = par_foldr f z xs
                             res = x `f` x'

