Strategies for constructing tree in parallel in Haskell

I have a project where I'm building a Decision Tree in Haskell. The generated trees will have multiple branches that are independent of each other, so I figured they could be constructed in parallel.

The DecisionTree data type is defined like so:

data DecisionTree =
    Question Filter DecisionTree DecisionTree |    
    Answer DecisionTreeResult

instance NFData DecisionTree where
    rnf (Answer dtr)            = rnf dtr
    rnf (Question fil dt1 dt2)  = rnf fil `seq` rnf dt1 `seq` rnf dt2

Here's the part of the algorithm that constructs the tree:

constructTree :: TrainingParameters -> [Map String Value] -> Filter -> Either String DecisionTree    
constructTree trainingParameters trainingData fil =    
    if informationGain trainingData (parseFilter fil) < entropyLimit trainingParameters    
    then constructAnswer (targetVariable trainingParameters) trainingData    
    else
        Question fil <$> affirmativeTree <*> negativeTree `using` evalTraversable parEvalTree    
        where   affirmativeTree   = trainModel trainingParameters passedTData    
                negativeTree      = trainModel trainingParameters failedTData    
                passedTData       = filter (parseFilter fil) trainingData    
                failedTData       = filter (not . parseFilter fil) trainingData

parEvalTree :: Strategy DecisionTree    
parEvalTree (Question f dt1 dt2) = do    
    dt1' <- rparWith rdeepseq dt1    
    dt2' <- rparWith rdeepseq dt2    
    return $ Question f dt1' dt2'
parEvalTree ans = return ans

trainModel recursively calls constructTree. The relevant line for parallelism is:

Question fil <$> affirmativeTree <*> negativeTree `using` evalTraversable parEvalTree 

I'm building this with the GHC flags -threaded -O2 -rtsopts -eventlog and running it with stack exec -- performance-test +RTS -A200M -N -s -l (I'm on a 2 core machine).

But it doesn't seem to run anything in parallel:

SPARKS: 164 (60 converted, 0 overflowed, 0 dud, 0 GC'd, 104 fizzled)

INIT    time    0.000s  (  0.009s elapsed)
MUT     time   29.041s  ( 29.249s elapsed)
GC      time    0.048s  (  0.015s elapsed)
EXIT    time    0.001s  (  0.006s elapsed)
Total   time   29.091s  ( 29.279s elapsed)

[ThreadScope output]

I suspect there might be some issue with the recursive calls combined with rdeepseq and the Strategy for parallelism. If some experienced Haskeller would chime in, it would really make my day :)

I am not an expert at Haskell performance/parallelism, but I think a couple of things are going on here.

Firstly, there is indeed this line:

Question fil <$> affirmativeTree <*> negativeTree `using` evalTraversable parEvalTree 

Presumably, one might expect that the first part of this line builds up a data structure that looks like this:

                      +-------+
                      | Right |
                      +-------+
                          |
                    +----------+
                    | Question |
                    +----------+
                     |   |    |
   +-----------------+   |    +-----------+
   |                +----+                |
   |                |                     |
+-----+   +-------------------+   +----------------+
| fil |   |       THUNK       |   |     THUNK      |
+-----+   | (affirmativeTree) |   | (negativeTree) |
          +-------------------+   +----------------+

The evalTraversable will then see the Right and run parEvalTree on the Question, resulting in both thunks being sparked for deep evaluation in parallel.

Unfortunately, this isn't quite what happens, and I think the issue is due to the extra Either String. In order to evaluate the Question line (even just to WHNF), as evalTraversable must, we have to figure out whether the result is going to be a Right decisionTree or a Left _. This means that affirmativeTree and negativeTree have to be evaluated to WHNF before parEvalTree can ever come into play. Unfortunately, due to the structure of your code, evaluating either tree to WHNF in this way forces pretty much everything: the filter selection has to be forced in order to see which branch the recursive constructTree call takes, and then its own recursive calls to trainModel are forced to WHNF in the same way.
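The strictness argument above can be checked in isolation. The following is a minimal sketch using only base; the Tree type, expensive, and probe are hypothetical stand-ins (not part of the original code), with error calls playing the role of the unevaluated affirmativeTree and negativeTree so that we can observe which one gets forced when the result is evaluated to WHNF.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

-- Hypothetical toy tree, standing in for DecisionTree.
data Tree = Leaf Int | Node Tree Tree

-- Hypothetical stand-in for a subtree computation: it blows up when forced,
-- so we can tell whether WHNF evaluation of the parent touched it.
expensive :: String -> Either String Tree
expensive name = error (name ++ " was forced")

-- Evaluate a value to WHNF and report which stand-in (if any) was forced.
probe :: a -> IO String
probe x = do
  r <- try (evaluate x)
  return $ case r of
    Left (ErrorCall msg) -> msg
    Right _ -> "nothing forced"

main :: IO ()
main = do
  -- A bare constructor: WHNF stops at Node, the children stay thunks.
  probe (Node (error "left child was forced") (error "right child was forced"))
    >>= putStrLn
  -- Inside Either, <$> must inspect its argument to choose Left or Right.
  probe (Node <$> expensive "affirmativeTree" <*> Right (Leaf 2))
    >>= putStrLn
  -- Likewise, <*> must inspect the second argument once the first is Right.
  probe (Node <$> Right (Leaf 1) <*> expensive "negativeTree")
    >>= putStrLn
```

The first probe reports that nothing was forced, while the other two report that the respective Either-valued subtree was forced, which is the behaviour that prevents the strategy from ever seeing two unevaluated thunks.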

This can be avoided by sparking off affirmativeTree and negativeTree separately first, and only inspecting the results (in WHNF) after they've had time to be fully computed, by doing something like this:

uncurry (Question fil) <$> bisequence ((affirmativeTree, negativeTree) `using` parTuple2 rdeepseq rdeepseq)

If you run your code with this line replacing the original and load it into ThreadScope, you will see that there is clearly some increase in parallelism: the activity graph briefly goes above 1 in a few places, and execution jumps between HECs in several places. Unfortunately, the vast majority of the program's time is still spent in sequential execution.
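To try the parTuple2 / bisequence pattern outside the full project, here is a small self-contained sketch. It assumes the parallel and deepseq packages are available and that the program is built with -threaded; the Tree type and the build function are hypothetical toys (build just halves a list), not the original constructTree. Only the shape of the last equation of build mirrors the replacement line.

```haskell
import Control.DeepSeq (NFData (..))
import Control.Parallel.Strategies (parTuple2, rdeepseq, using)
import Data.Bitraversable (bisequence)

-- Hypothetical toy tree; Question carries an Int instead of a Filter.
data Tree = Answer Int | Question Int Tree Tree
  deriving (Eq, Show)

instance NFData Tree where
  rnf (Answer n) = rnf n
  rnf (Question k l r) = rnf k `seq` rnf l `seq` rnf r

-- Hypothetical builder: splits the input in half at every level.
build :: [Int] -> Either String Tree
build [] = Left "no training data"
build [x] = Right (Answer x)
build xs =
  -- Spark both Either-valued subtrees first (the tuple itself is lazy in
  -- its components), then combine them with bisequence once evaluated.
  uncurry (Question (length xs))
    <$> bisequence ((build ls, build rs) `using` parTuple2 rdeepseq rdeepseq)
  where
    (ls, rs) = splitAt (length xs `div` 2) xs

main :: IO ()
main = print (build [1 .. 8])
```

The point of the tuple is that it can be constructed without looking at either component, so the strategy gets two genuine thunks to spark. Sparking at every level, as this toy does, is likely too fine-grained for real workloads; a depth or size cutoff is a common refinement.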

I tried to look into this a little bit, and I think that something in your tree construction code may be a bit right-biased. I added some traceMarkers and traceEvents, and it looks like there is frequently a fairly large imbalance between the positive and negative sides of a filter, which makes parallel execution not work very well: the positive subtree tends to finish very quickly, while the negative subtree takes a long time, creating what looks like essentially sequential execution. In several cases, the positive subtree is so small that the core that sparked the computation finishes it and then begins the negative subtree before another core can wake up to steal the work. This is where the long runs on a single core in ThreadScope come from. The short period of time with a fair bit of parallelism that you can see at the beginning of the graph is the time during which the negative subtree of the first filter is executing, since that's the main filter with a negative subtree large enough to really contribute to parallelism. There are also a few similar (but much smaller) events later in the trace where reasonably-sized negative trees are created.
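As a rough illustration of this kind of instrumentation (not the exact tracing used above), the sketch below records the size of each side of a split via traceEvent from Debug.Trace. The helper partitionTraced and its label argument are hypothetical. The message only ends up in the eventlog when the program is built with -eventlog and run with +RTS -l; otherwise the call has no visible effect.

```haskell
import Debug.Trace (traceEvent)

-- Hypothetical helper: split the data with a predicate and record the
-- sizes of both sides in the eventlog, so imbalance shows up in ThreadScope.
partitionTraced :: String -> (a -> Bool) -> [a] -> ([a], [a])
partitionTraced label p xs = traceEvent msg (passed, failed)
  where
    passed = filter p xs
    failed = filter (not . p) xs
    msg =
      label ++ ": passed=" ++ show (length passed)
        ++ ", failed=" ++ show (length failed)

main :: IO ()
main = do
  -- A deliberately lopsided filter: 2 elements pass, 8 fail.
  let (yes, no) = partitionTraced "x > 8" (> 8) [1 .. 10 :: Int]
  print (length yes, length no)
```

A split like the one in main, where one side is four times the size of the other, is the sort of imbalance that leaves one core idle while the other works through the large subtree.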

I would expect that if you make the change above and try to find filters that more evenly partition the dataset, you should see a fairly large increase in the parallelizability of this code.
