Strategies for constructing a tree in parallel in Haskell
I have a project where I'm building a decision tree in Haskell. The generated trees will have multiple branches that are independent of each other, so I figured they could be constructed in parallel.
The DecisionTree data type is defined like so:
data DecisionTree =
  Question Filter DecisionTree DecisionTree |
  Answer DecisionTreeResult

instance NFData DecisionTree where
  rnf (Answer dtr) = rnf dtr
  rnf (Question fil dt1 dt2) = rnf fil `seq` rnf dt1 `seq` rnf dt2
Here's the part of the algorithm that constructs the tree:
constructTree :: TrainingParameters -> [Map String Value] -> Filter -> Either String DecisionTree
constructTree trainingParameters trainingData fil =
  if informationGain trainingData (parseFilter fil) < entropyLimit trainingParameters
    then constructAnswer (targetVariable trainingParameters) trainingData
    else
      Question fil <$> affirmativeTree <*> negativeTree `using` evalTraversable parEvalTree
  where affirmativeTree = trainModel trainingParameters passedTData
        negativeTree = trainModel trainingParameters failedTData
        passedTData = filter (parseFilter fil) trainingData
        failedTData = filter (not . parseFilter fil) trainingData

parEvalTree :: Strategy DecisionTree
parEvalTree (Question f dt1 dt2) = do
  dt1' <- rparWith rdeepseq dt1
  dt2' <- rparWith rdeepseq dt2
  return $ Question f dt1' dt2'
parEvalTree ans = return ans
trainModel recursively calls constructTree. The relevant line for parallelism is:
Question fil <$> affirmativeTree <*> negativeTree `using` evalTraversable parEvalTree
I'm building this with the GHC flags -threaded -O2 -rtsopts -eventlog and running it with stack exec -- performance-test +RTS -A200M -N -s -l (I'm on a 2-core machine).
But it doesn't seem to run anything in parallel:
SPARKS: 164 (60 converted, 0 overflowed, 0 dud, 0 GC'd, 104 fizzled)
INIT time 0.000s ( 0.009s elapsed)
MUT time 29.041s ( 29.249s elapsed)
GC time 0.048s ( 0.015s elapsed)
EXIT time 0.001s ( 0.006s elapsed)
Total time 29.091s ( 29.279s elapsed)
I suspect there might be some issue with the recursive calls combined with rdeepseq and the Strategy for parallelism. If some experienced Haskeller would chime in, it would really make my day :)
I am not an expert at Haskell performance/parallelism, but I think a couple of things are going on here.
Firstly, there is indeed this line:
Question fil <$> affirmativeTree <*> negativeTree `using` evalTraversable parEvalTree
Presumably, one might expect that the first part of this line builds up a data structure that looks like this:
               +-------+
               | Right |
               +-------+
                   |
              +----------+
              | Question |
              +----------+
                |  |  |
   +------------+  |  +-----------------+
   |               |                    |
+-----+  +-------------------+  +----------------+
| fil |  |       THUNK       |  |     THUNK      |
+-----+  | (affirmativeTree) |  | (negativeTree) |
         +-------------------+  +----------------+
The evalTraversable will then see the Right and run parEvalTree on the Question, resulting in both thunks being sparked for deep evaluation in parallel.
Unfortunately, this isn't quite what happens, and I think the issue is due to the extra Either String. In order to evaluate the Question line (even just to WHNF), as evalTraversable must, we have to figure out whether the result is going to be a Right decisionTree or a Left _. This means that affirmativeTree and negativeTree have to be evaluated to WHNF before parEvalTree can ever come into play. Unfortunately, due to the structure of your code, evaluating either tree to WHNF in this way forces pretty much everything: the filter selection has to be forced in order to see which branch the recursive constructTree call takes, and then its own recursive calls to trainModel are forced to WHNF in the same way.
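The forcing behaviour described above can be observed in isolation. The following is a minimal sketch using only base; pendingSubtree, lazyPair, combined, and forcesSubtrees are hypothetical names introduced for illustration, and an error call stands in for a subtree that has not been computed yet.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical stand-in for a recursive trainModel result that has not
-- been computed yet: evaluating it to WHNF throws.
pendingSubtree :: Either String Int
pendingSubtree = error "subtree was forced"

-- A plain tuple reaches WHNF without touching its components.
lazyPair :: (Either String Int, Either String Int)
lazyPair = (pendingSubtree, pendingSubtree)

-- Combining in the Either applicative has to inspect its arguments
-- to decide between Left and Right.
combined :: Either String (Int, Int)
combined = (,) <$> pendingSubtree <*> pendingSubtree

-- True when evaluating the argument to WHNF raises an exception,
-- i.e. when WHNF evaluation reached one of the pending subtrees.
forcesSubtrees :: a -> IO Bool
forcesSubtrees x = do
  r <- try (evaluate x)
  return $ case r of
    Left e  -> const True (e :: SomeException)
    Right _ -> False
```

Under these assumptions, forcesSubtrees lazyPair should yield False and forcesSubtrees combined should yield True, which mirrors why the subtrees are already evaluated sequentially by the time parEvalTree gets to spark them.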
This can be avoided by sparking off affirmativeTree and negativeTree separately first, and only looking at the results in WHNF after they've had time to be fully computed, by doing something like this (bisequence comes from Data.Bitraversable):
uncurry (Question fil) <$> bisequence ((affirmativeTree, negativeTree) `using` parTuple2 rdeepseq rdeepseq)
If you run your code with this line replacing the original and load the resulting eventlog into ThreadScope, you will see that there is clearly some increase in parallelism: the activity graph briefly goes above 1 in a few places, and execution jumps between HECs in several places. Unfortunately, the vast majority of the program's time is still spent in sequential execution.
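For reference, the parTuple2/bisequence pattern can be tried on a toy type. This is only a sketch: Tree and build are hypothetical stand-ins for DecisionTree and trainModel, and it assumes the parallel and deepseq packages are available.

```haskell
import Control.DeepSeq (NFData (..))
import Control.Parallel.Strategies (parTuple2, rdeepseq, using)
import Data.Bitraversable (bisequence)

-- Hypothetical stand-in for DecisionTree, kept tiny for illustration.
data Tree = Leaf Int | Node Tree Tree
  deriving (Eq, Show)

instance NFData Tree where
  rnf (Leaf n)   = rnf n
  rnf (Node l r) = rnf l `seq` rnf r

-- Hypothetical stand-in for trainModel: fails on empty input, otherwise
-- splits the list in half and builds both halves.
build :: [Int] -> Either String Tree
build []  = Left "no training data"
build [x] = Right (Leaf x)
build xs  =
  -- Spark both Either-wrapped subtrees for deep evaluation first; only
  -- afterwards does bisequence inspect them for Left or Right.
  uncurry Node <$> bisequence ((build ls, build rs) `using` parTuple2 rdeepseq rdeepseq)
  where (ls, rs) = splitAt (length xs `div` 2) xs
```

Because the tuple itself is already in WHNF, the strategy can create both sparks before anything examines the Either results. This toy version sparks at every level of the recursion, which is likely too fine-grained for real workloads.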
I tried to look into this a little bit, and I think that something in your tree construction code may be a bit right-biased. I added some traceMarkers and traceEvents, and it looks like there is frequently a fairly large imbalance between the positive and negative sides of a filter, which makes parallel execution not work very well: the positive subtree tends to finish very quickly, while the negative subtree takes a long time, creating what looks like essentially sequential execution. In several cases, the positive subtree is so small that the core that sparked the computation finishes it and then begins the negative subtree before another core can wake up to steal the work. This is where the long runs on a single core in ThreadScope come from. The short period with a fair bit of parallelism that you can see at the beginning of the graph is the time during which the negative subtree of the first filter is executing, since that's the main filter with a negative subtree large enough to really contribute to parallelism. There are also a few similar (but much smaller) events later in the trace where reasonably sized negative trees are created.
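One way such instrumentation might look is sketched below. This is not the exact code used for the observations above: splitWithTrace and its label argument are hypothetical, while traceMarker and traceEvent are the functions from Debug.Trace in base. The messages only reach the eventlog when the program is built with -eventlog and run with +RTS -l.

```haskell
import Data.List (partition)
import Debug.Trace (traceEvent, traceMarker)

-- Split the data with a predicate and record the size of each side, so
-- that imbalanced filters stand out in the ThreadScope trace.
-- The trace calls fire when the returned pair is first forced.
splitWithTrace :: String -> (a -> Bool) -> [a] -> ([a], [a])
splitWithTrace label p xs =
  traceMarker ("split " ++ label) $
    traceEvent (label ++ ": passed=" ++ show (length passed)
                      ++ ", failed=" ++ show (length failed))
      (passed, failed)
  where
    (passed, failed) = partition p xs
```

A call such as splitWithTrace "age > 30" somePredicate trainingData would replace the two separate filter calls; note that computing the lengths forces the spine of both lists at the point where the event is emitted.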
I would expect that if you make the change above and try to find filters that partition the dataset more evenly, you should see a fairly large increase in the parallelizability of this code.