
Optimizing BFS implemented in Haskell

So I wanted to write a graph breadth-first search. The algorithm keeps track of some values in its state: a visited flag for each node and a queue. It also needs to know the edges of the graph and what its target destination is, but those don't change between steps.

This is what I came up with (sorry for the ugliness):

import Prelude hiding (take, cons, drop)
import Data.Vector

type BFSState = (Vector Bool, Vector [Int], [Int], Int)
bfsStep :: BFSState -> BFSState
bfsStep (nodes, edges, current:queue, target)
    | current == target = (nodes, edges, [], target)
    | nodes ! current = (nodes, edges, queue, target)
    | otherwise = (markedVisited, edges, queue Prelude.++ (edges ! current), target)
    where
        markedVisited = (take current nodes) Data.Vector.++ (cons True (drop (current + 1) nodes))

bfsSteps :: BFSState -> [BFSState]
bfsSteps init = steps
    where steps = init : Prelude.map bfsStep steps 

bfsStep takes a state and produces the next one. When the state's queue is [], the target node has been found. bfsSteps just uses a self-referential list to build a list of BFSState values. Currently there's no way to find out how many steps it takes to reach a certain node (given the starting conditions), but the bfsSteps function will produce the steps that the algorithm took.

What I'm concerned about is that the state gets copied every step. I realize that concatenation with ++ doesn't perform well, but I feel that it honestly doesn't matter since ALL of the state gets copied every step.

I know there are monads that should pretty much do what I'm doing here, but since Haskell is pure, doesn't that mean that monads still have to copy the state?

Shouldn't there be a way to say "Hey, I'm using these values only once in my code and I'm not storing them anywhere. You can just change them instead of making new ones"?

If Haskell did that by itself, it would still allow me to keep the code pure, but make the execution fast.

Your state is only copied when it's modified, not when it's used.

For example, edges :: Vector [Int] is never modified by bfsStep, so the same value is reused throughout all the recursive calls.

On the other hand, your queue :: [Int] is modified by bfsStep in two ways:

  • splitting it into current : queue. This reuses the tail of the original queue, so no copying is done.
  • appending to it with Prelude.++. This requires O(queue size) copying.

Similar copying is required when you update your nodes :: Vector Bool to mark a node as visited.

There are a couple of ways to do less copying of your queue and a couple of ways to do less copying of your nodes.

For nodes, you could wrap your computation in the ST s monad to use a single mutable vector. Alternatively, you could use a functional data structure like an IntMap, which has fairly fast updates.
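As a rough sketch of the ST approach (the function name reachable is mine, not from the question, and the queue is deliberately left as a plain list so that only the visited vector changes), something like this marks every node reachable from a start node using one mutable vector:

```haskell
import Control.Monad.ST (runST)
import Data.Vector (Vector)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as M

-- | Hypothetical helper: flag every node reachable from `start`.
-- The visited flags live in a single mutable vector inside ST,
-- so marking a node is an O(1) write instead of an O(n) copy.
reachable :: Vector [Int] -> Int -> Vector Bool
reachable edges start = runST $ do
    visited <- M.replicate (V.length edges) False
    let go []       = return ()
        go (x : xs) = do
            seen <- M.read visited x
            if seen
                then go xs
                else do
                    M.write visited x True
                    -- The queue is still an immutable list; this append
                    -- costs O(length xs).
                    go (xs ++ edges V.! x)
    go [start]
    -- Copy the mutable vector into an immutable result.
    V.freeze visited
```

From the outside reachable is still a pure function; runST guarantees the mutation cannot leak.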

For your queue, you could use Data.Sequence or a two-list implementation.
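To illustrate the Data.Sequence option, here is a minimal sketch that combines it with an IntSet for the visited set (a close cousin of the IntMap suggestion). The name bfsOrder and the choice to return the visit order are assumptions made for the example, not part of the question's code:

```haskell
import qualified Data.IntSet as IntSet
import Data.List (foldl')
import Data.Sequence (ViewL (..), viewl, (|>))
import qualified Data.Sequence as Seq
import Data.Vector (Vector)
import qualified Data.Vector as V

-- | Hypothetical helper: nodes in the order a breadth-first search
-- starting at `start` visits them.
bfsOrder :: Vector [Int] -> Int -> [Int]
bfsOrder edges start = go IntSet.empty (Seq.singleton start)
  where
    go seen queue = case viewl queue of
        EmptyL -> []
        x :< rest
            -- Already visited: drop it and continue.
            | x `IntSet.member` seen -> go seen rest
            -- New node: emit it and push its neighbours on the back.
            | otherwise ->
                x : go (IntSet.insert x seen)
                       (foldl' (|>) rest (edges V.! x))
```

Taking from the front and pushing on the back of a Seq are both cheap, so the cost per step no longer depends on the queue length.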

Since the Edges and target never change, I rewrote bfsStep to only return the new Nodes and queue. I also used Data.Vector.modify to update Nodes, instead of the awkward take/drop/cons method that was used previously. (modify performs the update in place when it is safe to do so and works on a copy of the vector otherwise.)

Also, bfsSteps can be written more succinctly using iterate from the Prelude.

Now, assuming modify can work in place, everything in bfs is O(1) except for the O(n) append on the queue. However, (++) is only O(n) in the length of its first argument, so if the number of edges per vertex is small it will be quite efficient.

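import Data.Vector (Vector)
import qualified Data.Vector         as V
import qualified Data.Vector.Mutable as M

type Nodes = Vector Bool    -- visited flag per node
type Edges = Vector [Int]   -- adjacency list per node

-- One step: take the head of the queue, return the updated (visited, queue).
bfs :: Nodes -> Edges -> [Int] -> Int -> (Nodes, [Int])
bfs nodes _ [] _ = (nodes, [])          -- empty queue: nothing left to do
bfs nodes edges (x:xs) target
    | x == target = (nodes, [])         -- found: signal with an empty queue
    | nodes V.! x = (nodes, xs)         -- already visited: skip
    -- Note: putting the neighbours in front of xs visits them depth-first;
    -- xs ++ edges V.! x gives breadth-first order at O(length xs) per step.
    | otherwise   = (marked, edges V.! x ++ xs)
    where marked = V.modify (\v -> M.write v x True) nodes

-- All successive states, starting from the initial one.
bfsSteps :: Nodes -> Edges -> [Int] -> Int -> [(Nodes, [Int])]
bfsSteps nodes edges queue target =
    iterate (\(n, q) -> bfs n edges q target) (nodes, queue)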
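The question mentions that there is currently no way to find out how many steps it takes to reach a node. One possible way to get that number, sketched here with names of my own (stepsToTarget is hypothetical) and with an IntSet standing in for the Vector Bool, is to thread a counter through the loop and return Nothing when the queue runs dry:

```haskell
import qualified Data.IntSet as IntSet
import Data.Vector (Vector)
import qualified Data.Vector as V

-- | Hypothetical helper: how many queue entries are examined before
-- `target` is dequeued, or Nothing if the queue empties first.
stepsToTarget :: Vector [Int] -> Int -> Int -> Maybe Int
stepsToTarget edges start target = go 0 IntSet.empty [start]
  where
    go _ _ [] = Nothing
    go n seen (x : xs)
        | x == target            = Just n
        | x `IntSet.member` seen = go (n + 1) seen xs
        -- Appending the neighbours keeps the traversal breadth-first.
        | otherwise              =
            go (n + 1) (IntSet.insert x seen) (xs ++ edges V.! x)
```

Unlike the empty-queue convention above, the Maybe result distinguishes "target found" from "target unreachable".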

You may be interested in reading the first section or two of my Monad Reader article, "Lloyd Allison's Corecursive Queues: Why Continuations Matter", which uses self-reference to implement an efficient queue. There's also code available on Hackage as control-monad-queue. In fact, I first discovered this trick when implementing a reasonably efficient breadth-first graph reachability algorithm, although I used functional data structures for tracking what the algorithm has already seen.
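To give a flavour of the self-referential idea, here is the commonly seen tree version of the trick, written against Data.Tree from containers. This is only a sketch of the general technique, not the code from the article or from control-monad-queue, and levelOrder is a name chosen for the example:

```haskell
import Data.Tree (Tree (..))

-- | Labels of a tree in breadth-first (level) order.
-- `queue` is defined in terms of itself: `walk` reads nodes from the
-- front of the list while the children it emits become the list's tail.
levelOrder :: Tree a -> [a]
levelOrder t = map rootLabel queue
  where
    queue = t : walk (1 :: Int) queue
    -- The counter is the number of nodes that have been enqueued but
    -- not yet consumed; at 0 we stop without inspecting the tail, which
    -- at that point would be a reference to a part not yet defined.
    walk 0 _ = []
    walk n (Node _ children : rest) =
        children ++ walk (n - 1 + length children) rest
    walk _ [] = []
```

The counter is what keeps the definition productive: without it, walk would eventually demand an element of the queue that it is itself supposed to produce.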

If you really want to stick with imperative data structures for tracking where you've been, I do recommend the ST monad. Unfortunately, getting ST to work with the type of queue I mentioned above is a bit hacky; I'm not sure I can recommend that combination, although from an FP mindset there's nothing too wrong with it.

With a more imperative approach, you are probably best off with the traditional two-stack queue or, if you really want some extra performance, an imperative queue built from chunks of imperative arrays.
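For reference, a two-stack queue fits in a few lines. This is a minimal sketch with names of my own choosing (Queue, emptyQ, push, pop, drain), not an API from any particular library:

```haskell
-- | FIFO queue as two stacks: elements are popped from the first list
-- and pushed onto the second, which is kept in reverse order.
data Queue a = Queue [a] [a]

emptyQ :: Queue a
emptyQ = Queue [] []

-- | O(1): push onto the back stack.
push :: a -> Queue a -> Queue a
push x (Queue front back) = Queue front (x : back)

-- | Amortised O(1): pop from the front stack, refilling it by
-- reversing the back stack only when it runs out.
pop :: Queue a -> Maybe (a, Queue a)
pop (Queue [] [])            = Nothing
pop (Queue [] back)          = pop (Queue (reverse back) [])
pop (Queue (x : front) back) = Just (x, Queue front back)

-- | Pop everything, in FIFO order.
drain :: Queue a -> [a]
drain q = case pop q of
    Nothing      -> []
    Just (x, q') -> x : drain q'
```

In the BFS loop, push replaces the append with ++ and pop replaces the pattern match on (x:xs). Each element is reversed at most once, which is where the amortised O(1) bound comes from.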
