Stack overflow when constructing/evaluating a red black tree in Haskell

I have the following Red Black tree:

data Tree a
  = E
  | S a
  | C !Color !(Tree a) !(Tree a)

data Color = R | B

In this tree, all the data are stored in the leaves (the S constructor). I have written an insert function modeled on Okasaki's standard red-black trees [1], adapting the parts that, in Okasaki's version, store values in the internal nodes.

Here I populate the tree with 10 million elements:

l = go 10000000 E
  where
    go 0 t = insert 0 t
    go n t = insert t $ go (n - 1) t

When I try to evaluate the leftmost element (leaf) of the tree like this:

left :: Tree a -> Maybe a
left E = Nothing
left (S x) = Just x
left (C _ l _) = left l

I encounter the following:

left l

*** Exception: stack overflow

Is this caused by the way I am constructing the tree (non-tail-recursively), or is there a space leak that I cannot see?

Note that the function works fine for one million elements. I also tried building the tree tail-recursively:

l = go 10000000 E
  where
    go 0 t = insert 0 t
    go n t = go (n - 1) (insert n t)

but encountered the same stack overflow exception.

[1] https://www.cs.tufts.edu/~nr/cs257/archive/chris-okasaki/redblack99.pdf

EDIT

The insert and balance functions, for completeness:

insert :: Ord a => a -> Tree a -> Tree a
insert x xs = makeBlack $ ins xs
  where
    ins E = S x
    ins (S a) = C R (S x) (S a)
    ins (C c l r) = balance c (ins l) r -- always traverse left and trust the balancing

    makeBlack (C _ l r) = C B l r
    makeBlack a = a

balance :: Color -> Tree a -> Tree a -> Tree a
balance B (C R (C R a b) c) d = C R (C B a b) (C B c d)
balance B (C R a (C R b c)) d = C R (C B a b) (C B c d)
balance B a (C R (C R b c) d) = C R (C B a b) (C B c d)
balance B a (C R b (C R c d)) = C R (C B a b) (C B c d)
balance color a b = C color a b
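To check whether left itself could be the culprit, here is a sketch that assembles the definitions above into one loadable module. It makes two assumptions: left is adjusted to the three-field C constructor, and depth is a hypothetical helper (not part of the question) that counts C nodes on the longest path. Because balance maintains the red-black invariants, depth should grow roughly logarithmically. That means left recurses only a few dozen levels even for millions of leaves, so the deep recursion is probably coming from somewhere else.

```haskell
-- The question's tree and insert/balance, with `left` written for the
-- three-field C constructor. `depth` is a hypothetical helper.
data Color = R | B
  deriving (Eq, Show)

data Tree a
  = E
  | S a
  | C !Color !(Tree a) !(Tree a)
  deriving Show

insert :: Ord a => a -> Tree a -> Tree a
insert x xs = makeBlack (ins xs)
  where
    ins E = S x
    ins (S a) = C R (S x) (S a)
    ins (C c l r) = balance c (ins l) r -- always go left, rely on balance

    makeBlack (C _ l r) = C B l r
    makeBlack t = t

balance :: Color -> Tree a -> Tree a -> Tree a
balance B (C R (C R a b) c) d = C R (C B a b) (C B c d)
balance B (C R a (C R b c)) d = C R (C B a b) (C B c d)
balance B a (C R (C R b c) d) = C R (C B a b) (C B c d)
balance B a (C R b (C R c d)) = C R (C B a b) (C B c d)
balance color a b = C color a b

-- Leftmost leaf: recursion depth equals the length of the leftmost path.
left :: Tree a -> Maybe a
left E = Nothing
left (S x) = Just x
left (C _ l _) = left l

-- Hypothetical helper: number of C nodes on the longest root-to-leaf path.
depth :: Tree a -> Int
depth (C _ l r) = 1 + max (depth l) (depth r)
depth _ = 0
```

After inserting 100000 elements with the accumulator forced at each step, depth should stay within about 2 * log2 (n + 1). left returns the most recently inserted element, because ins always descends to the left and balance preserves the left-to-right order of the leaves.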

I made a typo when typing the insertion code into the question: it should be insert n $ go (n - 1) t, not insert t $ go (n - 1) t. However, the code I actually ran was correct, and the stack overflow happened in GHCi.

The first example of insertion code has a bug: it tries to insert the tree itself as an element.

The second version

l = go 10000000 E
  where
    go 0 t = insert 0 t
    go n t = go (n - 1) (insert n t)

is indeed tail recursive, but it still has a problem: it never forces the tree while it is being constructed. Because of Haskell's laziness, go returns a thunk that hides 10000000 pending applications of insert.
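To make that hidden thunk concrete, here is a toy model. It is hypothetical and is not how GHC actually lays out thunks in the heap. goShape runs the same recursion as go, but it records the pending calls as a String instead of building a tree.

```haskell
-- Toy model (hypothetical): same recursion as the tail-recursive `go`,
-- but each pending `insert` is recorded as text instead of being built.
goShape :: Int -> String -> String
goShape 0 t = "insert 0 " ++ wrap t
goShape n t = goShape (n - 1) ("insert " ++ show n ++ " " ++ wrap t)

-- Parenthesise anything that is not a single token.
wrap :: String -> String
wrap s
  | ' ' `elem` s = "(" ++ s ++ ")"
  | otherwise = s
```

goShape 3 "E" returns "insert 0 (insert 1 (insert 2 (insert 3 E)))". Each insert pattern-matches on its tree argument, so the outermost call cannot produce a result until the call inside it has done so, and so on down the chain. For the real l, that chain is 10,000,001 calls deep.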

When the runtime evaluates that thunk, each pending insert has to wait on the stack while the thunk beneath it is evaluated in turn, and ten million nested levels overflow the stack. In other words: "Function calls don't add stack frames in Haskell; instead, stack frames come from nesting thunks."

The solution is to force each intermediate tree to weak head normal form (WHNF), so that thunks don't accumulate. This should be enough (using the BangPatterns extension):

{-# LANGUAGE BangPatterns #-}

l :: Tree Int
l = go 10000000 E
  where
    go 0 !t = insert 0 t
    go n !t = go (n - 1) (insert n t)

This basically means: "before recursing to add another element, make sure the accumulator is in WHNF". The n need not be forced, because it is already scrutinized by the pattern match against 0.
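If you would rather not enable BangPatterns, the same forcing can be written with seq. Alternatively, you can delegate it to foldl' from Data.List, which forces its accumulator to WHNF at each step. This sketch reuses the question's insert and balance. buildSeq, buildFold and toList are hypothetical names introduced here. The comment on Tree explains why WHNF is sufficient in this case: the constructor fields are strict.

```haskell
import Data.List (foldl')

data Color = R | B

-- Strict fields: forcing a C node to WHNF also forces both subtrees to
-- WHNF (and theirs, recursively), so a tree in WHNF holds no pending inserts.
data Tree a
  = E
  | S a
  | C !Color !(Tree a) !(Tree a)

insert :: Ord a => a -> Tree a -> Tree a
insert x xs = makeBlack (ins xs)
  where
    ins E = S x
    ins (S a) = C R (S x) (S a)
    ins (C c l r) = balance c (ins l) r

    makeBlack (C _ l r) = C B l r
    makeBlack t = t

balance :: Color -> Tree a -> Tree a -> Tree a
balance B (C R (C R a b) c) d = C R (C B a b) (C B c d)
balance B (C R a (C R b c)) d = C R (C B a b) (C B c d)
balance B a (C R (C R b c) d) = C R (C B a b) (C B c d)
balance B a (C R b (C R c d)) = C R (C B a b) (C B c d)
balance color a b = C color a b

-- Same as the bang-pattern version, using `seq` (plain Haskell 2010).
buildSeq :: Int -> Tree Int
buildSeq n0 = go n0 E
  where
    go 0 t = t `seq` insert 0 t
    go n t = t `seq` go (n - 1) (insert n t)

-- Same insertion order (n0, n0 - 1, ..., 0); foldl' forces the
-- accumulator to WHNF before moving on to the next element.
buildFold :: Int -> Tree Int
buildFold n0 = foldl' (flip insert) E [n0, n0 - 1 .. 0]

-- Leaves from left to right, used to compare the two builders.
toList :: Tree a -> [a]
toList t = go t []
  where
    go E acc = acc
    go (S x) acc = x : acc
    go (C _ l r) acc = go l (go r acc)
```

Both builders insert the same elements in the same order as the bang-pattern version. Because the last element inserted ends up leftmost, toList returns [0 .. n] for either of them.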
