I have the following Red Black tree:
data Tree a
= E
| S a
| C !Color !(Tree a) !(Tree a)
data Color = R | B
In case of this tree, all the data are stored in the leaves (the S constructor). I have written an insert
function like the standard Okasaki red black trees[1] (modifying the parts where the values are stored in the internal nodes)
In this cases I populate the tree with 10 million elements:
l = go 10000000 E
where
go 0 t = insert 0 t
go n t = insert t $ go (n - 1) t
When I try to evaluate the left most element (leaf) of the tree like this:
left :: Tree a -> Maybe a
left E = Nothing
left (S x) = Just x
left (C _ _ l _) = left l
I encounter the following:
left l
*** Exception: stack overflow
Is this owing to the way that I am constructing the tree (non tail recursive) or is there some missing space leak that I cannot see.
Please note the function works fine for a million elements. Additionally I attempted a tail recursive way of the tree construction:
l = go 10000000 E
where
go 0 t = insert 0 t
go n t = go (n - 1) (insert n t)
but encountered the same stack overflow exception.
[1] https://www.cs.tufts.edu/~nr/cs257/archive/chris-okasaki/redblack99.pdf
EDIT
The insert and balance function for completeness:
insert :: Ord a => a -> Tree a -> Tree a
insert x xs = makeBlack $ ins xs
where
ins E = S x
ins (S a) = C R (S x) (S a)
ins (C c l r) = balance c (ins l) r -- always traverse left and trust the balancing
makeBlack (C _ l r) = C B l r
makeBlack a = a
balance :: Color -> Tree a -> Tree a -> Tree a
balance B (C R (C R a b) c) d = C R (C B a b) (C B c d)
balance B (C R a (C R b c)) d = C R (C B a b) (C B c d)
balance B a (C R (C R b c) d) = C R (C B a b) (C B c d)
balance B a (C R b (C R c d)) = C R (C B a b) (C B c d)
balance color a b = C color a b
There was mistyping from my end while typing in the insert code, it is insert n $ go (n - 1) t
and not insert t $ go (n - 1) t
. However when actually encountering the stack overflow the code was correct and the overflow happened in ghci.
The first example of insertion code has a bug: it tries to insert the tree itself as an element.
The second version
l = go 10000000 L.empty where
go 0 t = L.cons 0 t
go n t = go (n - 1) (L.cons n t)
Is indeed tail recursive, but it still has a problem: it doesn't at any step "force" the tree while it is being constructed. Due to Haskell's laziness, go
will return a thunk that hides 10000000 pending applications of L.cons
.
When the runtime tries to "pop" that thunk, it will put each n
variable in the stack while the thunk below is being "popped" in its turn, causing the stack overflow. "Function calls don't add stack frames in Haskell; instead, stack frames come from nesting thunks."
The solution is to force each intermediate tree to WHNF, so that thunks don't accumulate. This should be enough (using the BangPatterns
extension):
l :: Tree Int
l = go 10000000 L.empty
where
go 0 !t = L.cons 0 t
go n !t = go (n - 1) (L.cons n t)
This basically means: "before recursing to add another element, make sure the accumulator is in WHNF". The n
need not be forced because it is scrutinized in the pattern-match.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.