简体   繁体   中英

Performance gain implementing concatMap with foldl' for finite list?

I read from Foldr Foldl Foldl' that foldl' is more efficient for long finite lists because of the strictness property. I am aware that it is not suitable for infinite list.

Thus, I am limiting the comparison only for long finite lists.


concatMap

concatMap is implemented using foldr , which gives it laziness. However, using it with long finite lists will build up a long unreduced chain according to the article.

concatMap :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap f xs = build (\c n -> foldr (\x b -> foldr c b (f x)) n xs)

Thus I come up with the following implementation with use of foldl' .

concatMap' :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap' f = reverse . foldl' (\acc x -> f x ++ acc) []

Test it out

I have build the following two functions to test out the performance.

lastA = last . concatMap (: []) $ [1..10000]
lastB = last . concatMap' (: []) $ [1..10000]

However, I was shocked by the results.

lastA:
(0.23 secs, 184,071,944 bytes)
(0.24 secs, 184,074,376 bytes)
(0.24 secs, 184,071,048 bytes)
(0.24 secs, 184,074,376 bytes)
(0.25 secs, 184,075,216 bytes)

lastB:   
(0.81 secs, 224,075,080 bytes)
(0.76 secs, 224,074,504 bytes)
(0.78 secs, 224,072,888 bytes)
(0.84 secs, 224,073,736 bytes)
(0.79 secs, 224,074,064 bytes)

Follow-up Questions

concatMap outcompetes my concatMap' in both time and memory. I wonder there are mistakes I made in my concatMap' implementation.

Thus, I doubt the articles for stating the goodness of foldl' .

  1. Are there any black magic in concatMap to make it so efficient?

  2. Is it true that foldl' is more efficient for long finite list?

  3. Is it true that using foldr with long finite lists will build up a long unreduced chain and impact the performance?

Are there any black magic in concatMap to make it so efficient?

No, not really.

Is it true that foldl' is more efficient for long finite list?

Not always. It depends on the folding function.

The point is, foldl and foldl' always have to scan the whole input list before producing the output. Instead, foldr does not always have to.

As an extreme case, consider

foldr (\x xs -> x) 0 [10..10000000]

which evaluates to 10 instantly -- only the first element of the list is evaluated. The reduction goes something like

foldr (\x xs -> x) 0 [10..10000000]
= foldr (\x xs -> x) 0 (10 : [11..10000000])
= (\x xs -> x) 10 (foldr (\x xs -> x) 0 [11..10000000])
= (\xs -> 10) (foldr (\x xs -> x) 0 [11..10000000])
= 10

and the recursive call is not evaluated thanks to laziness.

In general, when computing foldr fa xs , it is important to check whether fy ys is able to construct a part of the output before evaluating ys . For instance

foldr f [] xs
where f y ys = (2*y) : ys

produces a list cell _ : _ before evaluating 2*y and ys . This makes it an excellent candidate for foldr .

Again, we can define

map f xs = foldr (\y ys -> f y : ys) [] xs

which runs just fine. It consumes one element from xs and outputs the first output cell. Then it consumes the next element, outputs the next element, and so on. Using foldl' would not output anything until the whole list is processed, making the code quite inefficient.

Instead, if we wrote

sum xs = foldr (\y ys -> y+ys) 0 xs

then we do not output anything after the first element of xs is consumed. We build a long chain of thunks, wasting a lot of memory. Here, foldl' would instead work in constant space.

Is it true that using foldr with long finite lists will build up a long unreduced chain and impact the performance?

Not always. It strongly depends on how the output is consumed by the caller.

As a thumb rule, if the output is "atomic", meaning that the output consumer can not observe only a part of it (eg Bool, Int, ... ) then it's better to use foldl' . If the output is "composed" of many independent values (list, trees, ...) probably foldr is a better choice, if f can produce its output step-by-step, in a "streaming" fashion.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM