Performance of seq<int> vs Lazy<LazyList<int>> in F#

Question

There is a well-known solution for generating an infinite stream of Hamming numbers (i.e. all positive integers n where n = 2^i * 3^j * 5^k for non-negative integers i, j, k). I have implemented this in two different ways in F#. The first method uses seq<int>. The solution is elegant, but the performance is terrible. The second method uses a custom type where the tail is wrapped in Lazy<LazyList<int>>. The solution is clunky, but the performance is amazing.
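
To make the definition concrete, here is a deliberately naive reference generator that is handy for checking the output of faster versions. It is a sketch, not part of the original question; `isHamming` and `bruteForceHamming` are illustrative names. It slows down badly for large values because the gaps between Hamming numbers keep growing.

```fsharp
// Naive reference generator built directly from the definition
// n = 2^i * 3^j * 5^k: divide out every factor of 2, 3 and 5,
// and n is a Hamming number exactly when 1 is left over.
let isHamming (n: int) =
    let rec strip d m = if m % d = 0 then strip d (m / d) else m
    // The n > 0 guard also prevents strip from looping forever on 0.
    n > 0 && (n |> strip 2 |> strip 3 |> strip 5) = 1

// Slow but obviously correct: test every positive integer in turn.
let bruteForceHamming : seq<int> =
    Seq.initInfinite (fun i -> i + 1) |> Seq.filter isHamming
```

For example, `bruteForceHamming |> Seq.take 20 |> List.ofSeq` gives the same prefix that either method should print.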

Can someone explain why the performance using seq<int> is so bad, and whether there is a way to fix it? Thanks.

Method 1 using seq<int>:

// 2-way merge with deduplication
let rec (-|-) (xs: seq<int>) (ys: seq<int>) =
    let x = Seq.head xs
    let y = Seq.head ys
    let xstl = Seq.skip 1 xs
    let ystl = Seq.skip 1 ys
    if x < y then seq { yield x; yield! xstl -|- ys }
    elif x > y then seq { yield y; yield! xs -|- ystl }
    else seq { yield x; yield! xstl -|- ystl }

let rec hamming: seq<int> = seq {
    yield 1
    let xs = Seq.map ((*) 2) hamming
    let ys = Seq.map ((*) 3) hamming
    let zs = Seq.map ((*) 5) hamming
    yield! xs -|- ys -|- zs
}

[<EntryPoint>]
let main argv = 
    Seq.iter (printf "%d, ") <| Seq.take 100 hamming
    0

Method 2 using Lazy<LazyList<int>>:

type LazyList<'a> = Cons of 'a * Lazy<LazyList<'a>>

// Map `f` over an infinite lazy list
let rec inf_map f (Cons(x, g)) = Cons(f x, lazy(inf_map f (g.Force())))

// 2-way merge with deduplication
let rec (-|-) (Cons(x, f) as xs) (Cons(y, g) as ys) =
    if x < y then Cons(x, lazy(f.Force() -|- ys))
    elif x > y then Cons(y, lazy(xs -|- g.Force()))
    else Cons(x, lazy(f.Force() -|- g.Force()))

let rec hamming =
    Cons(1, lazy(let xs = inf_map ((*) 2) hamming
                 let ys = inf_map ((*) 3) hamming
                 let zs = inf_map ((*) 5) hamming
                 xs -|- ys -|- zs))

[<EntryPoint>]
let main args =
    let a = ref hamming
    let i = ref 0
    while !i < 100 do
        match !a with
        | Cons (x, f) ->
            printf "%d, " x
            a := f.Force()
            i := !i + 1
    0

Answer 1

Ganesh is right that you're evaluating the sequence multiple times. Seq.cache will help improve performance, but you get much better performance out of LazyList because the underlying sequence is only ever evaluated once and then cached, so it can be traversed much more rapidly. In fact, this is a good example of where LazyList should be used instead of a normal seq.
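
The re-evaluation can be observed directly with a side-effecting counter. This is an illustrative sketch (`produced` and `numbers` are made-up names, not from the question): it shows that a plain seq reruns its generator on every enumeration, that `Seq.head` plus `Seq.skip 1` (the pattern used in the question's `-|-`) each start a fresh enumeration, and that `Seq.cache` runs the generator only once.

```fsharp
// Counts how many elements the generator body has produced.
let produced = ref 0

let numbers : seq<int> =
    seq {
        for i in 1 .. 5 do
            produced.Value <- produced.Value + 1
            yield i
    }

// A plain seq reruns its generator on every enumeration.
numbers |> Seq.iter ignore
numbers |> Seq.iter ignore
let plainCount = produced.Value        // 10: the generator ran twice

// Head and "skip 1, then head" each enumerate from the start again.
produced.Value <- 0
let h = Seq.head numbers
let t = numbers |> Seq.skip 1 |> Seq.head
let headAndTailCount = produced.Value  // 3 = 1 (head) + 2 (skip 1, then head)

// Seq.cache produces each element once and replays stored values afterwards.
produced.Value <- 0
let cached = Seq.cache numbers
cached |> Seq.iter ignore
cached |> Seq.iter ignore
let cachedCount = produced.Value       // 5
```

In the question's Method 1 every recursive call to `-|-` repeats the head/skip pattern on sequences that are themselves built from `hamming`, so this repeated work compounds as the stream gets longer.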

It also looks like your use of Seq.map here introduces significant overhead. I believe the compiler allocates a closure each time it's called there. I changed your seq-based code to use seq expressions there instead, and it's about 1/3 faster than the original for the first 40 numbers in the sequence:

let rec hamming: seq<int> = seq {
    yield 1
    let xs = seq { for x in hamming do yield x * 2 }
    let ys = seq { for x in hamming do yield x * 3 }
    let zs = seq { for x in hamming do yield x * 5 }
    yield! xs -|- ys -|- zs
}

My ExtCore library includes a lazyList computation builder which works just like seq, so you can simplify your code like this:

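open System.Diagnostics   // Stopwatch
// Assumes ExtCore is referenced, so its LazyList module and lazyList builder are in scope.

// 2-way merge with deduplication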
let rec (-|-) (xs: LazyList<'T>) (ys: LazyList<'T>) =
    let x = LazyList.head xs
    let y = LazyList.head ys
    let xstl = LazyList.skip 1 xs
    let ystl = LazyList.skip 1 ys
    if x < y then lazyList { yield x; yield! xstl -|- ys }
    elif x > y then lazyList { yield y; yield! xs -|- ystl }
    else lazyList { yield x; yield! xstl -|- ystl }

let rec hamming : LazyList<uint64> = lazyList {
    yield 1UL
    let xs = LazyList.map ((*) 2UL) hamming
    let ys = LazyList.map ((*) 3UL) hamming
    let zs = LazyList.map ((*) 5UL) hamming
    yield! xs -|- ys -|- zs
}

[<EntryPoint>]
let main argv =
    let watch = Stopwatch.StartNew ()

    hamming
    |> LazyList.take 2000
    |> LazyList.iter (printf "%d, ")

    watch.Stop ()
    printfn ""
    printfn "Elapsed time: %.4fms" watch.Elapsed.TotalMilliseconds

    System.Console.ReadKey () |> ignore
    0   // Return an integer exit code

(NOTE: I also made your (-|-) function generic, and modified hamming to use 64-bit unsigned ints because 32-bit signed ints overflow after a while.) This code runs through the first 2000 elements of the sequence on my machine in ~450ms; the first 10000 elements take ~3500ms.
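
A quick check of the overflow remark: 2^31 is itself a Hamming number (i = 31, j = k = 0), but Int32.MaxValue is 2^31 - 1. The sketch below builds 2^31 by repeated doubling, the same operation as `((*) 2)` in the question; `viaInt32` and `viaUInt64` are illustrative names.

```fsharp
// Multiply 1 by 2 thirty-one times. F# integer arithmetic is unchecked by
// default, so the int result silently wraps instead of raising an error.
let viaInt32 = List.replicate 31 2 |> List.fold (*) 1
let viaUInt64 = List.replicate 31 2UL |> List.fold (*) 1UL

printfn "int: %d, uint64: %d" viaInt32 viaUInt64   // int: -2147483648, uint64: 2147483648
```

Opening `Checked` (Microsoft.FSharp.Core.Operators.Checked) makes the int version throw an OverflowException instead of wrapping, which can help catch this kind of bug early.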

Answer 2

Your hamming seq is re-evaluated from the beginning on each recursive call. Seq.cache helps somewhat:

let rec hamming: seq<int> =
    seq {
        yield 1
        let xs = Seq.map ((*) 2) hamming
        let ys = Seq.map ((*) 3) hamming
        let zs = Seq.map ((*) 5) hamming
        yield! xs -|- ys -|- zs
    } |> Seq.cache

However, as you point out, LazyList is still much better on large inputs, even if every single sequence is cached.

I'm not entirely certain why they differ by more than a small constant factor, but perhaps it's better to just focus on making the LazyList version less ugly. Writing something to convert it to a seq makes processing it much nicer:

module LazyList =
    let rec toSeq l =
        match l with
        | Cons (x, xs) ->
            seq {
                yield x
                yield! toSeq xs.Value
            }

You can then use your simple main directly. It's also not really necessary to use mutation to process the LazyList; you could just do so recursively.
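
As a sketch of the recursive alternative to the question's `while` loop with ref cells: `iterN` and `takeN` are hypothetical helper names, and `countFrom` is a stand-in infinite list so the example runs on its own.

```fsharp
type LazyList<'a> = Cons of 'a * Lazy<LazyList<'a>>

// Hypothetical example list n, n+1, n+2, ... standing in for hamming.
let rec countFrom n = Cons(n, lazy (countFrom (n + 1)))

// Apply f to the first n elements. The recursive call is in tail position,
// so this replaces the mutable loop without growing the stack.
let rec iterN n f (Cons(x, rest)) =
    if n > 0 then
        f x
        iterN (n - 1) f rest.Value

// Collect the first n elements into an ordinary list (recursive, no mutation).
let rec takeN n (Cons(x, rest)) =
    if n <= 0 then [] else x :: takeN (n - 1) rest.Value
```

With the question's Method 2 definitions in scope, `hamming |> iterN 100 (printf "%d, ")` would replace the body of its `main`.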

The definition doesn't look so bad, though lazy and Force() do clutter it up a bit. It looks marginally better if you use .Value instead of .Force(). You could also define a computation builder for LazyList, similar to the seq one, to recover the really nice syntax, though I'm not sure it's worth the effort.
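
For comparison, here is the question's Method 2 rewritten with `.Value`. It is a sketch: `infMap` is a renamed `inf_map`, and `toSeq` here uses `Seq.unfold` rather than the seq expression above, which is an alternative rather than a required change.

```fsharp
type LazyList<'a> = Cons of 'a * Lazy<LazyList<'a>>

// Map f over an infinite lazy list; each tail is mapped only when forced.
let rec infMap f (Cons(x, rest)) = Cons(f x, lazy (infMap f rest.Value))

// 2-way merge with deduplication.
let rec (-|-) (Cons(x, xt) as xs) (Cons(y, yt) as ys) =
    if x < y then Cons(x, lazy (xt.Value -|- ys))
    elif x > y then Cons(y, lazy (xs -|- yt.Value))
    else Cons(x, lazy (xt.Value -|- yt.Value))

// Recursive value: the compiler warns that the self-reference is checked
// at runtime; it is safe here because it only happens inside `lazy`.
let rec hamming : LazyList<int> =
    Cons(1, lazy (infMap ((*) 2) hamming -|- infMap ((*) 3) hamming -|- infMap ((*) 5) hamming))

// View the list as a seq; note that Seq.unfold forces each tail one step
// ahead of the element currently being consumed.
let toSeq (list: LazyList<'a>) =
    Seq.unfold (fun (Cons(x, rest)) -> Some(x, rest.Value)) list
```

Printing then reads like Method 1: `hamming |> toSeq |> Seq.take 100 |> Seq.iter (printf "%d, ")`.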

Answer 3

Here is a sequence-based version with better performance.

let hamming =
    let rec loop nextHs =
        seq {
            let h = nextHs |> Set.minElement
            yield h
            yield! nextHs 
                |> Set.remove h 
                |> Set.add (h*2) |> Set.add (h*3) |> Set.add (h*5) 
                |> loop
            }

    Set.empty<int> |> Set.add 1 |> loop
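
The same Set-based idea carries over to uint64, which avoids the 32-bit overflow mentioned in Answer 1. This is a sketch, not the answerer's code: it keeps the algorithm (repeatedly take the smallest pending value, then add 2h, 3h and 5h, with the Set absorbing duplicates) and only widens the type and splits the pipeline into a `let`.

```fsharp
let hamming : seq<uint64> =
    let rec loop (nextHs: Set<uint64>) =
        seq {
            // The smallest pending value is the next Hamming number.
            let h = Set.minElement nextHs
            yield h
            // Replace h by its three multiples; duplicates such as 6 = 2*3 = 3*2
            // collapse automatically because the collection is a Set.
            let next =
                nextHs
                |> Set.remove h
                |> Set.add (h * 2UL)
                |> Set.add (h * 3UL)
                |> Set.add (h * 5UL)
            yield! loop next
        }
    Set.singleton 1UL |> loop
```

Each step costs O(log m) Set operations, where m is the number of pending values, and nothing is re-evaluated as long as the sequence is consumed in a single pass.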

3 answers

Answer 1: 8 votes, posted 2014-07-06 21:10:04

Answer 2: 3 votes, posted 2014-07-06 20:35:43

Answer 3: 0 votes, posted 2014-07-15 14:53:32
