
An implementation problem of F# Seq

I have been digging into the F# source code recently.

In Seq.fs:

// Binding. 
//
// We use a type defintion to apply a local dynamic optimization. 
// We automatically right-associate binding, i.e. push the continuations to the right.
// That is, bindG (bindG G1 cont1) cont2 --> bindG G1 (cont1 o cont2)
// This makes constructs such as the following linear rather than quadratic:
//
//  let rec rwalk n = { if n > 0 then 
//                         yield! rwalk (n-1)
//                         yield n }

After seeing the above comment, I tested two versions of the code:

let rec rwalk n = seq { if n > 0 then 
                         yield n
                         yield! rwalk (n-1)
                      }

and

let rec rwalk n = seq { if n > 0 then 
                         yield! rwalk (n-1)
                         yield n 
                      }

I found that the first one is very fast, while the second is very slow. For n = 10000, it takes 3 seconds on my machine to generate the sequence, which suggests quadratic time.
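For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The names rwalkYieldFirst, rwalkRecurseFirst and timeIt are made up for this example, and the actual timings will depend on the machine and the F# version.

```fsharp
open System.Diagnostics

// Element first, recursive call last (yield! in tail position).
let rec rwalkYieldFirst n : seq<int> =
    seq {
        if n > 0 then
            yield n
            yield! rwalkYieldFirst (n - 1)
    }

// Recursive call first, element last (yield! in non-tail position).
let rec rwalkRecurseFirst n : seq<int> =
    seq {
        if n > 0 then
            yield! rwalkRecurseFirst (n - 1)
            yield n
    }

// Forces the whole sequence and returns the elapsed wall-clock time.
let timeIt (s: seq<int>) =
    let watch = Stopwatch.StartNew()
    s |> Seq.iter ignore
    watch.Stop()
    watch.Elapsed

printfn "yield first:  %O" (timeIt (rwalkYieldFirst 10000))
printfn "yield! first: %O" (timeIt (rwalkRecurseFirst 10000))
```

Both functions produce the same elements, only in opposite order, so any difference in the timings comes from where the yield! sits.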

The quadratic time is reasonable, since, for example,

seq { yield! {1; 2; ...; n-1}; yield n } translates to

Seq.append {1; 2; ...; n-1} {n}

This append operation should take linear time, I guess. In the first version, by contrast, the append operation looks like seq { yield n; yield! {n-1; n-2; ...; 1} }, which costs constant time.
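The append reading of the slow version can be sketched as a left-nested chain of Seq.append calls. This is only an illustration of the reasoning above, not what the compiler actually emits, and the name appendWalk is made up for the example.

```fsharp
// Builds ((((empty ++ [1]) ++ [2]) ++ ...) ++ [n]) by folding to the left.
// Each Seq.append wraps the previous sequence, so an element produced by the
// innermost sequence is handed up through every enclosing append.
let appendWalk n : seq<int> =
    [1 .. n]
    |> List.fold (fun acc k -> Seq.append acc (Seq.singleton k)) Seq.empty

printfn "%A" (List.ofSeq (appendWalk 5))
```

The result is the same sequence 1..n that the second rwalk produces; the point is the shape of the nesting, where the earliest elements sit under the most layers.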

The comments in the code say that it is linear (maybe this "linear" does not mean linear time). Maybe it relates to using a customized implementation for sequences rather than a monad / F# computation expression (as mentioned in the F# specification, although the specification does not give the reason for doing so).

Could anyone clarify the fuzziness here? Thanks a lot!

(P.S. Because this is a language design and optimization question, I also added the Haskell tag to see if people there have insights.)

When yield! appears in a non-tail-call position, it essentially means the same thing as:

for v in <expr> do yield v

The problem with this (and the reason why it is quadratic) is that for recursive calls, this creates a chain of iterators with nested for loops. Every element produced by <expr> has to be re-yielded by each enclosing loop, so if one pass over the sequence is linear, you get quadratic time overall (because a linear pass happens at every level of the recursion).

Let's say the rwalk function generates [ 9; 2; 3; 7 ]. In the first iteration, the recursively generated sequence has 4 elements, so you'd iterate over 4 elements and add 1. In the recursive call, you'd iterate over 3 elements and add 1, and so on. Using a diagram, you can see how that's quadratic:

x
x x 
x x x
x x x x
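The triangle above can be checked with a small counting sketch. It writes the non-tail yield! as the explicit for loop shown earlier and counts every yield that runs; the names yields, rwalkFor and countYields are made up for this illustration.

```fsharp
// Counts how many times any yield statement runs while the sequence is consumed.
let yields = ref 0

// Same shape as the slow rwalk, with the non-tail yield! written as a for loop.
let rec rwalkFor n : seq<int> =
    seq {
        if n > 0 then
            for v in rwalkFor (n - 1) do
                yields.Value <- yields.Value + 1
                yield v
            yields.Value <- yields.Value + 1
            yield n
    }

// Resets the counter, forces the whole sequence, and returns the yield count.
let countYields n =
    yields.Value <- 0
    rwalkFor n |> Seq.iter ignore
    yields.Value

printfn "%d" (countYields 4)    // 1 + 2 + 3 + 4 = 10
printfn "%d" (countYields 100)  // 100 * 101 / 2 = 5050
```

The level handling n re-yields n - 1 elements and adds one of its own, so the total is 1 + 2 + ... + n = n(n+1)/2, which is the area of the triangle.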

Also, each of the recursive calls creates a new object instance (an IEnumerator), so there is also some memory cost (although only linear).

In a tail-call position, the F# compiler/library does an optimization. It "replaces" the current IEnumerable with the one returned by the recursive call, so it doesn't need to iterate over it to generate all elements; it is simply returned (and this also removes the memory cost).

Related: the same problem has been discussed in the C# language design, and there is an interesting paper about it (their name for yield! is yield foreach).

I'm not sure what sort of answer you're looking for. As you have noticed, the comment does not match the behavior of the compiler. I can't say whether this is an instance of a comment getting out of sync with the implementation or whether it's actually a performance bug (for example, the spec doesn't seem to call out any specific performance requirements).

However, it should be possible in theory for the compiler's machinery to generate an implementation which operates on your example in linear time. In fact, it's even possible to build such an implementation in a library using computation expressions. Here's a rough example, based largely on the paper Tomas cited:

open System.Collections
open System.Collections.Generic

type 'a nestedState = 
/// Nothing to yield
| Done 
/// Yield a single value before proceeding
| Val of 'a
/// Yield the results from a nested iterator before proceeding
| Enum of (unit -> 'a nestedState)
/// Yield just the results from a nested iterator
| Tail of (unit -> 'a nestedState)

type nestedSeq<'a>(ntor) =
  let getEnumerator() : IEnumerator<'a> =
    let stack = ref [ntor]
    let curr = ref Unchecked.defaultof<'a>
    let rec moveNext() =
      match !stack with
      | [] -> false
      | e::es as l -> 
          match e() with
          | Done -> stack := es; moveNext()  
          | Val(a) -> curr := a; true
          | Enum(e) -> stack := e :: l; moveNext()
          | Tail(e) -> stack := e :: es; moveNext()
    { new IEnumerator<'a> with
        member x.Current = !curr
      interface System.IDisposable with
        member x.Dispose() = () 
      interface IEnumerator with
        member x.MoveNext() = moveNext()
        member x.Current = box !curr
        member x.Reset() = failwith "Reset not supported" }
  member x.NestedEnumerator = ntor
  interface IEnumerable<'a> with
    member x.GetEnumerator() = getEnumerator()
  interface IEnumerable with
    member x.GetEnumerator() = upcast getEnumerator()

let getNestedEnumerator : 'a seq -> _ = function
| :? ('a nestedSeq) as n -> n.NestedEnumerator
| s -> 
    let e = s.GetEnumerator()
    fun () ->
      if e.MoveNext() then
        Val e.Current
      else
        Done

let states (arr : Lazy<_[]>) = 
  let state = ref -1 
  nestedSeq (fun () -> incr state; arr.Value.[!state]) :> seq<_>

type SeqBuilder() = 
  member s.Yield(x) =  
    states (lazy [| Val x; Done |])
  member s.Combine(x:'a seq, y:'a seq) = 
    states (lazy [| Enum (getNestedEnumerator x); Tail (getNestedEnumerator y) |])
  member s.Zero() =  
    states (lazy [| Done |])
  member s.Delay(f) = 
    states (lazy [| Tail (f() |> getNestedEnumerator) |])
  member s.YieldFrom(x) = x 
  member s.Bind(x:'a seq, f) = 
    let e = x.GetEnumerator() 
    nestedSeq (fun () -> 
                 if e.MoveNext() then  
                   Enum (f e.Current |> getNestedEnumerator) 
                 else  
                   Done) :> seq<_>

let seq = SeqBuilder()

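// Recursive call first: yield! is in a non-tail position (handled via Combine).
let rec walkr n = seq { 
  if n > 0 then
    yield! walkr (n-1)
    yield n
}

// Element first: yield! is in tail position.
let rec walkl n = seq {
  if n > 0 then
    yield n
    yield! walkl (n-1)
}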

let time = 
  let watch = System.Diagnostics.Stopwatch.StartNew()
  walkr 10000 |> Seq.iter ignore
  watch.Stop()
  watch.Elapsed

Note that my SeqBuilder is not robust; it's missing several workflow members and it doesn't do anything regarding object disposal or error handling. However, it does demonstrate that sequence builders don't need to exhibit quadratic running time on examples like yours.

Also note that there's a time-space tradeoff here - the nested iterator for walkr n will iterate through the sequence in O(n) time, but it requires O(n) space to do so.
