
An implementation problem of F# Seq

I have been digging into the F# source code recently.

In Seq.fs:

// Binding. 
//
// We use a type defintion to apply a local dynamic optimization. 
// We automatically right-associate binding, i.e. push the continuations to the right.
// That is, bindG (bindG G1 cont1) cont2 --> bindG G1 (cont1 o cont2)
// This makes constructs such as the following linear rather than quadratic:
//
//  let rec rwalk n = { if n > 0 then 
//                         yield! rwalk (n-1)
//                         yield n }

After seeing the above code, I tested two snippets:

let rec rwalk n = seq { if n > 0 then 
                         yield n
                         yield! rwalk (n-1)
                      }

and

let rec rwalk n = seq { if n > 0 then 
                         yield! rwalk (n-1)
                         yield n 
                      }

I found that the first one is very fast, while the second is very slow. With n = 10000, it takes 3 seconds on my machine to generate this sequence, which suggests quadratic time.

The quadratic time is reasonable, since e.g.

seq { yield! {1; 2; ...; n-1}; yield n } translates to

Seq.append {1; 2; ...; n-1} {n}

This append operation should take linear time, I guess. In the first snippet, by contrast, the append operation looks like this: seq { yield n; yield! {n-1; n-2; ...; 1} }, which costs constant time.

The comments in the code say that it is linear (maybe this "linear" does not mean linear time). Maybe this "linear" relates to using a customized implementation for sequences rather than a Monad/F# computation expression (as mentioned in the F# specification; however, the specification does not mention the reason for doing so...).

Could anyone clarify the fuzziness here? Thanks a lot!

(P.S. Because this is a language design and optimization problem, I also attached the Haskell tag to see if people there have insights.)

When yield! appears in a non-tail-call position, it essentially means the same thing as:

for v in <expr> do yield v

The problem with this (and the reason why it is quadratic) is that for recursive calls, this creates a chain of iterators with nested for loops. You need to iterate over the whole sequence generated by <expr> for every single element, so if the iteration is linear, you get quadratic time (because the linear iteration happens for every element).

Let's say the rwalk function generates [ 9; 2; 3; 7 ]. In the first iteration, the recursively generated sequence has 4 elements, so you'd iterate over 4 elements and add 1. In the recursive call, you'd iterate over 3 elements and add 1, etc. Using a diagram, you can see how that's quadratic:

x
x x 
x x x
x x x x
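The diagram can be reproduced by counting. The following is a minimal sketch, not the compiler's actual translation: it spells the non-tail yield! out as the equivalent for loop and bumps a counter (the name hops is made up for this illustration) each time an element is handed from an inner sequence to an outer one. For n elements the count should come out as 0 + 1 + ... + (n-1) = n(n-1)/2.

```fsharp
// Illustrative counter: how many times an element is passed up one nesting level.
let hops = ref 0

let rec rwalk n : seq<int> =
    seq {
        if n > 0 then
            // Spelled-out form of a non-tail `yield! rwalk (n - 1)`.
            for v in rwalk (n - 1) do
                hops.Value <- hops.Value + 1
                yield v
            yield n
    }

// Force the whole sequence once so that every level is iterated.
let items = rwalk 100 |> Seq.toList

// 100 elements, 100 * 99 / 2 = 4950 hand-offs between levels.
printfn "%d items, %d hops" items.Length hops.Value
```

Doubling n roughly quadruples the hop count, which is the quadratic behaviour described above.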

Also, each of the recursive calls creates a new object instance (IEnumerator), so there is also some memory cost (although only linear).

In a tail-call position, the F# compiler/library does an optimization. It "replaces" the current IEnumerable with the one returned by the recursive call, so it doesn't need to iterate over it to generate all elements; it is simply returned (and this also removes the memory cost).

Related. The same problem has been discussed in the C# language design and there is an interesting paper about it (their name for yield! is yield foreach).
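As a small illustration of the tail-call case, here is a sketch using the built-in seq builder (the function name countdown is chosen just for this example). Because yield! is the last thing the sequence expression does, a long sequence can be consumed without building a chain of nested iterators:

```fsharp
// `yield!` in tail position: the recursive sequence takes over from the current one.
let rec countdown n : seq<int> =
    seq {
        if n > 0 then
            yield n
            yield! countdown (n - 1)
    }

// Consuming 100000 elements; each element is produced once, not re-yielded per level.
let total = countdown 100000 |> Seq.length
printfn "%d" total
```

Swapping the two lines inside the if puts yield! in a non-tail position, and the same n then becomes noticeably slower, as the question observes.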

I'm not sure what sort of answer you're looking for. As you have noticed, the comment does not match the behavior of the compiler. I can't say whether this is an instance of a comment getting out of sync with the implementation or whether it's actually a performance bug (for example, the spec doesn't seem to call out any specific performance requirements).

However, it should be possible in theory for the compiler's machinery to generate an implementation which operates on your example in linear time. In fact, it's even possible to build such an implementation in a library using computation expressions. Here's a rough example, based largely on the paper Tomas cited:

open System.Collections
open System.Collections.Generic

type 'a nestedState = 
/// Nothing to yield
| Done 
/// Yield a single value before proceeding
| Val of 'a
/// Yield the results from a nested iterator before proceeding
| Enum of (unit -> 'a nestedState)
/// Yield just the results from a nested iterator
| Tail of (unit -> 'a nestedState)

type nestedSeq<'a>(ntor) =
  let getEnumerator() : IEnumerator<'a> =
    let stack = ref [ntor]
    let curr = ref Unchecked.defaultof<'a>
    let rec moveNext() =
      match !stack with
      | [] -> false
      | e::es as l -> 
          match e() with
          | Done -> stack := es; moveNext()  
          | Val(a) -> curr := a; true
          | Enum(e) -> stack := e :: l; moveNext()
          | Tail(e) -> stack := e :: es; moveNext()
    { new IEnumerator<'a> with
        member x.Current = !curr
      interface System.IDisposable with
        member x.Dispose() = () 
      interface IEnumerator with
        member x.MoveNext() = moveNext()
        member x.Current = box !curr
        member x.Reset() = failwith "Reset not supported" }
  member x.NestedEnumerator = ntor
  interface IEnumerable<'a> with
    member x.GetEnumerator() = getEnumerator()
  interface IEnumerable with
    member x.GetEnumerator() = upcast getEnumerator()

let getNestedEnumerator : 'a seq -> _ = function
| :? ('a nestedSeq) as n -> n.NestedEnumerator
| s -> 
    let e = s.GetEnumerator()
    fun () ->
      if e.MoveNext() then
        Val e.Current
      else
        Done

let states (arr : Lazy<_[]>) = 
  let state = ref -1 
  nestedSeq (fun () -> incr state; arr.Value.[!state]) :> seq<_>

type SeqBuilder() = 
  member s.Yield(x) =  
    states (lazy [| Val x; Done |])
  member s.Combine(x:'a seq, y:'a seq) = 
    states (lazy [| Enum (getNestedEnumerator x); Tail (getNestedEnumerator y) |])
  member s.Zero() =  
    states (lazy [| Done |])
  member s.Delay(f) = 
    states (lazy [| Tail (f() |> getNestedEnumerator) |])
  member s.YieldFrom(x) = x 
  member s.Bind(x:'a seq, f) = 
    let e = x.GetEnumerator() 
    nestedSeq (fun () -> 
                 if e.MoveNext() then  
                   Enum (f e.Current |> getNestedEnumerator) 
                 else  
                   Done) :> seq<_>

let seq = SeqBuilder()

// SeqBuilder defines Yield/YieldFrom (not Return/ReturnFrom),
// so the bodies use yield and yield!.
let rec walkr n = seq { 
  if n > 0 then
    yield! walkr (n-1)
    yield n
}

let rec walkl n = seq {
  if n > 0 then
    yield n
    yield! walkl (n-1)
}

let time = 
  let watch = System.Diagnostics.Stopwatch.StartNew()
  walkr 10000 |> Seq.iter ignore
  watch.Stop()
  watch.Elapsed

Note that my SeqBuilder is not robust; it's missing several workflow members and it doesn't do anything regarding object disposal or error handling. However, it does demonstrate that SequenceBuilders don't need to exhibit quadratic running time on examples like yours.

Also note that there's a time-space tradeoff here: the nested iterator for walkr n will iterate through the sequence in O(n) time, but it requires O(n) space to do so.
