What is the preferred way to implement 'yield' in Scala?

I am writing code for PhD research and starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for implementing complex iterators over large, often irregularly structured text files. Similar constructs exist in other languages (e.g. C#), for good reason.

Yes, I know there have been previous threads on this. But they look like hacked-up (or at least badly explained) solutions that don't clearly work well and often have unclear limitations. I would like to write code something like this:

import scala.io.Source
import generator._   // the hypothetical library this question is asking for

def yield_values(file:String) = {
  generate {
    for (x <- Source.fromFile(file).getLines()) {
      // Scala is already using the 'yield' keyword.
      give("something")
      for (field <- ":".r.split(x)) {
        if (field contains "/") {
          for (subfield <- "/".r.split(field)) { give(subfield) }
        } else {
          // Scala has no 'continue'.  IMO that should be considered
          // a bug in Scala.
          // Preferred: if (field.startsWith("#")) continue
          // Actual: Need to indent all following code
          if (!field.startsWith("#")) {
            val some_calculation = { ... do some more stuff here ... }
            if (some_calculation && field.startsWith("r")) {
              give("r")
              give(field.drop(1))   // everything after the first character
            } else {
              // Typically there will be a good deal more code here to handle different cases
              give(field)
            }
          }
        }
      }
    }
  }
}

I'd like to see the code that implements generate() and give(). BTW, give() should be named yield(), but Scala has taken that keyword already.

I gather that, for reasons I don't understand, Scala continuations may not work inside a for statement. If so, generate() should supply an equivalent function that works as close as possible to a for statement, because iterator code with yield almost inevitably sits inside a for loop.

Please, I would prefer not to get any of the following answers:

  1. 'yield' sucks, continuations are better. (Yes, in general you can do more with continuations. But they are hella hard to understand, and 99% of the time an iterator is all you want or need. If Scala provides lots of powerful tools but they're too hard to use in practice, the language won't succeed.)
  2. This is a duplicate. (Please see my comments above.)
  3. You should rewrite your code using streams, continuations, recursion, etc. etc. (Please see #1. I will also add that, technically, you don't need for loops either. For that matter, technically you can do absolutely everything you ever need using SKI combinators.)
  4. Your function is too long. Break it up into smaller pieces and you won't need 'yield'. You'd have to do this in production code, anyway. (First, "you won't need 'yield'" is doubtful in any case. Second, this isn't production code. Third, for text processing like this, very often, breaking the function into smaller pieces, especially when the language forces you to do so because it lacks the useful constructs, only makes the code harder to understand.)
  5. Rewrite your code with a function passed in. (Technically, yes, you can do this. But the result is no longer an iterator, and chaining iterators is much nicer than chaining functions. In general, a language should not force me to write in an unnatural style; certainly, the Scala creators believe this in general, since they provide shitloads of syntactic sugar.)
  6. Rewrite your code in this, that, or the other way, or some other cool, awesome way I just thought of.

The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, if you're already familiar with Python and really like some of its features and design choices, why not use it instead?

Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.

Here's how you do it.

// You want to write
for (x <- xs) { /* complex yield in here */ }
// Instead you write
xs.iterator.flatMap { /* Produce iterators in here */ }

// You want to write
yield(a)
yield(b)
// Instead you write
Iterator(a,b)

// You want to write
yield(a)
/* complex set of yields in here */
// Instead you write
Iterator(a) ++ /* produce complex iterator here */

That's it! All your cases can be reduced to one of these three.

In your case, your example would look something like this:
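To make the three rewrites concrete, here is a small self-contained sketch. The names `YieldPatterns`, `expand` and `withHeader` are made up for illustration; only the standard `Iterator` API is used. Because `flatMap` and `++` on `Iterator` are lazy, nothing is produced until the consumer asks for the next element, which is the property that makes Python's `yield` attractive for large files.

```scala
object YieldPatterns {
  // Pattern 1: `for (x <- xs) { ...yields... }` becomes xs.iterator.flatMap(x => <iterator of those yields>).
  // Pattern 2 (inside it): two consecutive yields, yield(x); yield(x * 10), become Iterator(a, b).
  def expand(xs: Seq[Int]): Iterator[Int] =
    xs.iterator.flatMap(x => Iterator(x, x * 10))

  // Pattern 3: `yield(a)` followed by a complex series of yields becomes Iterator(a) ++ <that iterator>.
  def withHeader(xs: Seq[Int]): Iterator[Int] =
    Iterator(0) ++ expand(xs)

  def main(args: Array[String]): Unit =
    println(withHeader(Seq(1, 2)).mkString(", ")) // prints 0, 1, 10, 2, 20
}
```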

import scala.io.Source

Source.fromFile(file).getLines().flatMap(x =>
  Iterator("something") ++
  ":".r.split(x).iterator.flatMap(field =>
    if (field contains "/") "/".r.split(field).iterator
    else {
      if (!field.startsWith("#")) {
        /* vals, whatever (compute some_calculation here) */
        if (some_calculation && field.startsWith("r")) Iterator("r", field.drop(1))
        else Iterator(field)
      }
      else Iterator.empty   // skipping a '#' field plays the role of 'continue'
    }
  )
)
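The same rewrite can be exercised without a file by feeding it an in-memory iterator of lines. This is a sketch only: `someCalculation` is a hypothetical stand-in for the question's elided `{ ... do some more stuff here ... }` (here it just checks that the field is longer than one character), `TokenIterator`/`tokens` are illustrative names, and `drop(1)` is used for "everything after the first character".

```scala
object TokenIterator {
  // Hypothetical stand-in for the question's elided `some_calculation`.
  private def someCalculation(field: String): Boolean = field.length > 1

  def tokens(lines: Iterator[String]): Iterator[String] =
    lines.flatMap { x =>
      Iterator("something") ++
        ":".r.split(x).iterator.flatMap { field =>
          if (field.contains("/")) "/".r.split(field).iterator
          else if (field.startsWith("#")) Iterator.empty          // acts like `continue`
          else if (someCalculation(field) && field.startsWith("r"))
            Iterator("r", field.drop(1))                           // two yields in a row
          else Iterator(field)
        }
    }

  def main(args: Array[String]): Unit = {
    // With a real file you would pass scala.io.Source.fromFile(path).getLines()
    val lines = Iterator("a/b:#skip:rx", "plain")
    println(tokens(lines).toList) // List(something, a, b, r, x, something, plain)
  }
}
```

Since the result is lazy, a `Source` opened for it can only be closed after the iterator has been fully consumed.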

PS: Scala does have continue; it's done like so (implemented by throwing stackless, lightweight exceptions):

import scala.util.control.Breaks._
for (blah) { breakable { ... break ... } }
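A runnable version of that pattern, with an illustrative `keptFields` function that is not from the original answer: because `breakable` sits *inside* the loop body, `break()` abandons only the current iteration, i.e. it behaves like `continue`.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.control.Breaks._

object ContinueDemo {
  def keptFields(fields: Seq[String]): List[String] = {
    val out = ListBuffer.empty[String]
    for (field <- fields) {
      breakable {
        if (field.startsWith("#")) break() // like `continue`: skip the rest of this iteration only
        out += field.toUpperCase
      }
    }
    out.toList
  }

  def main(args: Array[String]): Unit =
    println(keptFields(Seq("a", "#comment", "b"))) // List(A, B)
}
```

Wrapping the whole loop in `breakable` instead would turn the same `break()` into a Java-style `break`.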

But that won't get you what you want, because Scala doesn't have the yield you want.

'yield' sucks, continuations are better

Actually, Python's yield is a continuation.

What is a continuation? A continuation captures the present point of execution with all its state, so that execution can resume from that point later. That's precisely what Python's yield does, and also precisely how it is implemented.

It is my understanding that Python's continuations are not delimited, however. I don't know much about that; in fact, I might be wrong. Nor do I know what the implications of that may be.

Scala's continuations do not work at run time. In fact, there's a continuations library for Java that works by manipulating bytecode at run time, which frees it from the constraints that Scala's continuations have.

Scala's continuations are implemented entirely at compile time, which requires quite a bit of work. It also requires that the code to be "continued" be prepared (CPS-transformed) by the compiler.

And that's why for-comprehensions do not work. A statement like this:

for { x <- xs } proc(x)

is translated into

xs.foreach(x => proc(x))

where foreach is a method on xs's class. Unfortunately, xs's class was compiled long ago, so it cannot be modified to support the continuation. As a side note, that's also why Scala doesn't have continue.
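The desugaring is easy to observe: a class that defines nothing but a `foreach` method can already be used in a `for` loop, because the compiler simply rewrites the loop into a call to that method. `Countdown` and `DesugarDemo` are illustrative names, not part of any library. Note that `foreach` here is ordinary code that just calls `f` in a loop; nothing in it was CPS-transformed, which is the answer's point about why a suspending body cannot be passed to an existing `foreach`.

```scala
import scala.collection.mutable.ListBuffer

// Defines only foreach; it does not extend Iterable or Seq.
final class Countdown(from: Int) {
  def foreach[U](f: Int => U): Unit = {
    var i = from
    while (i > 0) { f(i); i -= 1 }
  }
}

object DesugarDemo {
  def viaFor(n: Int): List[Int] = {
    val seen = ListBuffer.empty[Int]
    for (x <- new Countdown(n)) seen += x // compiled as: new Countdown(n).foreach(x => seen += x)
    seen.toList
  }

  def viaForeach(n: Int): List[Int] = {
    val seen = ListBuffer.empty[Int]
    new Countdown(n).foreach(x => seen += x)
    seen.toList
  }

  def main(args: Array[String]): Unit =
    println(viaFor(3)) // List(3, 2, 1)
}
```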

Aside from that, yes, this is a duplicate question, and, yes, you should find a different way to write your code.

The implementation below provides a Python-like generator.

Notice that there's a function called _yield in the code below, because yield is already a keyword in Scala, which, by the way, has nothing to do with the yield you know from Python.

// Requires the (Scala 2.10-2.12 era) continuations compiler plugin, enabled with -P:continuations:enable.
import scala.util.continuations._

object Generators {
  sealed trait Trampoline[+T]

  case object Done extends Trampoline[Nothing]
  case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

  class Generator[T](var cont: Unit => Trampoline[T]) extends Iterator[T] {
    // Cache each step: calling cont() twice would re-run the generator body
    // (and any side effects) from the last yield.
    private var current: Trampoline[T] = null

    private def step(): Trampoline[T] = {
      if (current == null) current = cont(())
      current
    }

    def next(): T = step() match {
      case Continue(r, nextCont) => cont = nextCont; current = null; r
      case Done => throw new NoSuchElementException("Generator exhausted")
    }

    def hasNext: Boolean = step() != Done
  }

  type Gen[T] = cps[Trampoline[T]]

  def generator[T](body: => Unit @Gen[T]): Generator[T] = {
    new Generator((_: Unit) => reset { body; Done })
  }

  def _yield[T](t: T): Unit @Gen[T] =
    shift { (cont: Unit => Trampoline[T]) => Continue(t, cont) }
}


object TestCase {
  import Generators._

  def sectors = generator {
    def tailrec(seq: Seq[String]): Unit @Gen[String] = {
      if (!seq.isEmpty) {
        _yield(seq.head)
        tailrec(seq.tail)
      }
    }

    val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
    tailrec(list)
  }

  def main(args: Array[String]): Unit = {
    for (s <- sectors) { println(s) }
  }
}

It works pretty well, including for the typical usage in for loops.

Caveat: we need to remember that Python and Scala differ in the way continuations are implemented. Below we see how generators are typically used in Python and compare that to the way we have to use them in Scala. Then we will see why it needs to be like this in Scala.

If you are used to writing code in Python, you've probably used generators like this:

// This is Scala code that does not compile :(
// This code naively tries to mimic the way generators are used in Python

def myGenerator = generator {
  val list: Seq[String] = List("Financials", "Materials", "Technology", "Utilities")
  list foreach {s => _yield(s)}
}

The code above does not compile. Skipping all the convoluted theory, the explanation is: it fails to compile because "the type of the for loop" does not match the type involved in the continuation (foreach expects a plain function, whereas s => _yield(s) returns the CPS-annotated type Unit @Gen[String]). I'm afraid this explanation is a complete failure. Let me try again:

If you had coded something like what is shown below, it would compile fine:

def myGenerator = generator {
  _yield("Financials")
  _yield("Materials")
  _yield("Technology")
  _yield("Utilities")
}

This code compiles because the generator can be decomposed into a sequence of yields and, in this case, a yield matches the type involved in the continuation. To be more precise, the code can be decomposed into chained blocks, where each block ends with a yield. Just for the sake of clarity, we can think of the sequence of yields as being expressed like this:

{ some code here; _yield("Financials")
    { some other code here; _yield("Materials")
        { eventually even some more code here; _yield("Technology")
            { ok, fine, youve got the idea, right?; _yield("Utilities") }}}}

Again, without going deep into convoluted theory, the point is that after a yield you need to provide another block that ends with a yield, or else close the chain. This is what we are doing in the pseudo-code above: after the yield we open another block, which in turn ends with a yield, followed by another block which in turn ends with another yield, and so on. Obviously this must end at some point, and then the only thing we are allowed to do is close the entire chain.

OK. But... how can we yield multiple pieces of information? The answer is a little obscure but makes a lot of sense once you know it: we need to employ tail recursion, and the last statement of a block must be a yield.
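To see what such a chain of blocks amounts to, here is a sketch that builds it by hand, without the continuations plugin. It redeclares the `Trampoline`/`Continue`/`Done` shape from the implementation above so the snippet stands alone; `Chain`, `toIterator` and `sectors` are illustrative names. Each `Continue` holds one yielded value plus a function that computes the rest of the chain, which is roughly the shape the plugin's `shift` in `_yield` produces for consecutive `_yield` calls.

```scala
object Chain {
  sealed trait Trampoline[+T]
  case object Done extends Trampoline[Nothing]
  final case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

  // Walks the chain lazily: each next() runs exactly one more "block".
  def toIterator[T](start: Trampoline[T]): Iterator[T] = new Iterator[T] {
    private var current: Trampoline[T] = start
    def hasNext: Boolean = current != Done
    def next(): T = current match {
      case Continue(r, k) => current = k(()); r
      case Done           => throw new NoSuchElementException("chain exhausted")
    }
  }

  // Hand-written equivalent of four consecutive _yield calls: each block ends by yielding
  // and nests the next block inside its continuation; the innermost one closes the chain.
  val sectors: Trampoline[String] =
    Continue[String]("Financials", _ =>
      Continue[String]("Materials", _ =>
        Continue[String]("Technology", _ =>
          Continue[String]("Utilities", _ => Done))))

  def main(args: Array[String]): Unit =
    toIterator(sectors).foreach(println)
}
```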

  def myGenerator = generator {
    def tailrec(seq: Seq[String]): Unit @Gen[String] = {
      if (!seq.isEmpty) {
        _yield(seq.head)
        tailrec(seq.tail)
      }
    }

    val list = List("Financials", "Materials", "Technology", "Utilities")
    tailrec(list)
  }

Let's analyze what's going on here:

  1. Our generator function myGenerator contains some logic that generates information. In this example, we simply use a sequence of strings.

  2. Our generator function myGenerator calls a recursive function which is responsible for yielding multiple pieces of information obtained from our sequence of strings.

  3. The recursive function must be declared before use, otherwise the compiler crashes.

  4. The recursive function tailrec provides the tail recursion we need.

The rule of thumb here is simple: replace a for loop with a recursive function, as demonstrated above.

Notice that tailrec is just a convenient name we chose for the sake of clarity. In particular, tailrec does not need to be the last statement of our generator function. The only restriction is that you have to provide a sequence of blocks which match the type of a yield, as shown below:

  def myGenerator = generator {

    def tailrec(seq: Seq[String]): Unit @Gen[String] = {
      if (!seq.isEmpty) {
        _yield(seq.head)
        tailrec(seq.tail)
      }
    }

    _yield("Before the first call")
    _yield("OK... not yet...")
    _yield("Ready... steady... go")

    val list = List("Financials", "Materials", "Technology", "Utilities")
    tailrec(list)

    _yield("done")
    _yield("long life and prosperity")
  }
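The same hand-written encoding (redeclared so the snippet is self-contained; `RecursiveChain`, `fromSeq` and `toIterator` are illustrative names, and no plugin is involved) shows why recursion replaces the `for` loop: `fromSeq` emits one element per step and, when the sequence runs out, continues with `rest`, the chain that should come after the "loop". Passing `rest` explicitly is what lets yields appear both before and after the recursive call, as in the example above.

```scala
object RecursiveChain {
  sealed trait Trampoline[+T]
  case object Done extends Trampoline[Nothing]
  final case class Continue[T](result: T, next: Unit => Trampoline[T]) extends Trampoline[T]

  def toIterator[T](start: Trampoline[T]): Iterator[T] = new Iterator[T] {
    private var current: Trampoline[T] = start
    def hasNext: Boolean = current != Done
    def next(): T = current match {
      case Continue(r, k) => current = k(()); r
      case Done           => throw new NoSuchElementException("chain exhausted")
    }
  }

  // Recursive replacement for a `for` loop of yields; `rest` is the chain that follows it.
  // Each step returns immediately with one Continue, so the stack does not grow with seq.
  def fromSeq[T](seq: Seq[T])(rest: Trampoline[T]): Trampoline[T] =
    if (seq.isEmpty) rest
    else Continue[T](seq.head, _ => fromSeq(seq.tail)(rest))

  // Yield before the "loop", then the loop itself, then yield after it.
  val myGenerator: Trampoline[String] =
    Continue[String]("Ready... steady... go", _ =>
      fromSeq(List("Financials", "Materials", "Technology", "Utilities"))(
        Continue[String]("done", _ => Done)))

  def main(args: Array[String]): Unit =
    toIterator(myGenerator).foreach(println)
}
```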

Going one step further, you are probably wondering what real-life applications look like, in particular if you are employing several generators. It would be a good idea to standardize your generators around a single pattern that proves convenient in most circumstances.

Let's examine the example below. We have three generators: sectors, industries and companies. For brevity, only sectors is shown in full. This generator employs a tailrec function, as already demonstrated above. The trick here is that the same tailrec function is also used by the other generators. All we have to do is supply a different body function.

import scala.xml.NodeSeq
import Generators._   // generator, _yield and Gen from the implementation above

type GenP = (NodeSeq, NodeSeq, NodeSeq)
type GenR = Map[String, String]   // the default (immutable) Map

def tailrec(p: GenP)(body: GenP => GenR): Unit @Gen[GenR] = {
  val (stats, rows, header)  = p
  if (!stats.isEmpty && !rows.isEmpty) {
    val heads: GenP = (stats.head, rows.head, header)
    val tails: GenP = (stats.tail, rows.tail, header)
    _yield(body(heads))
    // tail recursion
    tailrec(tails)(body)
  }
}

def sectors = generator[GenR] {
  def body(p: GenP): GenR = {
      // unpack arguments (a tuple pattern: `val a, b, c = p` would bind p to all three)
      val (stat, row, header) = p
      // obtain name and url
      val name = (row \ "a").text
      val url  = (row \ "a" \ "@href").text
      // create map and populate fields: name and url
      val m = scala.collection.mutable.HashMap[String, String]()
      m.put("name", name)
      m.put("url",  url)
      // populate other fields: header cell text -> stat cell text
      header.zip(stat).foreach { case (k, v) => m.put(k.text, v.text) }
      // return an immutable copy, matching GenR
      m.toMap
  }

  val root  : scala.xml.NodeSeq = cache.loadHTML5(urlSectors) // obtain entire page
  val header: scala.xml.NodeSeq = ... // code is omitted
  val stats : scala.xml.NodeSeq = ... // code is omitted
  val rows  : scala.xml.NodeSeq = ... // code is omitted
  // tail recursion
  tailrec((stats, rows, header))(body)
} 

def industries(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
      //++ similar to 'body' demonstrated in "sectors"
      // returns an immutable map (GenR)
      m.toMap
  }

  //++ obtain NodeSeq variables, like demonstrated in "sectors" 
  // tail recursion
  tailrec((stats, rows, header))(body)
} 

def companies(sector: String) = generator[GenR] {
  def body(p: GenP): GenR = {
      //++ similar to 'body' demonstrated in "sectors"
      // returns an immutable map (GenR)
      m.toMap
  }

  //++ obtain NodeSeq variables, like demonstrated in "sectors" 
  // tail recursion
  tailrec((stats, rows, header))(body)
} 
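Without the plugin, the same "one shared walker, a different `body` per generator" pattern can be sketched with plain iterators. To stay dependency-free this sketch swaps `NodeSeq` for plain strings (so it does not need the separate scala-xml module); `RowGenerators`, `rowMaps` and `sectorBody` are hypothetical names, and each row's link is assumed to be a `(name, url)` pair.

```scala
object RowGenerators {
  type GenR = Map[String, String]

  // Shared walker: pairs each stats row with its link row (stopping at the shorter one,
  // like the isEmpty checks in tailrec) and lazily produces one map per row.
  def rowMaps(stats: Seq[Seq[String]], links: Seq[(String, String)], header: Seq[String])
             (body: (Seq[String], (String, String), Seq[String]) => GenR): Iterator[GenR] =
    stats.iterator.zip(links.iterator).map { case (stat, link) => body(stat, link, header) }

  // Counterpart of `body` in `sectors`: name and url, then header cell -> stat cell.
  def sectorBody(stat: Seq[String], link: (String, String), header: Seq[String]): GenR =
    Map("name" -> link._1, "url" -> link._2) ++ header.zip(stat)

  def sectors(stats: Seq[Seq[String]], links: Seq[(String, String)], header: Seq[String]): Iterator[GenR] =
    rowMaps(stats, links, header)(sectorBody)

  def main(args: Array[String]): Unit =
    sectors(Seq(Seq("12.1", "2.3")), Seq("Financials" -> "/fin"), Seq("P/E", "Yield")).foreach(println)
}
```

Other generators (industries, companies) would reuse `rowMaps` and pass their own body.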
