How is pattern matching in Scala implemented at the bytecode level?

How is pattern matching in Scala implemented at the bytecode level?

Is it like a series of if (x instanceof Foo) constructs, or something else? What are its performance implications?

For example, given the following code (from Scala By Example, pages 46-48), what would the equivalent Java code for the eval method look like?

abstract class Expr
case class Number(n: Int) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr

def eval(e: Expr): Int = e match {
  case Number(x) => x
  case Sum(l, r) => eval(l) + eval(r)
}

P.S. I can read Java bytecode, so a bytecode representation would be good enough for me, but other readers would probably benefit from seeing what it looks like as Java code.

P.P.S. Does the book Programming in Scala answer this and similar questions about how Scala is implemented? I have ordered the book, but it has not yet arrived.

The low level can be explored with a disassembler, but the short answer is that it's a bunch of if/elses where the predicate depends on the pattern:

case Sum(l, r) // instanceof check, then fetch the two arguments and assign them to l and r (but see below about custom extractors)
case "hello" // equality check
case _ : Foo // instanceof check
case x => // assignment to a fresh variable
case _ => // do nothing; this is the trailing else of the if/else chain

There's much more that you can do with patterns, such as or-patterns and combinations like "case Foo(45, x)", but generally those are just logical extensions of what I just described. Patterns can also have guards, which are additional constraints on the predicates. There are also cases where the compiler can optimize pattern matching, e.g. when there's some overlap between cases it might coalesce things a bit. Advanced patterns and optimization are an active area of work in the compiler, so don't be surprised if the bytecode improves substantially over these basic rules in current and future versions of Scala.
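To make the "logical extensions" concrete, here is a minimal hand-written Java sketch of how a nested constant pattern plus a guard could be lowered. The class Foo, the method describe, and the Scala match shown in the comment are hypothetical illustrations, and sharing the instanceof test between the two overlapping cases is only one way a compiler might coalesce them; the actual output depends on the Scala version.

```java
final class NestedPatternSketch {
    // Hypothetical stand-in for a Scala `case class Foo(a: Int, b: Int)`.
    static final class Foo {
        final int a;
        final int b;
        Foo(int a, int b) { this.a = a; this.b = b; }
    }

    // Roughly what this (assumed) Scala match could become:
    //   value match {
    //     case Foo(45, x) if x > 0 => "positive " + x
    //     case Foo(45, x)          => "other " + x
    //     case _                   => "no match"
    //   }
    static String describe(Object value) {
        if (value instanceof Foo) {           // type test, shared by both Foo cases
            Foo foo = (Foo) value;
            if (foo.a == 45) {                // constant sub-pattern: equality check
                int x = foo.b;                // variable sub-pattern: plain assignment
                if (x > 0) {                  // guard: one more condition
                    return "positive " + x;
                }
                return "other " + x;
            }
        }
        return "no match";                    // wildcard: the trailing else
    }
}
```

Each extra piece of the pattern adds one more condition or assignment to the chain, which is why the cost stays easy to reason about for ordinary case classes.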

On top of all that, you can write your own custom extractors in addition to, or instead of, the default ones Scala uses for case classes. If you do, then the cost of the pattern match is the cost of whatever the extractor does. A good overview is found in http://lamp.epfl.ch/~emir/written/MatchingObjectsWithPatterns-TR.pdf
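As a rough illustration of why the extractor's cost dominates, the sketch below models a match on a custom extractor in plain Java. The extractor Even and the method names are hypothetical, and java.util.Optional stands in for Scala's Option; this is an approximation of the shape of the generated code, not actual compiler output.

```java
import java.util.Optional;

final class ExtractorSketch {
    // Hypothetical extractor, standing in for a Scala
    //   object Even { def unapply(n: Int): Option[Int] = ... }
    static Optional<Integer> unapplyEven(int n) {
        return (n % 2 == 0) ? Optional.of(n / 2) : Optional.empty();
    }

    // Roughly: n match { case Even(half) => "twice " + half; case _ => "odd" }
    static String describe(int n) {
        Optional<Integer> result = unapplyEven(n); // arbitrary user code runs here
        if (result.isPresent()) {                  // did the extractor match?
            int half = result.get();               // bind the extracted value
            return "twice " + half;
        }
        return "odd";
    }
}
```

The match itself is still just a method call followed by an emptiness check, so any expensive work sits inside the extractor.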

James (above) said it best. However, if you're curious, it's always a good exercise to look at the disassembled bytecode. You can also invoke scalac with the -print option, which will print your program with all Scala-specific features removed. It's basically Java in Scala's clothing. Here's the relevant scalac -print output for the code snippet you gave:

def eval(e: Expr): Int = {
  <synthetic> val temp10: Expr = e;
  if (temp10.$isInstanceOf[Number]())
    temp10.$asInstanceOf[Number]().n()
  else
    if (temp10.$isInstanceOf[Sum]())
      {
        <synthetic> val temp13: Sum = temp10.$asInstanceOf[Sum]();
        Main.this.eval(temp13.e1()).+(Main.this.eval(temp13.e2()))
      }
    else
      throw new MatchError(temp10)
};
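Since the question asked what this would look like as Java, here is a hand-written approximation of the -print output above. The wrapper class EvalSketch and the use of plain fields instead of accessor methods are simplifications of mine, and IllegalStateException stands in for scala.MatchError so that the sketch compiles without the Scala library.

```java
final class EvalSketch {
    static abstract class Expr {}

    static final class Number extends Expr {
        final int n;
        Number(int n) { this.n = n; }
    }

    static final class Sum extends Expr {
        final Expr e1;
        final Expr e2;
        Sum(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    }

    static int eval(Expr e) {
        if (e instanceof Number) {             // case Number(x) => x
            Number number = (Number) e;
            return number.n;
        } else if (e instanceof Sum) {         // case Sum(l, r) => eval(l) + eval(r)
            Sum sum = (Sum) e;
            return eval(sum.e1) + eval(sum.e2);
        } else {                               // no case matched
            // Stand-in for `throw new MatchError(e)` in the real output.
            throw new IllegalStateException("no case matched: " + e);
        }
    }

    public static void main(String[] args) {
        Expr expr = new Sum(new Number(1), new Sum(new Number(2), new Number(3)));
        System.out.println(eval(expr)); // prints 6
    }
}
```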

Since version 2.8, Scala has had the @switch annotation. Its goal is to ensure that a pattern match is compiled to a tableswitch or lookupswitch instead of a series of conditional if statements; the compiler reports a diagnostic when it cannot do so.
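For comparison, the Java construct that maps onto those instructions is an ordinary switch over integer constants. The sketch below is illustrative only: the class and method names are hypothetical, and the Scala match in the comment is an assumed example of the kind of code @switch is applied to. javac typically emits a tableswitch for dense case values like these and a lookupswitch for sparse ones; running javap -c on the compiled class shows which was chosen.

```java
final class SwitchSketch {
    // Roughly what an (assumed) Scala match on small integer literals, e.g.
    //   (code: @switch) match {
    //     case 0 => "zero"; case 1 => "one"; case 2 => "two"; case _ => "many"
    //   }
    // is meant to become: a single jump instruction instead of a chain of comparisons.
    static String name(int code) {
        switch (code) {
            case 0:  return "zero";
            case 1:  return "one";
            case 2:  return "two";
            default: return "many";
        }
    }
}
```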

To expand on @Zifre's comment: if you are reading this in the future, the Scala compiler has adopted new compilation strategies, and you want to know what they are, here's how you find out what it does.

Copy-paste your match code into a self-contained example file. Run scalac on that file. Then run javap -v -c theClassName$.class.

For example, I put the following into /tmp/question.scala:

object question {
  abstract class Expr
  case class Number(n: Int) extends Expr
  case class Sum(e1: Expr, e2: Expr) extends Expr

  def eval(e: Expr): Int = e match {
    case Number(x) => x
    case Sum(l, r) => eval(l) + eval(r)
  }
}

Then I ran scalac question.scala, which produced a bunch of *.class files. Poking around a bit, I found the match statement inside question$.class. The javap -c -v question$.class output is available below.

Since we're looking for a conditional control-flow construct, knowing the Java bytecode instruction set suggests that searching for "if" is a good place to start.

In two locations we find a pair of consecutive lines of the form instanceof <something>; ifeq <somewhere>, which means: if the most recently computed value is not an instance of something, then go to somewhere. (ifeq is "jump if zero", and instanceof gives you a zero to represent false.)

If you follow the control flow around, you'll see that it agrees with the answer given by @Jorge Ortiz: we do if (blah instanceof something) { ... } else if (blah instanceof somethingelse) { ... }.

Here is the javap -c -v question$.class output:

Classfile /tmp/question$.class
  Last modified Nov 20, 2020; size 956 bytes
  MD5 checksum cfc788d4c847dad0863a797d980ad2f3
  Compiled from "question.scala"
public final class question$
  minor version: 0
  major version: 50
  flags: (0x0031) ACC_PUBLIC, ACC_FINAL, ACC_SUPER
  this_class: #2                          // question$
  super_class: #4                         // java/lang/Object
  interfaces: 0, fields: 1, methods: 3, attributes: 4
Constant pool:
   #1 = Utf8               question$
   #2 = Class              #1             // question$
   #3 = Utf8               java/lang/Object
   #4 = Class              #3             // java/lang/Object
   #5 = Utf8               question.scala
   #6 = Utf8               MODULE$
   #7 = Utf8               Lquestion$;
   #8 = Utf8               <clinit>
   #9 = Utf8               ()V
  #10 = Utf8               <init>
  #11 = NameAndType        #10:#9         // "<init>":()V
  #12 = Methodref          #2.#11         // question$."<init>":()V
  #13 = Utf8               eval
  #14 = Utf8               (Lquestion$Expr;)I
  #15 = Utf8               question$Number
  #16 = Class              #15            // question$Number
  #17 = Utf8               n
  #18 = Utf8               ()I
  #19 = NameAndType        #17:#18        // n:()I
  #20 = Methodref          #16.#19        // question$Number.n:()I
  #21 = Utf8               question$Sum
  #22 = Class              #21            // question$Sum
  #23 = Utf8               e1
  #24 = Utf8               ()Lquestion$Expr;
  #25 = NameAndType        #23:#24        // e1:()Lquestion$Expr;
  #26 = Methodref          #22.#25        // question$Sum.e1:()Lquestion$Expr;
  #27 = Utf8               e2
  #28 = NameAndType        #27:#24        // e2:()Lquestion$Expr;
  #29 = Methodref          #22.#28        // question$Sum.e2:()Lquestion$Expr;
  #30 = NameAndType        #13:#14        // eval:(Lquestion$Expr;)I
  #31 = Methodref          #2.#30         // question$.eval:(Lquestion$Expr;)I
  #32 = Utf8               scala/MatchError
  #33 = Class              #32            // scala/MatchError
  #34 = Utf8               (Ljava/lang/Object;)V
  #35 = NameAndType        #10:#34        // "<init>":(Ljava/lang/Object;)V
  #36 = Methodref          #33.#35        // scala/MatchError."<init>":(Ljava/lang/Object;)V
  #37 = Utf8               this
  #38 = Utf8               e
  #39 = Utf8               Lquestion$Expr;
  #40 = Utf8               x
  #41 = Utf8               I
  #42 = Utf8               l
  #43 = Utf8               r
  #44 = Utf8               question$Expr
  #45 = Class              #44            // question$Expr
  #46 = Methodref          #4.#11         // java/lang/Object."<init>":()V
  #47 = NameAndType        #6:#7          // MODULE$:Lquestion$;
  #48 = Fieldref           #2.#47         // question$.MODULE$:Lquestion$;
  #49 = Utf8               question
  #50 = Class              #49            // question
  #51 = Utf8               Sum
  #52 = Utf8               Expr
  #53 = Utf8               Number
  #54 = Utf8               Code
  #55 = Utf8               LocalVariableTable
  #56 = Utf8               LineNumberTable
  #57 = Utf8               StackMapTable
  #58 = Utf8               SourceFile
  #59 = Utf8               InnerClasses
  #60 = Utf8               ScalaInlineInfo
  #61 = Utf8               Scala
{
  public static final question$ MODULE$;
    descriptor: Lquestion$;
    flags: (0x0019) ACC_PUBLIC, ACC_STATIC, ACC_FINAL

  public static {};
    descriptor: ()V
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: new           #2                  // class question$
         3: invokespecial #12                 // Method "<init>":()V
         6: return

  public int eval(question$Expr);
    descriptor: (Lquestion$Expr;)I
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=3, locals=9, args_size=2
         0: aload_1
         1: astore_2
         2: aload_2
         3: instanceof    #16                 // class question$Number
         6: ifeq          27
         9: aload_2
        10: checkcast     #16                 // class question$Number
        13: astore_3
        14: aload_3
        15: invokevirtual #20                 // Method question$Number.n:()I
        18: istore        4
        20: iload         4
        22: istore        5
        24: goto          69
        27: aload_2
        28: instanceof    #22                 // class question$Sum
        31: ifeq          72
        34: aload_2
        35: checkcast     #22                 // class question$Sum
        38: astore        6
        40: aload         6
        42: invokevirtual #26                 // Method question$Sum.e1:()Lquestion$Expr;
        45: astore        7
        47: aload         6
        49: invokevirtual #29                 // Method question$Sum.e2:()Lquestion$Expr;
        52: astore        8
        54: aload_0
        55: aload         7
        57: invokevirtual #31                 // Method eval:(Lquestion$Expr;)I
        60: aload_0
        61: aload         8
        63: invokevirtual #31                 // Method eval:(Lquestion$Expr;)I
        66: iadd
        67: istore        5
        69: iload         5
        71: ireturn
        72: new           #33                 // class scala/MatchError
        75: dup
        76: aload_2
        77: invokespecial #36                 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
        80: athrow
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      81     0  this   Lquestion$;
            0      81     1     e   Lquestion$Expr;
           20      61     4     x   I
           47      34     7     l   Lquestion$Expr;
           54      27     8     r   Lquestion$Expr;
      LineNumberTable:
        line 6: 0
        line 7: 2
        line 8: 27
        line 6: 69
      StackMapTable: number_of_entries = 3
        frame_type = 252 /* append */
          offset_delta = 27
          locals = [ class question$Expr ]
        frame_type = 254 /* append */
          offset_delta = 41
          locals = [ top, top, int ]
        frame_type = 248 /* chop */
          offset_delta = 2
}
SourceFile: "question.scala"
InnerClasses:
  public static #51= #22 of #50;          // Sum=class question$Sum of class question
  public static abstract #52= #45 of #50; // Expr=class question$Expr of class question
  public static #53= #16 of #50;          // Number=class question$Number of class question
  ScalaInlineInfo: length = 0xE (unknown attribute)
   01 01 00 02 00 0A 00 09 01 00 0D 00 0E 01
  Scala: length = 0x0 (unknown attribute)
