scala implicit performance

This comes up regularly. Functions written with generics are significantly slower in Scala. See the example below: the type-specific version runs about one third faster than the generic version. This is doubly surprising given that the generic component is outside of the expensive loop. Is there a known explanation for this?

  def xxxx_flttn[T](v: Array[Array[T]])(implicit m: Manifest[T]): Array[T] = {
    val I = v.length
    if (I <= 0) Array.ofDim[T](0)
    else {
      val J = v(0).length
      for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symmetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
      val flt = Array.ofDim[T](I * J)
      for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
      flt
    }
  }
  def flttn(v: Array[Array[Double]]): Array[Double] = {
    val I = v.length
    if (I <= 0) Array.ofDim[Double](0)
    else {
      val J = v(0).length
      for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symmetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
      val flt = Array.ofDim[Double](I * J)
      for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
      flt
    }
  }

This is due to boxing, which happens when you apply the generic to a primitive type and use arrays of that type (or when the type parameter appears plain in method signatures or as a member).

Example

In the following trait, after compilation, the process method will take an erased Array[Any].

trait Foo[A]{
  def process(as: Array[A]): Int
}

If you choose A to be a value/primitive type, like Double, it has to be boxed. When the trait is written in a non-generic way (e.g. with A=Double), process is compiled to take an Array[Double], which is a distinct array type on the JVM. This is more efficient: to store a Double inside the Array[Any], the Double has to be wrapped (boxed) into an object, and a reference to that object is what gets stored in the array. The special Array[Double] can store the Double directly in memory as a 64-bit value.
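The difference between the two array types can be observed at runtime. The following is a minimal sketch, not part of the original answer; the object name BoxingDemo is made up for illustration.

```scala
// Illustrative only: BoxingDemo is a hypothetical name, not from the answer.
object BoxingDemo {
  // Backed by a JVM double[]: elements are stored inline as 64-bit values.
  val primitive: Array[Double] = Array(1.0, 2.0)

  // Backed by a JVM Object[]: each element is a reference to a boxed java.lang.Double.
  val boxed: Array[Any] = Array[Any](1.0, 2.0)

  def main(args: Array[String]): Unit = {
    println(primitive.getClass.getSimpleName) // double[]
    println(boxed.getClass.getSimpleName)     // Object[]
    println(boxed(0).getClass.getName)        // java.lang.Double
  }
}
```

Reading boxed(0) as a Double requires unwrapping the object first, which is the extra work the primitive array avoids.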

The @specialized annotation

If you feel adventurous, you can try the @specialized annotation (it is pretty buggy and often crashes the compiler). It makes scalac compile special versions of a class for all or selected primitive types. This only makes sense if the type parameter appears plain in type signatures (get(a: A), but not get(as: Seq[A])) or as a type parameter to Array. I think you'll receive a warning if specialization is pointless.
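As a rough sketch of what this could look like for the Foo trait above: the restriction to Double and Int, and the PositiveCount implementation, are assumptions chosen for illustration and are not part of the original answer.

```scala
// Hypothetical variant of Foo, specialized for Double and Int only.
// A appears as the type parameter of Array, which is one of the cases
// where specialization can pay off.
trait Foo[@specialized(Double, Int) A] {
  def process(as: Array[A]): Int
}

// Illustrative implementation: counts the strictly positive entries.
object PositiveCount extends Foo[Double] {
  def process(as: Array[Double]): Int = {
    var n = 0
    var i = 0
    while (i < as.length) {
      if (as(i) > 0.0) n += 1
      i += 1
    }
    n
  }
}
```

Listing only the primitive types you actually need keeps the number of generated variants small.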

You can't really tell what you're measuring here (not very well, anyway), because the for loop isn't as fast as a pure while loop and the inner operation is quite inexpensive. If we rewrite the code with while loops, the key double iteration being

  var i = 0
  while (i<I) {
    var j = 0
    while (j<J) {
      flt(i * J + j) = v(i)(j)
      j += 1
    }
    i += 1
  }
  flt

then we see that the bytecode for the generic case is actually dramatically different. Non-generic:

133:    checkcast   #174; //class "[D"
136:    astore  6
138:    iconst_0
139:    istore  5
141:    iload   5
143:    iload_2
144:    if_icmpge   191
147:    iconst_0
148:    istore  4
150:    iload   4
152:    iload_3
153:    if_icmpge   182
// The stuff above implements the loop; now we do the real work
156:    aload   6
158:    iload   5
160:    iload_3
161:    imul
162:    iload   4
164:    iadd
165:    aload_1
166:    iload   5
168:    aaload             // v(i)
169:    iload   4
171:    daload             // v(i)(j)
172:    dastore            // flt(.) = _
173:    iload   4
175:    iconst_1
176:    iadd
177:    istore  4
// Okay, done with the inner work, time to jump around
179:    goto    150
182:    iload   5
184:    iconst_1
185:    iadd
186:    istore  5
188:    goto    141

It's just a bunch of jumps and low-level operations (daload and dastore being the key ones, which load and store a double from an array). If we look at the key inner part of the generic bytecode, it instead looks like

160:    getstatic   #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
163:    aload   7
165:    iload   6
167:    iload   4
169:    imul
170:    iload   5
172:    iadd
173:    getstatic   #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
176:    aload_1
177:    iload   6
179:    aaload
180:    iload   5
182:    invokevirtual   #107; //Method scala/runtime/ScalaRunTime$.array_apply:(Ljava/lang/Object;I)Ljava/lang/Object;
185:    invokevirtual   #111; //Method scala/runtime/ScalaRunTime$.array_update:(Ljava/lang/Object;ILjava/lang/Object;)V
188:    iload   5
190:    iconst_1
191:    iadd
192:    istore  5

which, as you can see, has to call methods to do the array apply and update. The bytecode for those methods is a huge mess of stuff like

2:   aload_3 
3:   instanceof      #98; //class "[Ljava/lang/Object;"
6:   ifeq    18
9:   aload_3   
10:  checkcast       #98; //class "[Ljava/lang/Object;"
13:  iload_2
14:  aaload 
15:  goto    183
18:  aload_3
19:  instanceof      #100; //class "[I"
22:  ifeq    37
25:  aload_3   
26:  checkcast       #100; //class "[I"
29:  iload_2
30:  iaload 
31:  invokestatic    #106; //Method scala/runtime/BoxesRunTime.boxToInteger:
34:  goto    183
37:  aload_3
38:  instanceof      #108; //class "[D"
41:  ifeq    56
44:  aload_3   
45:  checkcast       #108; //class "[D"
48:  iload_2
49:  daload 
50:  invokestatic    #112; //Method scala/runtime/BoxesRunTime.boxToDouble:(
53:  goto    183

which basically has to test each type of array, and box the element if it's the type you're looking for. Double is pretty near the front (3rd of 10), but it's still a pretty major overhead, even if the JVM can recognize that the code ends up being box/unbox and therefore doesn't actually need to allocate memory. (I'm not sure it can do that, but even if it could, it wouldn't solve the problem.)
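In source form, that bytecode corresponds roughly to a chain of type tests on the array. The sketch below is a simplified stand-in written for illustration (ArrayDispatch and arrayApply are hypothetical names, and only a few of the array types are shown); it is not the actual library source.

```scala
// Simplified, hypothetical stand-in for what ScalaRunTime.array_apply does.
object ArrayDispatch {
  def arrayApply(xs: AnyRef, idx: Int): Any = xs match {
    case x: Array[AnyRef] => x(idx) // reference array: no boxing needed
    case x: Array[Int]    => x(idx) // boxed to java.lang.Integer on return
    case x: Array[Double] => x(idx) // boxed to java.lang.Double on return
    case x: Array[Long]   => x(idx) // boxed to java.lang.Long on return
    // The remaining primitive array types are omitted in this sketch.
    case _ => throw new IllegalArgumentException("array type not covered by this sketch")
  }
}
```

Every element access in the generic loop pays for these tests plus a box on read and an unbox on write, whereas the non-generic loop is a single daload and dastore.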

So, what to do? You can try [@specialized T], which will expand your code tenfold for you, as if you had written each primitive array operation yourself. Specialization is buggy in 2.9 (it should be less so in 2.10), though, so it may not work the way you hope. If speed is of the essence, first write while loops instead of for loops (or at least compile with -optimise, which speeds for loops up by a factor of two or so), and then consider either specialization or writing the code by hand for the types you require.
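Putting both suggestions together, a specialized flatten with while loops might look something like the sketch below. It makes several assumptions: Scala 2.10 or later (it uses ClassTag in place of the Manifest from the question), specialization for Double and Int only, and require in place of the question's own utl_err exception. The names Flatten and flatten are made up here, and whether it actually closes the performance gap would have to be measured.

```scala
import scala.reflect.ClassTag

// Hypothetical sketch: names and the choice of specialized types are assumptions.
object Flatten {
  def flatten[@specialized(Double, Int) T: ClassTag](v: Array[Array[T]]): Array[T] = {
    val rows = v.length
    if (rows == 0) new Array[T](0)
    else {
      val cols = v(0).length
      val flt = new Array[T](rows * cols)
      var i = 0
      while (i < rows) {
        val row = v(i)
        // Stand-in for the utl_err check in the question.
        require(row.length == cols,
          "first row has " + cols + " elements, row " + i + " has " + row.length)
        var j = 0
        while (j < cols) {
          flt(i * cols + j) = row(j)
          j += 1
        }
        i += 1
      }
      flt
    }
  }
}
```

A call such as Flatten.flatten(Array(Array(1.0, 2.0), Array(3.0, 4.0))) would then be routed to the Double variant, provided the element type is statically known at the call site.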
