This comes up regularly. Functions coded up using generics are signifficnatly slower in scala. See example below. Type specific version performs about a 1/3 faster than the generic version. This is doubly surprising given that the generic component is outside of the expensive loop. Is there a known explanation for this?
def xxxx_flttn[T](v: Array[Array[T]])(implicit m: Manifest[T]): Array[T] = {
val I = v.length
if (I <= 0) Array.ofDim[T](0)
else {
val J = v(0).length
for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
val flt = Array.ofDim[T](I * J)
for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
flt
}
}
def flttn(v: Array[Array[Double]]): Array[Double] = {
val I = v.length
if (I <= 0) Array.ofDim[Double](0)
else {
val J = v(0).length
for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
val flt = Array.ofDim[Double](I * J)
for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
flt
}
}
This is due to boxing, when you apply the generic to a primitive type and use containing arrays (or the type appearing plain in method signatures or as member).
In the following trait, after compilation, the process
method will take an erased Array[Any]
.
trait Foo[A]{
def process(as: Array[A]): Int
}
If you choose A
to be a value/primitive type, like Double
it has to be boxed. When writing the trait in a non-generic way (eg with A=Double
), process
is compiled to take an Array[Double]
, which is a distinct array type on the JVM. This is more efficient, since in order to store a Double
inside the Array[Any]
, the Double
has to be wrapped (boxed) into an object, a reference to which gets stored inside the array. The special Array[Double]
can store the Double
directly in memory as a 64-Bit value.
@specialized
-Annotation If you feel adventerous, you can try the @specialized
keyword (it's pretty buggy and crashes the compiler often). This makes scalac
compile special versions of a class for all or selected primitive types. This only makes sense, if the type parameter appears plain in type signatures ( get(a: A)
, but not get(as: Seq[A])
) or as a type paramter to Array
. I think you'll receive a warning if speicialization is pointless.
You can't really tell what you're measuring here--not very well, anyway--because the for
loop isn't as fast as a pure while
loop, and the inner operation is quite inexpensive. If we rewrite the code with while loops--the key double-iteration being
var i = 0
while (i<I) {
var j = 0
while (j<J) {
flt(i * J + j) = v(i)(j)
j += 1
}
i += 1
}
flt
then we see that the bytecode for the generic case is actually dramatically different. Non-generic:
133: checkcast #174; //class "[D"
136: astore 6
138: iconst_0
139: istore 5
141: iload 5
143: iload_2
144: if_icmpge 191
147: iconst_0
148: istore 4
150: iload 4
152: iload_3
153: if_icmpge 182
// The stuff above implements the loop; now we do the real work
156: aload 6
158: iload 5
160: iload_3
161: imul
162: iload 4
164: iadd
165: aload_1
166: iload 5
168: aaload // v(i)
169: iload 4
171: daload // v(i)(j)
172: dastore // flt(.) = _
173: iload 4
175: iconst_1
176: iadd
177: istore 4
// Okay, done with the inner work, time to jump around
179: goto 150
182: iload 5
184: iconst_1
185: iadd
186: istore 5
188: goto 141
It's just a bunch of jumps and low-level operations (daload and dastore being the key ones that load and store a double from an array). If we look at the key inner part of the generic bytecode, it instead looks like
160: getstatic #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
163: aload 7
165: iload 6
167: iload 4
169: imul
170: iload 5
172: iadd
173: getstatic #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
176: aload_1
177: iload 6
179: aaload
180: iload 5
182: invokevirtual #107; //Method scala/runtime/ScalaRunTime$.array_apply:(Ljava/lang/Object;I)Ljava/lang/Object;
185: invokevirtual #111; //Method scala/runtime/ScalaRunTime$.array_update:(Ljava/lang/Object;ILjava/lang/Object;)V
188: iload 5
190: iconst_1
191: iadd
192: istore 5
which, as you can see, has to call methods to do the array apply and update. The bytecode for that is a huge mess of stuff like
2: aload_3
3: instanceof #98; //class "[Ljava/lang/Object;"
6: ifeq 18
9: aload_3
10: checkcast #98; //class "[Ljava/lang/Object;"
13: iload_2
14: aaload
15: goto 183
18: aload_3
19: instanceof #100; //class "[I"
22: ifeq 37
25: aload_3
26: checkcast #100; //class "[I"
29: iload_2
30: iaload
31: invokestatic #106; //Method scala/runtime/BoxesRunTime.boxToInteger:
34: goto 183
37: aload_3
38: instanceof #108; //class "[D"
41: ifeq 56
44: aload_3
45: checkcast #108; //class "[D"
48: iload_2
49: daload
50: invokestatic #112; //Method scala/runtime/BoxesRunTime.boxToDouble:(
53: goto 183
which basically has to test each type of array and box it if it's the type you're looking for. Double is pretty near the front (3rd of 10), but it's still a pretty major overhead, even if the JVM can recognize that the code ends up being box/unbox and therefore doesn't actually need to allocate memory. (I'm not sure it can do that, but even if it could it wouldn't solve the problem.)
So, what to do? You can try [@specialized T], which will expand your code tenfold for you, as if you wrote each primitive array operation by yourself. Specialization is buggy in 2.9 (should be less so in 2.10), though, so it may not work the way you hope. If speed is of the essence--well, first, write while loops instead of for loops (or at least compile with -optimise which helps for loops out by a factor of two or so!), and then consider either specialization or writing the code by hand for the types you require.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.