Fast packed arrays of structs in Scala

I'm investigating what it would take to turn an existing mixed Python/C++ numerical codebase into mixed Scala/C++ (ideally mostly Scala in the long run). I expect the biggest issue to be packed arrays of structs. For example, in C++ we have types like

Array<Vector<double,3>>          // analogous to double[][3]
Array<Frame<Vector<double,3>>>   // a bunch of (translation, quaternion) pairs

These can be converted back and forth between Python and C++ without copying thanks to Numpy.

On the JVM, since unboxed arrays can only have a handful of types, the only way I can imagine proceeding is to create (1) a boxed Scala type for each struct, such as Vector<double,3> and (2) a typed thin wrapper around Array[Double] that knows what struct it's supposed to be and creates/consumes boxed singletons as necessary.
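To make the two-part idea concrete, here is a minimal sketch of what such a wrapper might look like. The names Vec3 and Vec3Array are hypothetical and do not come from an existing library; the layout assumes three consecutive doubles per element.

```scala
// Hypothetical illustration: a boxed struct type plus a typed view over packed doubles.
final case class Vec3(x: Double, y: Double, z: Double)

final class Vec3Array(val data: Array[Double]) {
  require(data.length % 3 == 0, "length must be a multiple of 3")

  // Number of packed 3-vectors, not the number of doubles.
  def length: Int = data.length / 3

  // Reading creates a Vec3 box; whether the JIT removes it is not guaranteed.
  def apply(i: Int): Vec3 = {
    val k = 3 * i
    Vec3(data(k), data(k + 1), data(k + 2))
  }

  // Writing consumes a Vec3 box and stores its fields back into the packed array.
  def update(i: Int, v: Vec3): Unit = {
    val k = 3 * i
    data(k) = v.x
    data(k + 1) = v.y
    data(k + 2) = v.z
  }
}
```

With this sketch, vs(1) = Vec3(1, 2, 3) writes into slots 3 to 5 of the underlying array, and vs(1) reads them back as a fresh Vec3.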

Are there any existing libraries that do such a thing, or that implement any alternatives for packed arrays of structs? Does anyone have experience regarding what the performance characteristics would likely be, and whether existing compilers and JVMs would be able to optimize the boxes away in at least the nonpolymorphic, sealed case?

Note that packing and nice typing are not optional: without packing I'll blow memory very quickly, and if all I have is Array[Double], C++'s type system (unfortunately) wins.

The question is really whether there is anything but numbers in there. If it's just a pile of doubles, you can write a wrapper in Scala, but you ought not count on avoiding boxing. Instead, consider writing mutable wrappers:

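// Read-only view of one 3-vector.
trait Vec3 {
  def x: Double
  def y: Double
  def z: Double
}
// Mutable adapter over a packed array laid out as x0, y0, z0, x1, y1, z1, ...
class ArrayedVec3(array: Array[Double]) extends Vec3 {
  private[this] var index = 0   // offset of the current element's x component
  // Repositions the adapter on element i and returns it, so calls can be chained.
  def goto(i: Int): this.type = { index = i * 3; this }
  def x: Double = array(index)
  def y: Double = array(index + 1)
  def z: Double = array(index + 2)
}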

You could make ArrayedVec3 implement Iterator, returning itself as next, or various other things for cases where you want ease of use rather than efficiency.
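As a rough sketch of the Iterator variant, assuming the same packed x, y, z layout: the class name Vec3Cursor is hypothetical, and the cursor starts positioned before the first element.

```scala
// Hypothetical variant of the adapter that iterates over itself.
final class Vec3Cursor(array: Array[Double]) extends Iterator[Vec3Cursor] {
  private[this] var index = -3   // positioned just before element 0

  // True if a complete (x, y, z) triple follows the current position.
  def hasNext: Boolean = index + 6 <= array.length

  // Advances to the next triple and returns this same object, so nothing is allocated.
  def next(): Vec3Cursor = {
    if (!hasNext) throw new NoSuchElementException("end of packed array")
    index += 3
    this
  }

  def x: Double = array(index)
  def y: Double = array(index + 1)
  def z: Double = array(index + 2)
}
```

Because every call to next() returns the same object, each element has to be consumed before advancing. Something like new Vec3Cursor(data).map(_.x).sum works, but collecting the cursor with toList would yield many references to one object that all point at the last element.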

But the point is that if you're willing to manage the creation and movement of these adapters yourself, you don't need to worry about boxing. You only create the "box" once, and then it jumps around to wherever you need it.
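A loop that reuses one adapter might look like the sketch below. The adapter is repeated (without the Vec3 trait) so the snippet compiles on its own; the Norms object and its sumOfNorms helper are hypothetical names added purely for illustration.

```scala
// Adapter from above, repeated so this sketch is self-contained.
class ArrayedVec3(array: Array[Double]) {
  private[this] var index = 0
  def goto(i: Int): this.type = { index = i * 3; this }
  def x: Double = array(index)
  def y: Double = array(index + 1)
  def z: Double = array(index + 2)
}

object Norms {
  // Hypothetical helper: sums the Euclidean norms of all packed 3-vectors.
  def sumOfNorms(data: Array[Double]): Double = {
    val v = new ArrayedVec3(data)   // the only adapter created for the whole loop
    val n = data.length / 3
    var total = 0.0
    var i = 0
    while (i < n) {
      v.goto(i)                     // move the adapter instead of creating a new one
      total += math.sqrt(v.x * v.x + v.y * v.y + v.z * v.z)
      i += 1
    }
    total
  }
}
```

The while loop is deliberate: it avoids creating closures or iterator objects, so the single adapter is the only thing allocated. Sharing one mutable adapter like this is only safe when a single thread uses it.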

If you're content with performance within ~2x of C++, and are aiming for single-threaded use, this ought to do the trick. (It's worked for me in the past.)
