
Scala - Efficient element wise sum of two arrays

I have two arrays that I would like to combine into one array whose element at each index is the sum of the corresponding elements of the original arrays. For example:

val arr1: Array[Int] = Array(1, 1, 3, 3, 5)
val arr2: Array[Int] = Array(2, 1, 2, 2, 1)
val arr3: Array[Int] = sum(arr1, arr2) // sum is the function I'm looking for

// This should result in:
// arr3 = Array(3, 2, 5, 5, 6)

I've seen this post: Element-wise sum of arrays in Scala, and I currently use its approach (zip/map). However, this is for a big data application, so I am concerned about performance: with this approach the array(s) have to be traversed at least twice, once by zip and once by map. Is there a more efficient approach?
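For reference, here is a minimal sketch of the zip/map approach (Scala 2.13). The `ZipMapSum` object and its `sum` method are hypothetical names that fill in the `sum` from the example above. `zip` first materializes an intermediate `Array[(Int, Int)]` with one tuple per index, and `map` then makes a second pass over that array. That intermediate array is where the extra traversal and allocation come from.

```scala
object ZipMapSum {
  // zip builds an intermediate Array[(Int, Int)] (first pass, one tuple per element);
  // map then traverses that tuple array to produce the sums (second pass).
  // zip stops at the shorter input, so the result has the shorter length.
  def sum(arr1: Array[Int], arr2: Array[Int]): Array[Int] =
    arr1.zip(arr2).map { case (a, b) => a + b }

  def main(args: Array[String]): Unit = {
    val arr1 = Array(1, 1, 3, 3, 5)
    val arr2 = Array(2, 1, 2, 2, 1)
    println(sum(arr1, arr2).mkString("Array(", ", ", ")")) // prints Array(3, 2, 5, 5, 6)
  }
}
```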

The most efficient way might well be to do it lazily.

As with anything collection-oriented, Scala 2.12 and 2.13 are going to differ. This code is Scala 2.13. A 2.12 version will be similar (it might need to extend IndexedSeqLike, but I don't know for sure).

import scala.collection.IndexedSeq
import scala.math.Numeric.Implicits._ // infix + for any T with a Numeric instance

case class SumIndexedSeq[+T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
  override val length: Int = seq1.length.min(seq2.length)

  // Computed on every access; no sums are precomputed or stored
  override def apply(i: Int): T =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else seq1(i) + seq2(i)
}

Arrays are implicitly convertible to a subtype of collection.IndexedSeq, so they can be passed to SumIndexedSeq directly. The sum of the corresponding elements is computed on every access. This may be generally desirable: because it's possible to use a mutable IndexedSeq, each access reflects the underlying collections' current contents.

If you need an Array, you can get one with only a single traversal via

val arr3: Array[Int] = SumIndexedSeq(arr1, arr2).toArray

but a SumIndexedSeq can be used anywhere a Seq is expected without any up-front traversal.
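A usage sketch (Scala 2.13). It repeats the `SumIndexedSeq` definition so that the snippet compiles on its own, and `SumIndexedSeqDemo` is a hypothetical harness. Creating the sequence only wraps the arrays. Because the implicit Array-to-IndexedSeq conversion wraps rather than copies, a later write to `arr1` is visible on the next access. `toArray` then materializes the sums in one pass.

```scala
import scala.collection.IndexedSeq
import scala.math.Numeric.Implicits._

case class SumIndexedSeq[+T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
  override val length: Int = seq1.length.min(seq2.length)
  override def apply(i: Int): T =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else seq1(i) + seq2(i)
}

object SumIndexedSeqDemo {
  def main(args: Array[String]): Unit = {
    val arr1 = Array(1, 1, 3, 3, 5)
    val arr2 = Array(2, 1, 2, 2, 1)

    // No sums are computed here; the arrays are only wrapped
    val lazySum = SumIndexedSeq(arr1, arr2)
    println(lazySum(0)) // 3

    // The wrapper shares arr1's storage, so the mutation shows up on the next access
    arr1(0) = 10
    println(lazySum(0)) // 12

    // A single traversal materializes the current sums into an Array
    val arr3: Array[Int] = lazySum.toArray
    println(arr3.mkString(", ")) // 12, 2, 5, 5, 6
  }
}
```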

As a further optimization, especially if you're sure that the underlying collections/arrays won't mutate, you can add a cache so you don't add the same pair of elements together more than once. The class can also be generalized, if you so care, to any binary operation on T (in which case the Numeric constraint can be removed).
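A sketch of that generalization, assuming Scala 2.13. `ZipWithIndexedSeq` is a hypothetical name. It goes one step further than "any binary operation on T" and lets the two input element types and the result type differ. The function `f` supplies the operation, so no `Numeric` bound is needed.

```scala
import scala.collection.IndexedSeq

// Hypothetical generalization: combine corresponding elements with any function f.
final case class ZipWithIndexedSeq[A, B, C](seq1: IndexedSeq[A], seq2: IndexedSeq[B])(f: (A, B) => C)
    extends IndexedSeq[C] {
  override val length: Int = seq1.length.min(seq2.length)

  // Like SumIndexedSeq, the result is recomputed on every access
  override def apply(i: Int): C =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else f(seq1(i), seq2(i))
}

object ZipWithDemo {
  def main(args: Array[String]): Unit = {
    val arr1 = Array(1, 1, 3, 3, 5)
    val arr2 = Array(2, 1, 2, 2, 1)
    val sums = ZipWithIndexedSeq(arr1, arr2)(_ + _)
    val products = ZipWithIndexedSeq(arr1, arr2)(_ * _)
    println(sums.mkString(", "))     // 3, 2, 5, 5, 6
    println(products.mkString(", ")) // 2, 1, 6, 6, 5
  }
}
```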

As Luis noted, for a performance question: experiment and benchmark. It's worth keeping in mind that a cache implementation may well entail boxing every element to put in the cache, so you might need to be accessing the same elements many times in order for the cache to be a win (and a sufficiently large cache may have implications for the stability of a distributed system).
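A minimal sketch of the cache idea, assuming Scala 2.13 and assuming the inputs are never mutated after construction. `CachedSumIndexedSeq` is a hypothetical name, and a `mutable.HashMap[Int, T]` is just one possible cache. It boxes every cached value, which is exactly the overhead described above, and it is not thread-safe.

```scala
import scala.collection.IndexedSeq
import scala.collection.mutable
import scala.math.Numeric.Implicits._

// Hypothetical memoizing variant: only correct if seq1 and seq2 never change.
final class CachedSumIndexedSeq[T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
  override val length: Int = seq1.length.min(seq2.length)

  // Each cached sum is stored (boxed) in the map; no synchronization
  private[this] val cache = mutable.HashMap.empty[Int, T]

  override def apply(i: Int): T =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else cache.getOrElseUpdate(i, seq1(i) + seq2(i))
}

object CachedDemo {
  def main(args: Array[String]): Unit = {
    val cached = new CachedSumIndexedSeq(Array(1, 1, 3, 3, 5), Array(2, 1, 2, 2, 1))
    println(cached(4)) // 6 (computed and cached)
    println(cached(4)) // 6 (served from the cache)
  }
}
```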

Well, first of all, as with all things related to performance, the only answer is to benchmark.

Second, are you sure you need plain mutable, invariant, weird Arrays? Can't you use something like Vector or ArraySeq?

Third, you can simply do something like the following, or use a while loop, which would be essentially equivalent:

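import scala.collection.immutable.ArraySeq
// Single pass: element i is computed directly from arr1(i) and arr2(i)
val result = ArraySeq.tabulate(math.min(arr1.length, arr2.length)) { i =>
  arr1(i) + arr2(i)
}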
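For comparison, here is a sketch of the while-loop variant. It writes into a preallocated `Array[Int]` in a single pass, with no intermediate tuples and no generic element storage. `WhileLoopSum` and `sumArrays` are hypothetical names. Whether this is measurably faster than `tabulate` is exactly the kind of question the benchmarking advice above applies to.

```scala
object WhileLoopSum {
  // One pass over the shorter length, writing directly into a primitive Int array
  def sumArrays(arr1: Array[Int], arr2: Array[Int]): Array[Int] = {
    val n = math.min(arr1.length, arr2.length)
    val result = new Array[Int](n)
    var i = 0
    while (i < n) {
      result(i) = arr1(i) + arr2(i)
      i += 1
    }
    result
  }

  def main(args: Array[String]): Unit = {
    val arr3 = sumArrays(Array(1, 1, 3, 3, 5), Array(2, 1, 2, 2, 1))
    println(arr3.mkString("Array(", ", ", ")")) // Array(3, 2, 5, 5, 6)
  }
}
```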

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
