Scala - Efficient element-wise sum of two arrays

I have two arrays which I would like to reduce to one array in which each index holds the sum of the two elements at that index in the original arrays. For example:

val arr1: Array[Int] = Array(1, 1, 3, 3, 5)
val arr2: Array[Int] = Array(2, 1, 2, 2, 1)
val arr3: Array[Int] = sum(arr1, arr2)

// This should result in:
// arr3 = Array(3, 2, 5, 5, 6)

I've seen this post: Element-wise sum of arrays in Scala, and I currently use this approach (zip/map). However, since I'm using this in a big data application, I am concerned about its performance. With this approach one has to traverse the array(s) at least twice. Is there a more efficient approach?
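For reference, the zip/map approach mentioned above looks roughly like this. It is a minimal sketch and not necessarily the exact code from the linked post; the name `zipped` is made up for illustration.

```scala
val arr1: Array[Int] = Array(1, 1, 3, 3, 5)
val arr2: Array[Int] = Array(2, 1, 2, 2, 1)

// zip allocates an intermediate Array[(Int, Int)] in one pass;
// map then walks that intermediate array in a second pass.
val zipped: Array[Int] = arr1.zip(arr2).map { case (a, b) => a + b }
// zipped = Array(3, 2, 5, 5, 6)
```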

The most efficient way might well be to do it lazily.

As with anything collection-oriented, Scala 2.12 and 2.13 differ. This code is for Scala 2.13; 2.12 will be similar (it might need to extend IndexedSeqLike, but I don't know for sure).

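import scala.collection.IndexedSeq
import scala.math.Numeric

// A lazy view: element i is the sum of the i-th elements of seq1 and seq2.
case class SumIndexedSeq[+T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
  // Truncate to the shorter of the two inputs.
  override val length: Int = seq1.length.min(seq2.length)
  // Computed on every access; nothing is stored.
  override def apply(i: Int): T =
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    else implicitly[Numeric[T]].plus(seq1(i), seq2(i))
}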

Arrays are implicitly convertible to a subtype of collection.IndexedSeq. This computes the sum of the corresponding elements on every access, which may be generally desirable: the underlying IndexedSeq can be mutable, and each read then reflects its current contents.

If you need an Array, you can get one with only a single traversal via

val arr3: Array[Int] = SumIndexedSeq(arr1, arr2).toArray

but SumIndexedSeq can be used, without any traversal, anywhere a Seq can be used (strictly, a scala.collection.Seq; in 2.13 the default Seq alias is the immutable one).

As a further optimization, especially if you're sure that the underlying collections/arrays won't mutate, you can add a cache so you don't add the same elements together twice. It can also be generalized, if you care to, to any binary operation on T (in which case the Numeric constraint can be removed).
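The generalization to an arbitrary binary operation could be sketched as follows. The class name `ZipWithIndexedSeq` and its parameter names are hypothetical, chosen only for illustration; this assumes Scala 2.13, as above.

```scala
import scala.collection.IndexedSeq

// Hypothetical name: a lazy element-wise combination of two indexed sequences.
// No Numeric constraint is needed because the caller supplies the operation.
final class ZipWithIndexedSeq[A, B, C](left: IndexedSeq[A], right: IndexedSeq[B])(op: (A, B) => C)
    extends IndexedSeq[C] {
  // Truncate to the shorter of the two inputs.
  override val length: Int = math.min(left.length, right.length)

  // Applies op on every access; nothing is stored.
  override def apply(i: Int): C = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    op(left(i), right(i))
  }
}

val xs: Array[Int] = Array(1, 1, 3, 3, 5)
val ys: Array[Int] = Array(2, 1, 2, 2, 1)

// Explicit type arguments keep the implicit Array-to-IndexedSeq conversion unambiguous.
val sums = new ZipWithIndexedSeq[Int, Int, Int](xs, ys)(_ + _)
val products = new ZipWithIndexedSeq[Int, Int, Int](xs, ys)(_ * _)
// sums.toList     == List(3, 2, 5, 5, 6)
// products.toList == List(2, 1, 6, 6, 5)
```

Because the inputs are wrapped rather than copied, a later write to `xs` or `ys` is visible through `sums` and `products`.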

As Luis noted, for a performance question: experiment and benchmark. It's worth keeping in mind that a cache implementation may well entail boxing every element it stores, so you might need to access the same elements many times before the cache is a win (and a sufficiently large cache may have implications for the stability of a distributed system).
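One way a cache might look for the Int case is sketched below. The class name `CachedIntSum` is hypothetical; it keeps cached values in primitive arrays so the stored values themselves are not boxed, it is not thread-safe, and it assumes the inputs are not mutated after construction. For something as cheap as integer addition the cache is unlikely to pay off, so treat this purely as an illustration of the idea.

```scala
// Hypothetical name: lazily computes each sum once and remembers it.
final class CachedIntSum(left: Array[Int], right: Array[Int])
    extends scala.collection.IndexedSeq[Int] {
  override val length: Int = math.min(left.length, right.length)

  // Primitive arrays: one slot per index for the value, one flag for "already computed".
  private val cache = new Array[Int](length)
  private val computed = new Array[Boolean](length)

  override def apply(i: Int): Int = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    if (!computed(i)) {
      cache(i) = left(i) + right(i)
      computed(i) = true
    }
    cache(i)
  }
}

val a: Array[Int] = Array(1, 1, 3, 3, 5)
val b: Array[Int] = Array(2, 1, 2, 2, 1)
val cached = new CachedIntSum(a, b)

val first = cached(0) // computes 1 + 2 and stores it
a(0) = 100            // mutating the input afterwards...
val second = cached(0) // ...is not reflected: the cached 3 is returned
```

The last three lines show the trade-off: once an index has been read, later changes to the inputs are ignored for that index.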

Well, first of all, as with all things related to performance, the only answer is to benchmark.

Second, are you sure you need plain mutable, invariant, weird Arrays? Can't you use something like Vector or ArraySeq?

Third, you can do something like this, or use a while loop, which would be equivalent.

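import scala.collection.immutable.ArraySeq

// Fills the result in a single pass, with no intermediate tuples.
val result = ArraySeq.tabulate(math.min(arr1.length, arr2.length)) { i =>
  arr1(i) + arr2(i)
}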
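The while-loop variant mentioned above could be sketched like this, for the case where a plain Array result is wanted. The function name `sumArrays` is made up for illustration.

```scala
// Hypothetical helper: element-wise sum in a single pass over both inputs.
def sumArrays(a: Array[Int], b: Array[Int]): Array[Int] = {
  // Truncate to the shorter input, matching the tabulate version.
  val n = math.min(a.length, b.length)
  val out = new Array[Int](n)
  var i = 0
  while (i < n) {
    out(i) = a(i) + b(i)
    i += 1
  }
  out
}

val summed = sumArrays(Array(1, 1, 3, 3, 5), Array(2, 1, 2, 2, 1))
// summed = Array(3, 2, 5, 5, 6)
```

The loop works on primitive Int arrays throughout, so no tuples or boxed values are allocated; whether it is measurably faster than tabulate is something to confirm by benchmarking.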
