Calculating the Dot Product of two Sparse Vectors (and generating them) in Scala using the standard library

I am trying to calculate the dot product (scalar product) of two sparse vectors in Scala. The code I have written does everything I want, except that when it multiplies the corresponding elements of the two vectors it does not account for the 0 values.

I expect to get 72 as my answer, since 3 and 18 are the only keys that are non-zero in both vectors, and they evaluate to: (3 -> 21) + (18 -> 51) = 72

I used withDefaultValue(0) hoping it would "fill in" the unmentioned key/value pairs, but I do not think it does, and I believe this is where my problem starts. I think my question could also be "How to generate a Sparse Vector in Scala using the Standard Library".

If I enter the corresponding 0's and the two Maps (vectors) have the same number of key/value pairs, my code works properly.

```
  val Sparse1 = Map(0 -> 4, 3 -> 7, 6 -> 11, 18 -> 17).withDefaultValue(0)
  val Sparse2 = Map(1 -> 3, 3 -> 3, 11 -> 2,18 -> 3, 20 -> 6).withDefaultValue(0)
  //println(Sparse2.toSeq)//to see what it is....0's missing
  val SparseSum = (Sparse1.toSeq ++ Sparse2.toSeq).groupBy(_._1).mapValues(_.map(_._2).sum)
  //println(SparseSum)
  val productOfValues = ((Sparse1.toSeq ++ Sparse2.toSeq).groupBy(_._1).mapValues(_.map(_._2).reduce(_*_)))
  //println(productOfValues)
  var dotProduct = 0
  for ((h,i) <- productOfValues) {
    dotProduct += i
  }
  //println(dotProduct)
  //If I specify some zero values, lets see what happens:
  val Sparse3 = Map(0 -> 4, 1 -> 0, 3 -> 7, 6 -> 11, 11 -> 0, 18 -> 17, 20 -> 0).withDefaultValue(0)
  val Sparse4 = Map(0 -> 0, 1 -> 3, 3 -> 3, 6 -> 0, 11 -> 2,18 -> 3, 20 -> 6).withDefaultValue(0)
  val productOfValues2 = ((Sparse3.toSeq ++ Sparse4.toSeq).groupBy(_._1).mapValues(_.map(_._2).reduce(_*_)))
  var dotProduct2 = 0
  for ((l, m) <- productOfValues2) {
    dotProduct2 += m
  }
  println(productOfValues2)
  println(dotProduct2)//I get 72

```

I can create a Sparse Vector this way, and then update the values:

  import scala.collection.mutable.Map
  val Sparse1 = Map[Int, Int]()
  for (k <- 0 to 20) {
    Sparse1 getOrElseUpdate (k, 0)
  }
  val Sparse2 = Map[Int, Int]()
  for (k <- 0 to 20) {
    Sparse2 getOrElseUpdate (k, 0)
  }

But I'm wondering if there is a "better" way, more along the lines of what I tried and failed to do using "withDefaultValue(0)".

Since you are using sparse vectors, you can ignore all keys that are not present in both vectors.
Thus, I would compute the intersection of the two key sets and then perform a simple map-reduce to compute the dot product.

type SparseVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def sparseDotProduct[T: Numeric](v1: SparseVector[T], v2: SparseVector[T]): T = {
  import Numeric.Implicits._

  val commonIndexes = v1.keySet & v2.keySet

  // Go through an iterator: mapping the Set directly would merge equal products.
  commonIndexes.iterator
    .map(i => v1(i) * v2(i))
    .foldLeft(implicitly[Numeric[T]].zero)(_ + _)
}

Then, you can use it like this:

// withDefaultValue(0) is optional now: only keys present in both maps are looked up.
val sparse1 = Map(0 -> 4, 3 -> 7, 6 -> 11, 18 -> 17).withDefaultValue(0)
val sparse2 = Map(1 -> 3, 3 -> 3, 11 -> 2, 18 -> 3, 20 -> 6).withDefaultValue(0)

sparseDotProduct(sparse1, sparse2)
// res: Int = 72
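
As a side note on why `withDefaultValue(0)` did not help in the question: as far as I know, it only changes what a lookup of a missing key returns. It does not add entries, so `toSeq` and `keySet` still see only the explicitly stored pairs. A minimal sketch (the object and value names are made up for illustration):

```scala
object DefaultValueDemo {
  // withDefaultValue only changes what apply returns for a missing key.
  val v: Map[Int, Int] = Map(0 -> 4, 3 -> 7).withDefaultValue(0)

  val missing: Int = v(5)               // absent key: falls back to the default
  val hasKey: Boolean = v.contains(5)   // the key is still not stored in the map
  val stored: Seq[(Int, Int)] = v.toSeq.sortBy(_._1) // only the explicit pairs
}
```

Here `missing` is 0, while `hasKey` is false and `stored` holds just the two explicit pairs, which is why concatenating the `toSeq` results never sees the implied zeros.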

Edit - the same method, but without context bounds & implicit syntax.

type SparseVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def sparseDotProduct[T](v1: SparseVector[T], v2: SparseVector[T])(implicit N: Numeric[T]): T = {
  val commonIndexes = v1.keySet & v2.keySet

  // Go through an iterator: mapping the Set directly would merge equal products.
  commonIndexes.iterator
    .map(i => N.times(v1(i), v2(i)))
    .foldLeft(N.zero)((acc, element) => N.plus(acc, element))
}
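
One detail to watch when mapping over a key set: `Set.map` produces another `Set`, so equal products would be merged before they are summed. A small sketch with made-up vectors `a` and `b` (chosen so that both common indexes give the same product), comparing a direct `map` on the set with going through `iterator` first:

```scala
object SetMapPitfall {
  // Hypothetical vectors: both common indexes yield the product 6.
  val a: Map[Int, Int] = Map(0 -> 2, 1 -> 3)
  val b: Map[Int, Int] = Map(0 -> 3, 1 -> 2)
  val common: Set[Int] = a.keySet & b.keySet

  // Set.map returns a Set, so the two equal products collapse into one.
  val viaSet: Int = common.map(i => a(i) * b(i)).foldLeft(0)(_ + _)

  // Going through an iterator keeps every product.
  val viaIterator: Int = common.iterator.map(i => a(i) * b(i)).foldLeft(0)(_ + _)
}
```

With these inputs `viaSet` is 6 while `viaIterator` is 12, the actual dot product.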

Bonus - General approach for non-sparse vectors.

One can modify the above method to work for any kind of vector, not just sparse ones. In this case, we would need the union of the keys, and take into account cases where a key present in one vector does not exist in the other.

type MyVector[T] = Map[Int, T]

/** Generic function for any type T that can be multiplied & summed. */
def dotProduct[T: Numeric](v1: MyVector[T], v2: MyVector[T]): T = {
  import Numeric.Implicits._
  val zero = implicitly[Numeric[T]].zero

  val allIndexes = v1.keySet | v2.keySet

  // Go through an iterator: mapping the Set directly would merge equal products.
  allIndexes.iterator
    .map(i => v1.getOrElse(i, zero) * v2.getOrElse(i, zero))
    .foldLeft(zero)(_ + _)
}
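
When missing entries are taken to be zero, the union-based and intersection-based computations should agree, since any key present in only one vector contributes a zero product. A quick sketch on the question's vectors, specialised to `Int` for brevity (the object name `UnionVsIntersection` is made up, and this restates the idea rather than reusing the generic functions above):

```scala
object UnionVsIntersection {
  val sparse1: Map[Int, Int] = Map(0 -> 4, 3 -> 7, 6 -> 11, 18 -> 17)
  val sparse2: Map[Int, Int] = Map(1 -> 3, 3 -> 3, 11 -> 2, 18 -> 3, 20 -> 6)

  // Union of keys, treating a missing key as 0.
  val viaUnion: Int =
    (sparse1.keySet | sparse2.keySet).iterator
      .map(i => sparse1.getOrElse(i, 0) * sparse2.getOrElse(i, 0))
      .sum

  // Intersection of keys; every key outside it would contribute a zero product.
  val viaIntersection: Int =
    (sparse1.keySet & sparse2.keySet).iterator
      .map(i => sparse1(i) * sparse2(i))
      .sum
}
```

Both values come out as 72 here; the union version simply does extra lookups for keys that contribute nothing.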
