
Spark - Prediction.io - scala.MatchError: null

I'm working on a template for prediction.io and I'm running into trouble with Spark.

I keep getting a scala.MatchError. Here is the gist of it:

scala.MatchError: null
at org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:831)
at org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict(MatrixFactorizationModel.scala:66)
at org.template.prediction.ALSAlgorithm$$anonfun$predict$1$$anonfun$apply$1.apply(ALSAlgorithm.scala:86)
at org.template.prediction.ALSAlgorithm$$anonfun$predict$1$$anonfun$apply$1.apply(ALSAlgorithm.scala:79)
at scala.Option.map(Option.scala:145)
at org.template.prediction.ALSAlgorithm$$anonfun$predict$1.apply(ALSAlgorithm.scala:79)
at org.template.prediction.ALSAlgorithm$$anonfun$predict$1.apply(ALSAlgorithm.scala:78)

The source code is on GitHub here:

val usersWithCounts =
  ratingsRDD
    .map(r => (r.user, (1, Seq[Rating](Rating(r.user, r.item, r.rating)))))
    .reduceByKey((v1, v2) => (v1._1 + v2._1, v1._2.union(v2._2)))
    .filter(_._2._1 >= evalK)

// create evalK folds of ratings
(0 until evalK).map { idx =>
  // start by getting this fold's ratings for each user
  val fold = usersWithCounts
    .map { userKV =>
      val userRatings = userKV._2._2.zipWithIndex
      val trainingRatings = userRatings.filter(_._2 % evalK != idx).map(_._1)
      val testingRatings = userRatings.filter(_._2 % evalK == idx).map(_._1)
      (trainingRatings, testingRatings) // split the user's ratings into a training set and a testing set
    }
    .reduce((l, r) => (l._1.union(r._1), l._2.union(r._2))) // merge all the testing and training sets into a single testing and training set

  val testingSet = fold._2.map {
    r => (new Query(r.user, r.item), new ActualResult(r.rating))
  }

  (
    new TrainingData(sc.parallelize(fold._1)),
    new EmptyEvaluationInfo(),
    sc.parallelize(testingSet)
  )

}

For evaluation, I need to split the ratings into a training group and a testing group. To make sure every user ends up in the training set, I group all of each user's ratings together, split each user's ratings, and then union the splits back together.

Maybe there's a better way to do this?
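One possible simplification (a sketch only, not tested against this template): instead of reducing every user's ratings into in-memory Seqs and then re-parallelizing them, you could tag each rating with its index within the user's group and do the modulo split entirely at the RDD level. This assumes the same `ratingsRDD` and `evalK` as above:

```scala
// Sketch: per-user modulo split kept as RDD operations throughout.
// Assumes ratingsRDD: RDD[Rating] and evalK: Int as defined above.
val indexedByUser = ratingsRDD
  .groupBy(_.user)                             // group each user's ratings together
  .filter(_._2.size >= evalK)                  // drop users with too few ratings to fold
  .flatMap { case (_, rs) => rs.zipWithIndex } // index ratings within each user's group

(0 until evalK).map { idx =>
  val trainingRatings = indexedByUser.filter(_._2 % evalK != idx).map(_._1)
  val testingRatings  = indexedByUser.filter(_._2 % evalK == idx).map(_._1)
  (trainingRatings, testingRatings) // both are already RDD[Rating], no sc.parallelize needed
}
```

Since the training and testing sets stay as RDDs, the single large `reduce` into driver memory is avoided; each user still contributes ratings to every fold's training set.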

The error means that the MLlib MatrixFactorizationModel's userFeatures doesn't contain the user id (e.g., if the user wasn't in the training data). MLlib doesn't check for this after the lookup (it calls .head): https://github.com/apache/spark/blob/v1.2.0/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L66
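For reference, the relevant part of `MatrixFactorizationModel.predict` (paraphrased from the linked Spark 1.2 source) takes `.head` of the lookup result, which fails whenever the lookup comes back empty:

```scala
// Paraphrased from MatrixFactorizationModel.scala in Spark 1.2:
// .head fails if lookup(user) returns an empty Seq, i.e. the id
// never appeared in the training data.
def predict(user: Int, product: Int): Double = {
  val userVector = new DoubleMatrix(userFeatures.lookup(user).head)
  val productVector = new DoubleMatrix(productFeatures.lookup(product).head)
  userVector.dot(productVector)
}
```

This is why the safe version below switches to `.headOption` and supplies a default score instead.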

To debug this, you can implement a modified version of model.predict() that checks whether the userId/itemId exists in the model rather than relying on the default behavior. Instead of this call (https://github.com/nickpoorman/template-scala-parallel-prediction/blob/master/src/main/scala/ALSAlgorithm.scala#L80):

val itemScore = model.predict(userInt, itemInt)

change it to use .headOption:

val itemScore = model.userFeatures.lookup(userInt).headOption.map { userFeature =>
  model.productFeatures.lookup(itemInt).headOption.map { productFeature =>
    val userVector = new DoubleMatrix(userFeature)
    val productVector = new DoubleMatrix(productFeature)
    userVector.dot(productVector)
  }.getOrElse {
    logger.info(s"No itemFeature for item ${query.item}.")
    0.0 // return default score
  }
}.getOrElse {
  logger.info(s"No userFeature for user ${query.user}.")
  0.0 // return default score
}
