How to write a Scala function that receives a map function to a generic type

Using Spark 1.3.0 with Scala, I have two functions that do essentially the same thing to a given RDD[(Long, String, Boolean, String)]. They differ only in the specific map function that turns each (Long, String, Boolean, String) into a tuple of 2 elements:

def rddToMap1(rdd: RDD[(Long, String, Boolean, String)]): Map[Long, Set[(String, Boolean)]] = {
rdd
  .map(t => (t._1, (t._2, t._3))) //mapping function 1
  .groupBy(_._1)
  .mapValues(_.toSet)
  .collect
  .toMap
  .mapValues(_.map(_._2))
  .map(identity)
}


def rddToMap2(rdd: RDD[(Long, String, Boolean, String)]): Map[(Long, String), Set[String]] = {
rdd
  .map(t => ((t._1, t._2), t._4)) //mapping function 2
  .groupBy(_._1)
  .mapValues(_.toSet)
  .collect
  .toMap
  .mapValues(_.map(_._2))
  .map(identity)
}

I want to write a generic function, genericRDDToMap, which I would later use to implement rddToMap1 and rddToMap2.

This doesn't work:

def genericRDDToMap[A](rdd: RDD[(Long, String, Boolean, String)], mapFn: (Long, String, Boolean, String) => A) = {
rdd
  .map(mapFn) //ERROR
  .groupBy(_._1)
  .mapValues(_.toSet)
  .collect
  .toMap
  .mapValues(_.map(_._2))
  .map(identity)
}

The (Eclipse) compiler doesn't accept mapFn as a valid mapping function. It says:

type mismatch; found : (Long, String, Boolean, String) => A required: ((Long, String, Boolean, String)) => ?

And even if I got past this, how would the compiler know that my generic type A has a member _1 for the groupBy that follows?

To summarize: how do I do this correctly?

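You missed a pair of parentheses around (Long, String, Boolean, String). Without them, (Long, String, Boolean, String) => A is the type of a function taking four separate arguments, whereas RDD.map expects a function of a single argument, which here is the 4-tuple: ((Long, String, Boolean, String)) => A. And if A is of type TupleX, you can use an upper bound to specify it (here I used Tuple2):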

  def genericRDDToMap[X, Y, A <: Tuple2[X,Y]](rdd: RDD[(Long, String, Boolean, String)], 
                         mapFn: ((Long, String, Boolean, String)) => A) (implicit ev: ClassTag[A])= {     
      ... 
  }
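
For reference, here is one way the elided body could be filled in. It is only a sketch under some assumptions, not necessarily what the signature above was meant to lead to: instead of bounding A by Tuple2[X, Y], it takes the key and value types K and V as type parameters directly, so that they can be inferred from mapFn at the call site, and it replaces the groupBy(_._1) ... mapValues(_.map(_._2)) steps from the question with groupByKey, which should give the same grouping. The names RddToMapSketch and Row are made up for illustration.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object and type alias, introduced only for this sketch.
object RddToMapSketch {
  type Row = (Long, String, Boolean, String)

  // mapFn takes ONE argument (the whole 4-tuple) and returns a (key, value) pair.
  // The ClassTag context bounds are what Spark needs to treat the mapped RDD
  // as a pair RDD (groupByKey, mapValues).
  def genericRDDToMap[K: ClassTag, V: ClassTag](
      rdd: RDD[Row],
      mapFn: Row => (K, V)): Map[K, Set[V]] =
    rdd
      .map(mapFn)          // RDD[(K, V)]
      .groupByKey()        // RDD[(K, Iterable[V])]; swapped in for groupBy(_._1)
      .mapValues(_.toSet)  // RDD[(K, Set[V])]
      .collect()           // brings everything to the driver
      .toMap

  // The two functions from the question, expressed through the generic one.
  def rddToMap1(rdd: RDD[Row]): Map[Long, Set[(String, Boolean)]] =
    genericRDDToMap(rdd, (t: Row) => (t._1, (t._2, t._3)))

  def rddToMap2(rdd: RDD[Row]): Map[(Long, String), Set[String]] =
    genericRDDToMap(rdd, (t: Row) => ((t._1, t._2), t._4))
}
```

If you already have a four-argument function f of type (Long, String, Boolean, String) => (K, V), passing f.tupled should work as well, since Function4.tupled converts it into a function of a single 4-tuple.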
