
value reduceByKey is not a member of org.apache.spark.rdd.RDD

It's very sad. My Spark version is 2.1.1 and my Scala version is 2.11.

import org.apache.spark.SparkContext._
import com.mufu.wcsa.component.dimension.{DimensionKey, KeyTrait}
import com.mufu.wcsa.log.LogRecord
import org.apache.spark.rdd.RDD

object PV {

//
  def stat[C <: LogRecord,K <:DimensionKey](statTrait: KeyTrait[C ,K],logRecords: RDD[C]): RDD[(K,Int)] = {
    val t = logRecords.map(record =>(statTrait.getKey(record),1)).reduceByKey((x,y) => x + y)

I got this error:

at 1502387780429
[ERROR] /Users/lemanli/work/project/newcma/wcsa/wcsa_my/wcsavistor/src/main/scala/com/mufu/wcsa/component/stat/PV.scala:25: error: value reduceByKey is not a member of org.apache.spark.rdd.RDD[(K, Int)]
[ERROR]     val t = logRecords.map(record =>(statTrait.getKey(record),1)).reduceByKey((x,y) => x + y)

The trait is defined as follows:

trait KeyTrait[C <: LogRecord,K <: DimensionKey]{
  def getKey(c:C):K
}

It compiles now, thanks. The working signature:

 def stat[C <: LogRecord,K <:DimensionKey : ClassTag : Ordering](statTrait: KeyTrait[C ,K],logRecords: RDD[C]): RDD[(K,Int)] = {
    val t = logRecords.map(record =>(statTrait.getKey(record),1)).reduceByKey((x,y) => x + y)

The key type needs to provide an implicit Ordering[T], for example:

  object ClientStat extends KeyTrait[DetailLogRecord, ClientStat] {
    implicit val clientStatSorting = new Ordering[ClientStat] {
      override def compare(x: ClientStat, y: ClientStat): Int = x.key.compare(y.key)
    }

    def getKey(detailLogRecord: DetailLogRecord): ClientStat = new ClientStat(detailLogRecord)
  }
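How a context bound such as `K : Ordering` picks up an instance defined in a companion object can be shown without Spark. This is only a minimal sketch: `DemoKey`, `demoKeyOrdering`, `OrderingDemo`, and `smallest` are hypothetical names made up for illustration, and the `key` field merely mirrors the `x.key` access in the snippet above.

```scala
// Hypothetical key type standing in for ClientStat (not the original class).
final case class DemoKey(key: String)

object DemoKey {
  // Defined in the companion object, so it is part of the implicit scope
  // whenever an Ordering[DemoKey] is requested; no import is needed.
  implicit val demoKeyOrdering: Ordering[DemoKey] =
    Ordering.by[DemoKey, String](_.key)
}

object OrderingDemo {
  // "K: Ordering" is shorthand for an extra parameter list
  // (implicit ord: Ordering[K]); `min` consumes that instance.
  def smallest[K: Ordering](keys: Seq[K]): K = keys.min
}
```

Calling `OrderingDemo.smallest(Seq(DemoKey("b"), DemoKey("a")))` compiles only because the companion supplies the `Ordering`; removing `demoKeyOrdering` should turn the call into a missing-implicit compile error.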

This comes from using a pair RDD function generically. The reduceByKey method is actually a method of the PairRDDFunctions class, to which an RDD[(K, V)] is implicitly converted:

implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null): PairRDDFunctions[K, V]

So the conversion requires several implicit type class instances. Normally, when working with simple concrete types, those are already in scope. Inside a generic method they are not, because nothing is known about K, but you should be able to amend your method to also require those same implicits:

def stat[C <: LogRecord,K <:DimensionKey](statTrait: KeyTrait[C ,K],logRecords: RDD[C])(implicit kt: ClassTag[K], ord: Ordering[K])

Or, using the equivalent context bound syntax:

def stat[C <: LogRecord,K <:DimensionKey : ClassTag : Ordering](statTrait: KeyTrait[C ,K],logRecords: RDD[C])
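The mechanism can be reproduced without Spark. The following is a minimal sketch, assuming Scala 2: `MiniRDD`, `MiniPairFunctions`, `toPairFunctions`, and `PairDemo` are hypothetical stand-ins invented for illustration, not Spark classes. Only the shape of the implicit conversion is copied from `rddToPairRDDFunctions` as quoted above.

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag

// Hypothetical stand-in for RDD: a thin wrapper around a Seq.
final case class MiniRDD[T](items: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = MiniRDD(items.map(f))
}

// Hypothetical stand-in for PairRDDFunctions.
final class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def reduceByKey(f: (V, V) => V): MiniRDD[(K, V)] = {
    // Group the pairs by key, then fold each group's values with f.
    val reduced = self.items.groupBy(_._1).map {
      case (k, pairs) => (k, pairs.map(_._2).reduce(f))
    }
    MiniRDD(reduced.toSeq)
  }
}

object MiniRDD {
  // Same shape as rddToPairRDDFunctions: the conversion applies only
  // when a ClassTag for K and for V can be found implicitly.
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)])(
      implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null
  ): MiniPairFunctions[K, V] = new MiniPairFunctions(rdd)
}

object PairDemo {
  // Dropping ": ClassTag" here should reproduce the original error
  // ("value reduceByKey is not a member of MiniRDD[(K, Int)]"),
  // because the conversion can no longer find a ClassTag[K].
  def countByKey[C, K: ClassTag](records: MiniRDD[C])(key: C => K): MiniRDD[(K, Int)] =
    records.map(r => (key(r), 1)).reduceByKey(_ + _)
}
```

In this sketch `ord` defaults to `null`, as in the quoted Spark signature, so the `ClassTag` bound alone is enough for the conversion to apply. The order of the resulting pairs is not specified, so compare results as a `Map` rather than as a `Seq`.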

reduceByKey is a method that is only defined on RDDs of tuples, i.e. RDD[(K, V)] (K, V is just a convention meaning that the first element is the key and the second is the value).

I'm not sure from the example what you are trying to achieve, but you definitely need to convert the values inside the RDD to tuples of two values.
