
Compilation issue in spark scala script containing join on RDDs with 2 columns

I am trying to compile the following script using sbt package command.

import org.apache.spark.SparkContext, org.apache.spark.SparkConf, org.apache.spark.rdd.PairRDDFunctions
object CusMaxRevenue {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("CusMaxRevenue")
    val sc = new SparkContext(conf)
    val ordersRDD = sc.textFile("/user/sk/sqoop_import/orders")
    val orderItemsRDD = sc.textFile("/user/sk/sqoop_import/order_items")

    // val ordersParsedRDD = ordersRDD.map( rec => ((rec.split(",")(0).toInt), (rec.split(",")(1),rec.split(",")(2)) ))
    val ordersParsedRDD = ordersRDD.map( rec => ((rec.split(",")(0).toInt), rec.split(",")(1) ))

    val orderItemsParsedRDD = orderItemsRDD.map(rec => ((rec.split(",")(1)).toInt, rec.split(",")(4).toFloat))

    val ordersJoinOrderItems = orderItemsParsedRDD.join(ordersParsedRDD)
  }
}

I get the following error:

[info] Set current project to Customer with Max revenue (in build file:/home/sk/scala/app3/)
[info] Compiling 1 Scala source to /home/sk/scala/app3/target/scala-2.10/classes...
[error] /home/sk/scala/app3/src/main/scala/CusMaxRevenue.scala:14: value join is not a member of org.apache.spark.rdd.RDD[(Int, Float)]
[error]     val ordersJoinOrderItems = orderItemsParsedRDD.join(ordersParsedRDD)
[error]                                                    ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed

Sample Data:

--ordersParsedRDD
(1,2013-07-25 00:00:00.0)
(2,2013-07-25 00:00:00.0)
(3,2013-07-25 00:00:00.0)
(4,2013-07-25 00:00:00.0)
(5,2013-07-25 00:00:00.0)

--orderItemsParsedRDD
(9.98)
(2,199.99)
(2,250.0)
(2,129.99)
(4,49.98)

When I execute the statements individually at the spark-shell prompt, the join seems to work. PS: I originally had a few more columns in the RDDs, but in order to investigate further I kept just these two, and I still get the compilation issue!

Additional Info: Content of my CusMaxRevenue.sbt file

name := "Customer with Max revenue"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1"

You need to add the import:

import org.apache.spark.SparkContext._

which brings the implicit conversions into scope, including the one that wraps an RDD[(K, V)] in PairRDDFunctions, where join is defined. The spark-shell adds this import automatically, which is why the same statements work at the interactive prompt but not when compiled with sbt.
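
For reference, a minimal sketch of the full source with that import in place (same paths and columns as in the question) could look like this:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // implicit conversion to PairRDDFunctions, which defines join

object CusMaxRevenue {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("CusMaxRevenue")
    val sc = new SparkContext(conf)

    val ordersRDD = sc.textFile("/user/sk/sqoop_import/orders")
    val orderItemsRDD = sc.textFile("/user/sk/sqoop_import/order_items")

    // (order_id, order_date)
    val ordersParsedRDD = ordersRDD.map(rec => (rec.split(",")(0).toInt, rec.split(",")(1)))
    // (order_id, order_item_subtotal)
    val orderItemsParsedRDD = orderItemsRDD.map(rec => (rec.split(",")(1).toInt, rec.split(",")(4).toFloat))

    // compiles now: both RDDs are RDD[(Int, _)] and the implicits are in scope
    val ordersJoinOrderItems = orderItemsParsedRDD.join(ordersParsedRDD)
  }
}

(In Spark 1.3+ these implicits live on the RDD companion object, so the extra import is no longer needed; for the 1.2.1 dependency in your sbt file it is.)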

Try this:

val orderItemsParsedRDD = orderItemsRDD.map(rec => (rec.split(",")(1).toInt, rec.split(",")(4).toFloat))
