
Running Spark Application from Eclipse

I am trying to develop a Spark application in Eclipse and then debug it by stepping through it.

I downloaded the Spark source code and added some of the Spark sub-projects (such as spark-core) to Eclipse. Now I am trying to develop a Spark application using Eclipse. I have already installed ScalaIDE in Eclipse. I created a simple application based on the example given on the Spark website.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

To my project, I added the spark-core project as a dependent project (right click -> build path -> add project). Now I am trying to build my application and run it. However, Eclipse shows that my project has errors, but I don't see any errors listed in the Problems view, nor do I see any lines highlighted in red, so I am not sure what the problem is. My assumption is that I need to add external jars to my project, but I am not sure which jars those would be. The error is caused by val conf = new SparkConf().setAppName("Simple Application") and the subsequent lines; when I removed those lines, the error went away. I would appreciate any help and guidance, thanks!

It seems you are not using any package/dependency manager (e.g. sbt or Maven), which would eliminate most versioning issues. It can be challenging to set the correct versions of Java, Scala, Spark, and all their transitive dependencies on your own. I strongly recommend converting your project to Maven: Convert Existing Eclipse Project to Maven Project
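As a sketch of what the Maven route buys you: once the project is a Maven project, a single dependency entry in the pom pulls in spark-core and all of its transitive dependencies. The coordinates below use the Spark 1.3.0 / Scala 2.10 versions mentioned elsewhere on this page; adjust them to match your installation:

```xml
<!-- spark-core compiled for Scala 2.10; the version is an example, not a requirement -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.0</version>
</dependency>
```

Note that the Scala binary version (here `_2.10`) is part of the artifactId and must match the Scala version your project compiles with.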

Personally, I have had very good experience with sbt in IntelliJ IDEA ( https://confluence.jetbrains.com/display/IntelliJIDEA/Getting+Started+with+SBT ), which is easy to set up and maintain.

I just created a Maven archetype for Spark the other day.
It sets up a new Spark 1.3.0 project in Eclipse/IDEA with Scala 2.10.4.

Just follow the instructions here.

You'll just have to change the Scala version after the project is generated:
Right click on the generated project and select:
Scala > Set the Scala Installation > Fixed: 2.10.5 (bundled)

The default version that comes with ScalaIDE (currently 2.11.6) is automatically added to the project by ScalaIDE when it detects scala-maven-plugin in the pom.

I'd appreciate feedback if someone knows how to set the Scala library container version from Maven while it bootstraps a new project. Where does ScalaIDE look up the Scala version, if anywhere?

BTW, just make sure you download the sources (right-click the project > Maven > Download Sources) before stepping into Spark code in the debugger.

If you want to use the (IMHO very best) Eclipse goodies (References, Type Hierarchy, Call Hierarchy), you'll have to build Spark yourself so that all the sources are on your build path (Maven Scala dependencies are not processed by Eclipse/JDT, even though they are, of course, on the build path).

Have fun debugging, I can tell you that it helped me tremendously to get deeper into Spark and really understand how it works :)

You could try to add the spark-assembly.jar instead.

As others have noted, the better way is to use sbt (or Maven) to manage your dependencies. spark-core has many dependencies of its own, and adding just that one jar won't be enough.
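To illustrate, a minimal build.sbt for the SimpleApp example above might look like the following; the project name and version numbers are assumptions taken from the Spark 1.x / Scala 2.10 setup discussed elsewhere on this page, so adjust them to match your environment:

```scala
// build.sbt -- minimal sketch; version numbers are assumptions, match them to your setup
name := "simple-app"

version := "0.1"

scalaVersion := "2.10.4"

// %% appends the Scala binary version to the artifact name (spark-core_2.10)
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
```

With this in place, sbt resolves spark-core's transitive dependencies for you, and plugins such as sbteclipse can generate an Eclipse project with the full classpath.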

You haven't specified the master in your Spark code, and you're running it on your local machine. Replace the following line

val conf = new SparkConf().setAppName("Simple Application")

with

val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")

Here, "local[2]" means the job will run locally with 2 worker threads.
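For reference, the common local master strings can be sketched like this (this snippet assumes spark-core is on the classpath, as in the question):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("local[2]")   // run locally with 2 worker threads
// Other valid local master strings:
//   "local"    -- run locally with a single worker thread (no parallelism)
//   "local[*]" -- run locally with one worker thread per available core
```

Using more than one thread matters here because the example calls sc.textFile(logFile, 2), which asks for two partitions.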
