
How to deploy Scala files used in spark-shell on a cluster?

I'm using the spark-shell for learning purposes, and for that I created several Scala files containing frequently used code, like class definitions. I use the files by calling the ":load" command within the shell. Now I would like to use the spark-shell on a YARN cluster. I start it with spark-shell --master yarn --deploy-mode client. The shell starts without any issues, but when I try to run the code loaded with ":load", I get execution errors.
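
For illustration, a file loaded this way might contain nothing more than a case class and a small helper (the names here are only placeholders, not my actual code):

// person.scala -- loaded in the shell with  :load person.scala
case class Person(name: String, age: Int)

object Helpers {
  // keep only adult records; applied to Datasets created in the shell
  def adults(ds: org.apache.spark.sql.Dataset[Person]): org.apache.spark.sql.Dataset[Person] =
    ds.filter(_.age >= 18)
}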

 17/05/04 07:59:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e68_1493271022021_0168_01_000002 on host: xxxw03.mine.de. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e68_1493271022021_0168_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
        at org.apache.hadoop.util.Shell.run(Shell.java:844)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I think I will have to make the code loaded in the shell available to the workers somehow, but how do I do that?

The spark-shell is useful for quick testing, but once you have an idea of what you want to do and put together a complete program, its usefulness plummets.

You probably want to move on to using the spark-submit command now. See the docs on submitting an application: https://spark.apache.org/docs/latest/submitting-applications.html

With this command, you provide a JAR file instead of individual class files:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

<main-class> is the Java-style path to your class, e.g. com.example.MyMainClass. <application-jar> is the path to the JAR file containing the classes in your project. The other parameters are as documented at the link above, but these two are the key differences in terms of how you supply your code to the cluster.
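
As a rough sketch (the package, class, and file names below are only placeholders), the code previously loaded with ":load" would be wrapped in an object with a main method, packaged into a JAR (for example with sbt package), and then submitted:

// src/main/scala/com/example/MyMainClass.scala -- illustrative skeleton only
package com.example

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object MyMainClass {
  def main(args: Array[String]): Unit = {
    // spark-shell creates a SparkSession named `spark` for you;
    // in a submitted application you create (and stop) it yourself.
    val spark = SparkSession.builder().appName("MyApp").getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ana", 34), Person("Ben", 12)).toDS()
    people.filter(_.age >= 18).show()

    spark.stop()
  }
}

A matching submit call (again with a placeholder JAR path) would then look like:

./bin/spark-submit \
  --class com.example.MyMainClass \
  --master yarn \
  --deploy-mode client \
  path/to/my-app.jar

Shipping everything as one JAR is what makes the same class definitions available on the executors as well as the driver.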
