
How to deploy Scala files used in spark-shell on a cluster?

I'm using the spark-shell for learning purposes, and for that I created several Scala files containing frequently used code, such as class definitions. I use the files by calling the ":load" command within the shell. Now I would like to use the spark-shell on a YARN cluster. I start it using spark-shell --master yarn --deploy-mode client. The shell starts without any issues, but when I try to run the code loaded by ":load", I get execution errors.

 17/05/04 07:59:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e68_1493271022021_0168_01_000002 on host: xxxw03.mine.de. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e68_1493271022021_0168_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
        at org.apache.hadoop.util.Shell.run(Shell.java:844)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I think I will have to ship the code loaded in the shell to the workers. How do I do this?

The spark-shell is useful for quick testing, but once you have an idea of what you want to do and have put together a complete program, its usefulness plummets.

You probably want to move on to the spark-submit command now. See the docs on submitting an application: https://spark.apache.org/docs/latest/submitting-applications.html

Using this command you provide a JAR file instead of individual class files.

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

<main-class> is the Java-style fully qualified name of your class, e.g. com.example.MyMainClass, and <application-jar> is the path to the JAR file containing the classes in your project. The other parameters are as documented at the link I included above. These two are the key differences in how you supply your code to the cluster.
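As a rough sketch of how the template might be filled in for the YARN client setup from the question: the class name com.example.MyMainClass is the example used above, while the JAR path target/my-app.jar is purely a hypothetical placeholder for wherever your build puts the JAR. The script only prints the command so you can review it before running it yourself.

```shell
#!/bin/sh
# Hypothetical values: replace with your own main class and JAR path.
MAIN_CLASS="com.example.MyMainClass"
APP_JAR="target/my-app.jar"

# Assemble a spark-submit command line for YARN in client mode,
# i.e. the same --master/--deploy-mode used for spark-shell in the question.
build_submit_cmd() {
  printf '%s ' spark-submit \
    --class "$MAIN_CLASS" \
    --master yarn \
    --deploy-mode client \
    "$APP_JAR"
}

# Print the command for review; copy and run it once the values are correct.
build_submit_cmd
echo
```

Keeping the class name and JAR path in variables makes it easy to reuse the same invocation after each rebuild of the JAR.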
