
How to execute spark-submit in Java with Scala .jar provided?

I have several Spark big data applications written in Scala. Each of these applications also has a version written in R.

I also have a web server application written in Java, which serves as the API for a web GUI. The goal is to let the GUI execute these applications and choose which version to run: R or Spark. I managed to call the R code from the Java API and return the result as JSON, but executing the Spark programs seems much more complicated.

So far, I have been able to merge one of the Scala .jar files with the Java API using Maven. I did this by declaring my Spark program as a local repository in pom.xml so that the Scala code is included in the final .jar package, and I also listed Scala and the Breeze library as dependencies in pom.xml. When I send a request through the API, it throws java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$. At this point I realized it was because I hadn't declared the Spark library in the Maven dependencies, but I suspect I've been approaching this wrong, since Spark applications are normally run by executing the spark-submit command in a terminal (which supplies the Spark classes at runtime, so they are usually marked as provided-scope dependencies rather than bundled).

So now what I'm thinking is putting the Java API .jar and the Scala .jar in one folder, and then executing spark-submit from inside the Java API .jar, targeting the Scala .jar. Is this even the correct approach? And how do I execute spark-submit from Java code? Does it have to be with Runtime.exec(), as mentioned here?
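For reference, here is a minimal sketch of what that might look like (using ProcessBuilder rather than Runtime.exec() directly; the jar path, main class, and master are placeholders, and it assumes spark-submit is on the PATH):

    public class SparkSubmitRunner {
      public static void main(String[] args) throws Exception {
        // Build the spark-submit command; all values below are placeholders.
        ProcessBuilder pb = new ProcessBuilder(
            "spark-submit",
            "--class", "my.spark.app.Main",
            "--master", "local",
            "/my/scala.jar");
        pb.inheritIO();                    // forward spark-submit's stdout/stderr
        Process process = pb.start();
        int exitCode = process.waitFor();  // block until spark-submit finishes
        System.out.println("spark-submit exited with code " + exitCode);
      }
    }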

SparkLauncher can be used to submit the Spark code (written in Scala, with the precompiled jar scala.jar placed at a known location) from the Java API code.

The Spark documentation for SparkLauncher recommends the following way to submit a Spark job programmatically from inside a Java application. Add the code below to your Java API code.

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class MyLauncher {
      public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/my/scala.jar")             // path to the precompiled Scala jar
            .setMainClass("my.spark.app.Main")           // entry point inside that jar
            .setMaster("local")                          // Spark master URL
            .setConf(SparkLauncher.DRIVER_MEMORY, "2g")  // driver memory for the job
            .startApplication();
        // Use the handle API to monitor / control the application.
      }
    }
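To monitor the job from Java, you can also pass a listener to startApplication. Below is a minimal sketch (same placeholder jar path, main class, and master as above) that blocks until the application reaches a final state:

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    import java.util.concurrent.CountDownLatch;

    public class MyLauncherWithListener {
      public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        new SparkLauncher()
            .setAppResource("/my/scala.jar")             // placeholder jar path
            .setMainClass("my.spark.app.Main")           // placeholder main class
            .setMaster("local")
            .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
            .startApplication(new SparkAppHandle.Listener() {
              @Override
              public void stateChanged(SparkAppHandle handle) {
                System.out.println("Spark app state: " + handle.getState());
                if (handle.getState().isFinal()) {
                  done.countDown();                      // FINISHED, FAILED or KILLED
                }
              }

              @Override
              public void infoChanged(SparkAppHandle handle) {
                // fired when e.g. the application ID becomes available
              }
            });
        done.await();                                    // wait for a final state
      }
    }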
