
How to trigger a Spark job without using "spark-submit"? Real-time instead of batch

I have a Spark job which I normally run with spark-submit, passing the input file name as an argument. Now I want to make the job available to my team, so people can submit an input file (probably through some web API), the Spark job will be triggered, and the result file will be returned to the user (probably also through the web API). (I am using Java/Scala.)

What do I need to build in order to trigger the Spark job in such a scenario? Is there a tutorial somewhere? Should I use Spark Streaming for such a case? Thanks!

One way to go is to have a web server listening for jobs, with each web request potentially triggering an execution of spark-submit.

You can execute this using Java's ProcessBuilder.

To the best of my knowledge, there is no good way of invoking Spark jobs other than through spark-submit.
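A minimal sketch of this approach: the web server assembles a spark-submit command line for the uploaded input file and launches it with ProcessBuilder. The jar path and main class below are hypothetical placeholders, not from the original post, and the actual launch is shown commented out since it requires spark-submit on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

public class SparkSubmitRunner {
    // Builds the spark-submit command line for a given input file.
    // The main class and jar path are placeholders for illustration.
    static List<String> buildCommand(String inputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--class");
        cmd.add("com.example.MyJob");   // hypothetical main class
        cmd.add("/path/to/my-job.jar"); // hypothetical jar location
        cmd.add(inputFile);             // the file the user submitted
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("/tmp/input.txt");
        System.out.println(String.join(" ", cmd));
        // To actually launch (requires spark-submit on the PATH):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

The web handler would call something like buildCommand with the path of the uploaded file, start the process, and wait for it to finish before returning the result file.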

You can use Livy. Livy is an open source REST interface for using Spark from anywhere.

Livy is a new open source Spark REST server for submitting and interacting with your Spark jobs from anywhere. Livy is conceptually based on the incredibly popular IPython/Jupyter, but implemented to better integrate into the Hadoop ecosystem with multi-user support. Spark can now be offered as a service to anyone in a simple way: Spark shells in Python or Scala can be run by Livy in the cluster while the end user manipulates them at their own convenience through a REST API. Regular non-interactive applications can also be submitted. The output of the jobs can be introspected and returned in a tabular format, which makes it visualizable in charts. Livy can point to a unique Spark cluster and create several contexts for users. With YARN impersonation, jobs will be executed with the actual permissions of the users submitting them.

Please check this URL for more info: https://github.com/cloudera/livy
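As a sketch of how your web API could hand a job to Livy: Livy's batch endpoint accepts a JSON body naming the jar, the main class, and the job arguments, which you would POST to the Livy server (by default on port 8998). The helper below only builds that JSON body; the jar path, class name, and input file are hypothetical placeholders.

```java
public class LivyBatchRequest {
    // Builds the JSON body for Livy's POST /batches endpoint.
    // All three values passed in are placeholders for illustration.
    static String batchPayload(String jarPath, String mainClass, String inputFile) {
        return String.format(
            "{\"file\": \"%s\", \"className\": \"%s\", \"args\": [\"%s\"]}",
            jarPath, mainClass, inputFile);
    }

    public static void main(String[] args) {
        // This JSON would be POSTed to http://<livy-host>:8998/batches
        System.out.println(batchPayload("/path/to/my-job.jar",
                "com.example.MyJob", "/tmp/input.txt"));
    }
}
```

Livy responds with a batch id that you can poll to check the job's state and retrieve logs, which fits the "submit a file, get a result back" flow described in the question.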

You can use the SparkLauncher class to do this. You will need a REST API that takes the file from the user and then triggers the Spark job using SparkLauncher.

Process spark = new SparkLauncher()
      .setAppResource(job.getJarPath())
      .setMainClass(job.getMainClass())
      // master URL, e.g. "spark://host:7077" (the original had "master spark://...", which is invalid)
      .setMaster("spark://" + this.serverHost + ":" + this.port)
      .addAppArgs(inputFilePath) // hypothetical variable: path of the uploaded input file
      .launch();
