
How to invoke a Spark job in the context of a REST web service?

I want to run Spark SQL queries from my RESTful web service, so how can I run a SparkContext via a Jersey context? I need to pass my Spark SQL request to the cluster and then return the result to the user via the REST API. But according to the Spark documentation, there is no way to run Spark SQL queries from Java code without submitting a jar file to the cluster (master/slaves).

If you are using Spark version 1.4 or later, you can use SparkLauncher to run your application:

import org.apache.spark.launcher.SparkLauncher;

public class MyLauncher {
  public static void main(String[] args) throws Exception {
    Process spark = new SparkLauncher()
      .setAppResource("/my/app.jar")        // the packaged Spark application
      .setMainClass("my.spark.app.Main")    // entry point inside that jar
      .setMaster("local")                   // or a cluster master URL
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .launch();                            // starts spark-submit as a child process
    spark.waitFor();                        // block until the job finishes
  }
}

For this to work you have to give it a jar file. Since you want to run a Spark SQL query, you can either pack the query into the jar itself, or have the jar accept the query to execute as a command-line parameter.
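If you take the parameter route, `SparkLauncher.addAppArgs` forwards strings to the packaged application's `main(String[])`. A minimal sketch, reusing the placeholder jar path and class name from the example above (the query string is just an illustration):

```java
import org.apache.spark.launcher.SparkLauncher;

public class QueryLauncher {
  public static void main(String[] args) throws Exception {
    // Everything passed to addAppArgs arrives in the packaged
    // application's main(String[]) -- here, args[0] is the SQL text,
    // which that application can hand to its SQL context.
    Process spark = new SparkLauncher()
        .setAppResource("/my/app.jar")
        .setMainClass("my.spark.app.Main")
        .setMaster("local")
        .addAppArgs("SELECT name FROM people WHERE age > 21")
        .launch();
    spark.waitFor();
  }
}
```

Running this requires a local Spark installation and the application jar, so it is shown here only as a pattern.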

The caveat is that you have to start and stop the SparkContext each time you want to execute a query. If you don't mind waiting for that, it is fine. But if latency matters, I would recommend writing a separate service that keeps the Spark context up permanently, and having your application make calls to it.
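A minimal sketch of that long-running-service pattern, using the JDK's built-in `HttpServer` so it stays self-contained; `runQuery` below is a hypothetical stand-in for the Spark call — in a real service the handler would invoke a shared, long-lived SQL context created once at startup:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class QueryService {
  // Stand-in for a long-lived Spark SQL context initialized once.
  // A real implementation would run the query against that context here.
  static String runQuery(String query) {
    return "executed: " + query;
  }

  public static HttpServer start(int port) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/sql", (HttpExchange exchange) -> {
      // Read the SQL text from the request body.
      String query = new String(exchange.getRequestBody().readAllBytes(),
                                StandardCharsets.UTF_8);
      byte[] result = runQuery(query).getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, result.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(result);
      }
    });
    server.start();   // the service (and its context) stays up between requests
    return server;
  }

  public static void main(String[] args) throws IOException {
    start(8080);
    System.out.println("listening on :8080");
  }
}
```

Because the context lives for the lifetime of the service rather than per query, each REST call only pays the cost of the query itself.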

