
YARN REST API - Spark job submission

I am trying to use the YARN REST API to submit the spark-submit jobs that I normally run from the command line.

My command-line spark-submit looks like this:

JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf /usr/local/spark-1.5/bin/spark-submit \
--driver-class-path "/etc/hadoop/conf" \
--class MySparkJob \
--master yarn-cluster \
--conf "spark.executor.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
--conf "spark.driver.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
spark-job.jar --retry false --counter 10

Reading through the YARN REST API documentation at https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application , I tried to create the JSON payload to POST, which looks like this:

{
  "am-container-spec": {
    "commands": {
      "command": "JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster  --jar spark-job.jar --class MySparkJob --arg --retry --arg false --arg --counter --arg 10"
    }, 
    "local-resources": {
      "entry": [
        {
          "key": "spark-job.jar", 
          "value": {
            "resource": "hdfs:///spark-job.jar", 
            "size": 3214567, 
            "timestamp": 1452408423000, 
            "type": "FILE", 
            "visibility": "APPLICATION"
          }
        }
      ]
    }
  }, 
  "application-id": "application_11111111111111_0001", 
  "application-name": "test",
  "application-type": "Spark"   
}
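
For completeness, this is roughly how I am POSTing it (a sketch only; rm-host:8088 stands in for our actual ResourceManager address, payload.json is the file holding the JSON above, and the cluster has no Kerberos security enabled):

# Step 1: ask the RM for a fresh application ID to fill into "application-id"
curl -X POST http://rm-host:8088/ws/v1/cluster/apps/new-application

# Step 2: submit the application with the JSON payload
curl -X POST -H "Content-Type: application/json" \
     -d @payload.json \
     http://rm-host:8088/ws/v1/cluster/apps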

The problem I see is that the Hadoop config directory used to be local to the machine I ran jobs from. Now that I submit the job via the REST API and it runs directly on the RM, I am not sure how to provide these details.
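
One option I have been considering (untested, just a sketch) is to upload the Hadoop conf directory to HDFS as an archive, add it as an extra local resource alongside the existing spark-job.jar entry, and point HADOOP_CONF_DIR at it through the container environment. The hdfs:///hadoop-conf.zip path, size and timestamp here are placeholders:

"local-resources": {
  "entry": [
    {
      "key": "hadoop-conf",
      "value": {
        "resource": "hdfs:///hadoop-conf.zip",
        "size": 54321,
        "timestamp": 1452408423000,
        "type": "ARCHIVE",
        "visibility": "APPLICATION"
      }
    }
  ]
},
"environment": {
  "entry": [
    {
      "key": "HADOOP_CONF_DIR",
      "value": "./hadoop-conf"
    }
  ]
}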

If you are trying to submit Spark jobs via REST APIs, I would suggest having a look at Livy. It is a simple and straightforward way to submit Spark jobs to a cluster.

Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.

  • Interactive Scala, Python and R shells
  • Batch submissions in Scala, Java, Python
  • Multiple users can share the same server (impersonation support)
  • Can be used for submitting jobs from anywhere with REST
  • Does not require any code change to your programs
We have also tried submitting the application through the Java RMI option.
