
YARN REST API - Spark job submission

I am trying to use the YARN REST API to submit Spark jobs that I normally run via spark-submit on the command line.

My command-line spark-submit invocation looks like this:

JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf /usr/local/spark-1.5/bin/spark-submit \
--driver-class-path "/etc/hadoop/conf" \
--class MySparkJob \
--master yarn-cluster \
--conf "spark.executor.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
--conf "spark.driver.extraClassPath=/usr/local/hadoop/client/hadoop-*" \
spark-job.jar --retry false --counter 10

Reading through the YARN REST API documentation https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application , I tried to create the JSON payload to POST, which looks like this:

{
  "am-container-spec": {
    "commands": {
      "command": "JAVA_HOME=/usr/local/java7/ HADOOP_CONF_DIR=/etc/hadoop/conf org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster  --jar spark-job.jar --class MySparkJob --arg --retry --arg false --arg --counter --arg 10"
    }, 
    "local-resources": {
      "entry": [
        {
          "key": "spark-job.jar", 
          "value": {
            "resource": "hdfs:///spark-job.jar", 
            "size": 3214567, 
            "timestamp": 1452408423000, 
            "type": "FILE", 
            "visibility": "APPLICATION"
          }
        }
      ]
    }
  }, 
  "application-id": "application_11111111111111_0001", 
  "application-name": "test",
  "application-type": "Spark"   
}
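
I plan to POST this payload to the ResourceManager's Cluster Applications API described in the doc above. A rough sketch of the two calls, where rm-host:8088 is a placeholder for my ResourceManager host and spark-job.json is the payload above saved to a file:

# Ask the RM for a new application ID; its response supplies the "application-id" field used in the payload
curl -X POST http://rm-host:8088/ws/v1/cluster/apps/new-application

# Submit the application with the JSON payload
curl -X POST -H "Content-Type: application/json" -d @spark-job.json http://rm-host:8088/ws/v1/cluster/apps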

The problem I see is that the Hadoop config directory was previously local to the machine I ran jobs from. Now that I submit the job via the REST API and it runs directly on the RM, I am not sure how to provide these details.
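
One approach that looks plausible from the same API doc (an untested sketch): zip up /etc/hadoop/conf, upload it to HDFS, declare it as an ARCHIVE local resource, and point HADOOP_CONF_DIR at the unpacked directory through the environment entries of the am-container-spec. The hdfs:///hadoop-conf.zip path and the size/timestamp values below are placeholders that would have to match the actual HDFS file:

"am-container-spec": {
  "local-resources": {
    "entry": [
      {
        "key": "hadoop-conf",
        "value": {
          "resource": "hdfs:///hadoop-conf.zip",
          "type": "ARCHIVE",
          "visibility": "APPLICATION",
          "size": 54321,
          "timestamp": 1452408423000
        }
      }
    ]
  },
  "environment": {
    "entry": [
      {
        "key": "HADOOP_CONF_DIR",
        "value": "./hadoop-conf"
      }
    ]
  }
}

YARN localizes an ARCHIVE resource under a link named after its key in the container's working directory, so ./hadoop-conf should then contain the config files.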

If you are trying to submit Spark jobs via REST APIs, I suggest having a look at Livy . It is one of the simplest and easiest ways to submit Spark jobs to a cluster (see the batch-submission sketch after the feature list below).

Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.

  • Interactive Scala, Python and R shells
  • Batch submissions in Scala, Java, Python
  • Multiple users can share the same server (impersonation support)
  • Can be used for submitting jobs from anywhere with REST
  • Does not require any code change to your programs
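
For comparison with the spark-submit command above, the same job could be sent to Livy's batch endpoint roughly like this (a sketch; livy-host:8998 is a placeholder for a Livy server on its default port):

curl -X POST -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs:///spark-job.jar",
        "className": "MySparkJob",
        "args": ["--retry", "false", "--counter", "10"]
      }' \
  http://livy-host:8998/batches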

We've also tried submitting applications through the Java RMI option.
