
How to submit Spark application to YARN in cluster mode?

I have created a Spark WordCount application which I ran using the spark-submit command in the shell in local mode.

When I try to run it in cluster mode on YARN using the command:

spark-submit --class com.WordCount --master yarn --deploy-mode cluster WordCount-1.0.jar

It does not seem to run, and the status shows as:

Application report for application_1480577073003_0019 (state: ACCEPTED)

How do I spark-submit the Spark application to YARN in cluster mode?

The reason for this issue is that your application/driver is requesting more resources than the cluster has available at that time.

Since you haven't specified any resource parameters, the driver requests resources with the default values, and the cluster is not able to provide them.

Possible reasons:

  1. Your cluster doesn't have executors with enough memory/cores (defaults: 1 GB, 1 core)
  2. Your cluster has executors with enough memory/cores, but they are assigned to other jobs.
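To see which of these applies, you can inspect the cluster's current capacity and occupancy before resubmitting. A quick sketch using the standard YARN CLI (the application id in the kill example is a placeholder):

```shell
# List all NodeManagers with their memory/vcore capacity and current usage
yarn node -list -all

# List the applications currently holding resources on the cluster
yarn application -list -appStates RUNNING

# If a stuck job is hogging resources, it can be killed by its application id
yarn application -kill <application-id>
```

The same information is also visible in the ResourceManager web UI (port 8088 by default).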

Solution:

  1. Either reduce the default executor memory/cores request, or increase the memory/cores per YARN container
  2. Increase the cluster resources by adding more executors, or wait for the other jobs to complete [or kill them if you don't like those jobs ;)]
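For the first option, you can size the request explicitly at submit time. The flags below are standard spark-submit options; the values are only illustrative and should be tuned to what your YARN containers can actually offer:

```shell
spark-submit \
  --class com.WordCount \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  --num-executors 2 \
  WordCount-1.0.jar
```

Note that YARN adds a memory overhead on top of `--executor-memory`, so the container request is somewhat larger than the value you pass.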

After you spark-submit --deploy-mode cluster your Spark application, the driver and the executors run on the cluster's nodes.

From Spark's official documentation:

Deploy mode distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside of the cluster. In "client" mode, the submitter launches the driver outside of the cluster.

You'll get the application ID as the handle to your application.

You should use the yarn application -status command to check the status of a Spark application.

-status Prints the status of the application.
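Putting it together, a sketch of checking on the submitted application (using the application id from the report above):

```shell
# Print the status of the application: state, final status, tracking URL, etc.
yarn application -status application_1480577073003_0019

# Once the application has finished, fetch its aggregated logs
# (requires log aggregation to be enabled on the cluster)
yarn logs -applicationId application_1480577073003_0019
```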

