
Only one spark-submit allowed to run in spark-yarn cluster environment

I set up a spark-yarn cluster environment; Spark (2.2.0) runs on Windows 7, and the YARN cluster is Hadoop 2.7.3.

I run "spark-shell" to use SparkSQL: 我运行“spark-shell”来使用SparkSQL:

spark-shell --master yarn --deploy-mode client --conf spark.yarn.archive=hdfs://hadoop_273_namenode_ip:namenode_port/spark-archive.zip

Everything works fine so far, but when I start another "spark-shell", the message below is printed to the console and seems to never end:

17/10/17 17:33:53 INFO Client: Application report for application_1508232101640_0003 (state: ACCEPTED) 

The application status in the ResourceManager web UI shows:

[application status] ACCEPTED: waiting for AM container to be allocated, launched and register with RM

If I close the first "spark-shell", the second one starts working fine.

It seems that multiple spark-shell (spark-submit) instances are not allowed to run at the same time (in my environment).

How can I get past this limitation?

waiting for AM container to be allocated

It's a resource limitation, so you could make your first job consume fewer resources.

What happens is that the first job consumes all available resources, and by the time the second job comes around, nothing has been freed, so the second job has to wait for resources to become available.

That's why, when you close the first shell, the other one launches.
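
For example (a rough sketch; the memory, core, and executor numbers here are assumptions and depend on how much memory and how many vcores your NodeManagers actually offer), the first spark-shell could be started with explicit resource caps so that YARN keeps enough headroom to allocate an AM container and executors for a second application:

spark-shell --master yarn --deploy-mode client \
  --driver-memory 1g \
  --num-executors 2 \
  --executor-memory 1g \
  --executor-cores 1 \
  --conf spark.yarn.archive=hdfs://hadoop_273_namenode_ip:namenode_port/spark-archive.zip

With both shells sized like this (and assuming the cluster has capacity for two such applications), the second application report should move from ACCEPTED to RUNNING instead of waiting indefinitely for its AM container.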

