
Apache Spark deployment on Hadoop Yarn Cluster with HA Capability

I'm new to the Big Data environment and have just started by installing a 3-node Hadoop 2.6 cluster with HA capability using ZooKeeper.

Everything works well so far, and I have tested the failover scenario using ZooKeeper on NN1 and NN2; it works fine.
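For reference, the active/standby state of each NameNode can be verified with hdfs haadmin; the service IDs nn1 and nn2 below are assumptions matching the usual dfs.ha.namenodes naming:

    # Check which NameNode is currently active (service IDs assumed
    # to be nn1/nn2 as configured in dfs.ha.namenodes.<nameservice>).
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2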

Now I am thinking of installing Apache Spark on my Hadoop YARN cluster, also with HA capability.

Can anyone guide me through the installation steps? I could only find instructions for setting up Spark in standalone mode, which I have done successfully. Now I want to install it on the YARN cluster along with HA capability.

I have a three-node cluster (NN1, NN2, DN1); the following daemons are currently running on each of these servers:

Daemons running on the master NameNode (NN1):
Jps 
DataNode    
DFSZKFailoverController 
JournalNode 
ResourceManager 
NameNode    
QuorumPeerMain  
NodeManager 

Daemons running on the standby NameNode (NN2):
Jps 
DFSZKFailoverController 
NameNode    
QuorumPeerMain  
NodeManager 
JournalNode 
DataNode    

Daemons running on the DataNode (DN1):

QuorumPeerMain  
Jps 
DataNode    
JournalNode 
NodeManager 

You should set up ResourceManager HA ( http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html ). When run on YARN, Spark does not run its own daemon processes, so there is no Spark component that requires HA in YARN mode.
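For reference, a minimal yarn-site.xml sketch for ResourceManager HA might look like the following. It assumes ResourceManagers on NN1 and NN2 and the existing ZooKeeper quorum; the cluster-id value and the default ZooKeeper port 2181 are assumptions:

    <!-- yarn-site.xml: minimal ResourceManager HA sketch.
         rm1/rm2 map to NN1/NN2; the cluster-id and ZK port 2181
         are assumptions -- adjust them to your environment. -->
    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yarn-cluster</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>NN1</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm2</name>
      <value>NN2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>NN1:2181,NN2:2181,DN1:2181</value>
    </property>

After restarting the ResourceManagers, yarn rmadmin -getServiceState rm1 shows which one is currently active.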

You can configure Spark in YARN mode; in YARN mode you can size the driver and executors depending on the cluster capacity, for example:

spark.executor.memory <value>

The number of executors is allocated based on your YARN container memory.
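As a concrete illustration, a job could then be submitted to YARN in cluster mode as sketched below; the application class, jar path, and resource sizes are placeholder assumptions, not recommendations:

    # Sketch of a spark-submit invocation for YARN cluster mode.
    # com.example.MyApp, the jar path, and the resource values are
    # placeholders -- size them to your NodeManager capacity.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 1g \
      --executor-memory 2g \
      --executor-cores 2 \
      --num-executors 3 \
      --class com.example.MyApp \
      /path/to/my-app.jar

Each executor (spark.executor.memory plus the YARN memory overhead) must fit into a single YARN container, so it has to stay below yarn.scheduler.maximum-allocation-mb; that limit, together with the NodeManager memory, bounds how many executors the cluster can run.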
