
Apache Spark deployment on Hadoop Yarn Cluster with HA Capability

I am new to the big data environment and have just finished setting up a 3-node Hadoop 2.6 cluster with HA capability using ZooKeeper.

Everything works well so far; I have tested the failover scenario between NN1 and NN2 using ZooKeeper, and it works as expected.

Now I would like to install Apache Spark on my Hadoop YARN cluster, also with HA capability.

Can anyone guide me through the installation steps? I could only find instructions for setting up Spark in standalone mode, which I have done successfully. Now I want to install it on the YARN cluster along with HA capability.

I have a three-node cluster (NN1, NN2, DN1); the following daemons are currently running on each of these servers:

Daemons running on the master NameNode (NN1):
Jps 
DataNode    
DFSZKFailoverController 
JournalNode 
ResourceManager 
NameNode    
QuorumPeerMain  
NodeManager 

Daemons running on the standby NameNode (NN2):
Jps 
DFSZKFailoverController 
NameNode    
QuorumPeerMain  
NodeManager 
JournalNode 
DataNode    

Daemons running on the DataNode (DN1):

QuorumPeerMain  
Jps 
DataNode    
JournalNode 
NodeManager 

You should set up ResourceManager HA ( http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html ). When run on YARN, Spark does not run its own daemon processes, so there is no Spark component that requires HA in YARN mode.
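For reference, here is a minimal yarn-site.xml sketch of ResourceManager HA, following the linked documentation. The mapping of rm1/rm2 to NN1/NN2 and the ZooKeeper quorum addresses are assumptions based on your topology (ZooKeeper runs on all three of your nodes); adjust them to your cluster:

<!-- Enable ResourceManager HA (values below are assumptions; see the ResourceManagerHA docs) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<!-- Assumption: rm1 on NN1 (where your ResourceManager already runs), rm2 on NN2 -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>NN1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>NN2</value>
</property>
<!-- Assumption: your QuorumPeerMain instances listen on the default port 2181 -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>NN1:2181,NN2:2181,DN1:2181</value>
</property>

Note that your daemon list shows ResourceManager running only on NN1; for RM HA you would also need to start a ResourceManager on NN2.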

You can configure Spark in YARN mode. In YARN mode you can size the driver and the executors according to the cluster's capacity.
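As a minimal sketch of what submitting to YARN looks like once Spark can find your Hadoop configuration (the config path and the examples jar location are assumptions that vary by installation and Spark version):

# Point Spark at the cluster's Hadoop config so it can find YARN and HDFS
# (the path is an assumption; use your installation's conf directory)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Submit the bundled SparkPi example to YARN in cluster mode
# (jar location: lib/ in Spark 1.x, examples/jars/ in Spark 2.x)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples-*.jar 100

With that working, per-executor resources can be tuned with properties such as: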

spark.executor.memory <value>

The number of executors you can run is bounded by your YARN container memory settings, since each executor runs inside a YARN container.
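As an illustrative sizing sketch (the class name, jar, and resource numbers are placeholders; the requested sizes must fit within yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb on your nodes):

# Hypothetical application; the resource numbers below are examples only
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 2g \
  --executor-cores 2 \
  --num-executors 3 \
  --class com.example.MyApp \
  myapp.jar

Keep in mind that YARN allocates each executor container slightly more than --executor-memory, to cover off-heap overhead (spark.yarn.executor.memoryOverhead in the Spark 1.x/2.x docs).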
