Connect Spark on yarn-cluster in CDH 5.4
I am trying to understand the "concept" of connecting to a remote server. What I have are 4 servers on CentOS running CDH 5.4. What I want to do is run Spark on YARN across all four nodes. My problem is that I do not understand how to set HADOOP_CONF_DIR as specified here. Where and what value should I set for this variable? And do I need to set this variable on all four nodes, or is the master node alone sufficient?
The documentation says: "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster". I have read many questions similar to this before asking here. Please let me know what I can do to solve this problem. I am able to run spark and pyspark in standalone mode on all nodes.
Thanks for your help. Ashish
"Where and what value should I set for this variable?"
The variable HADOOP_CONF_DIR should point to the directory that contains yarn-site.xml. Usually you set it in ~/.bashrc. I found the documentation for CDH: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/ClusterSetup.html
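On CDH installations the client configuration files typically live under /etc/hadoop/conf, but verify the path on your own cluster. A minimal sketch of what you would add to ~/.bashrc and how you would then submit a job (the path and the application name my_app.py are placeholders, not from the original question):

```shell
# Typical CDH layout: client configs (core-site.xml, yarn-site.xml, ...)
# live under /etc/hadoop/conf -- adjust if your cluster differs.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR

# With the variables set, spark-submit can locate the ResourceManager
# and submit in yarn-cluster mode (Spark 1.x syntax, as used in CDH 5.4):
spark-submit --master yarn-cluster my_app.py
```

Remember to `source ~/.bashrc` (or open a new shell) so the exports take effect before submitting.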
Basically, all nodes need to have the configuration files available at the location the environment variable points to:
"Once all the necessary configuration is complete, distribute the files to the HADOOP_CONF_DIR directory on all the machines."
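Distributing the configs could look like the sketch below, assuming passwordless SSH between the nodes; the hostnames node1 through node4 are placeholders for your actual machines:

```shell
# Copy the client configuration directory to every node so that
# HADOOP_CONF_DIR resolves to the same files cluster-wide.
# The trailing slash on the source copies the directory's contents.
for host in node1 node2 node3 node4; do
    rsync -av /etc/hadoop/conf/ "$host:/etc/hadoop/conf/"
done
```

If rsync is not installed, `scp -r` works as well, though it will not skip unchanged files on repeated runs.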