
Connect Spark on yarn-cluster in CDH 5.4

I am trying to understand the "concept" of connecting to a remote server. I have 4 servers running CentOS with CDH 5.4, and I want to run Spark on YARN across all four nodes. My problem is that I do not understand how to set HADOOP_CONF_DIR as specified in the documentation. Where should I set this variable, and to what value? And do I need to set it on all four nodes, or will the master node alone suffice?

The documentation says "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster". I read many similar questions before asking here. Please let me know what I can do to solve this problem. I am able to run spark and pyspark in standalone mode on all nodes.

Thanks for your help. Ashish

Where and what value should I set for this variable?

The variable HADOOP_CONF_DIR should point to the directory that contains yarn-site.xml. Usually you set it in ~/.bashrc. I found the documentation for CDH: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-common/ClusterSetup.html
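As a rough illustration of the point above, the following Python sketch checks that a candidate directory really contains yarn-site.xml and then builds an environment with HADOOP_CONF_DIR set, which could be handed to a spark-submit process. The helper names are hypothetical, and the path /etc/hadoop/conf is only an assumption about where a CDH install commonly keeps its client configuration; substitute whatever directory holds yarn-site.xml on your machines.

```python
import os
from pathlib import Path


def validate_conf_dir(conf_dir):
    """Return the path to yarn-site.xml inside conf_dir, or raise if it is missing."""
    yarn_site = Path(conf_dir) / "yarn-site.xml"
    if not yarn_site.is_file():
        raise FileNotFoundError(
            f"{yarn_site} not found; HADOOP_CONF_DIR should point to "
            "the directory that contains yarn-site.xml"
        )
    return yarn_site


def spark_submit_env(conf_dir, base_env=None):
    """Return a copy of the environment with HADOOP_CONF_DIR set to conf_dir."""
    validate_conf_dir(conf_dir)
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_CONF_DIR"] = str(conf_dir)
    return env


if __name__ == "__main__":
    # Assumption: /etc/hadoop/conf is a common CDH client config location.
    candidate = os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    try:
        print("Found", validate_conf_dir(candidate))
    except FileNotFoundError as err:
        print("Check failed:", err)
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching spark-submit from a script. Putting `export HADOOP_CONF_DIR=...` in ~/.bashrc, as suggested above, achieves the same thing for interactive shells.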

Basically, every node needs to have the configuration files in the directory that the environment variable points to.

Once all the necessary configuration is complete, distribute the files to the HADOOP_CONF_DIR directory on all the machines.
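One possible way to picture the distribution step is the sketch below, which only builds (and prints) one rsync command per machine rather than running anything. The host names node2 to node4 and the directory /etc/hadoop/conf are hypothetical placeholders, and rsync over SSH is just one assumed transport; a cluster management tool may already handle this for you.

```python
def rsync_commands(conf_dir, hosts):
    """Build one rsync argv per host that mirrors conf_dir to the same path remotely."""
    # A trailing slash tells rsync to copy the directory's contents, not the directory itself.
    src = conf_dir.rstrip("/") + "/"
    return [["rsync", "-a", src, f"{host}:{src}"] for host in hosts]


if __name__ == "__main__":
    # Hypothetical values: replace with your own config directory and host names.
    for argv in rsync_commands("/etc/hadoop/conf", ["node2", "node3", "node4"]):
        print(" ".join(argv))
```

Each printed line is a command that could be reviewed and then run by hand, or passed to `subprocess.run` once the targets are confirmed.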

