Difference between using hdfs:// and yarn in Spark

What is the difference between using hdfs:// and yarn when saving and loading files in Spark in cluster mode?

From your question, it seems that your understanding of HDFS and YARN is incorrect.

YARN is a generic job scheduling framework and HDFS is a storage framework.

In a nutshell, YARN has a master (the ResourceManager) and workers (NodeManagers). The ResourceManager allocates containers on the workers to run MapReduce jobs, Spark jobs, and so on.

HDFS, on the other hand, has a master (the NameNode) and workers (DataNodes) to persist and retrieve files.

You don't need YARN to communicate with HDFS; HDFS is an independent system.
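To make the distinction concrete, here is a minimal sketch using only the Python standard library. The NameNode host, port, and file path are made up for illustration. It shows that an hdfs:// string is a storage address (scheme, NameNode, path), whereas "yarn" is just a value for Spark's master setting and carries no filesystem information at all.

```python
from urllib.parse import urlparse

# Hypothetical NameNode address and file path, for illustration only.
uri = urlparse("hdfs://namenode.example.com:8020/data/events/part-0000.parquet")

print(uri.scheme)    # which filesystem to talk to
print(uri.hostname)  # the NameNode host
print(uri.port)      # the NameNode RPC port
print(uri.path)      # the path inside HDFS

# "yarn" is a value for Spark's master setting, not a location of data.
master = "yarn"
print(urlparse(master).scheme == "")  # no filesystem scheme in it
```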

In a production environment, the HDFS worker (DataNode) and the YARN worker (NodeManager) are usually installed on the same machine, so that the processing framework can read data from the nearest local DataNode (data locality).
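The idea of data locality can be sketched with a toy model. This is not Spark's or YARN's actual scheduler; the host names, the rack map, and the function `locality_level` are all hypothetical. Only the level names are borrowed from the locality levels Spark reports for tasks.

```python
def locality_level(executor_host, replica_hosts, rack_of):
    """Classify how close an executor is to any replica of an HDFS block."""
    if executor_host in replica_hosts:
        # A DataNode on the same machine holds the block.
        return "NODE_LOCAL"
    executor_rack = rack_of.get(executor_host)
    if executor_rack is not None and any(
        rack_of.get(host) == executor_rack for host in replica_hosts
    ):
        # A replica lives on another machine in the same rack.
        return "RACK_LOCAL"
    # The block has to be fetched from somewhere else in the cluster.
    return "ANY"


# Hypothetical cluster layout: worker name -> rack name.
rack_of = {
    "worker1": "rack-a",
    "worker2": "rack-a",
    "worker3": "rack-b",
    "worker4": "rack-c",
}
# Hypothetical block with replicas on two DataNodes.
replica_hosts = {"worker1", "worker3"}

for host in ["worker1", "worker2", "worker4"]:
    print(host, locality_level(host, replica_hosts, rack_of))
```

When the NodeManager and the DataNode share a machine, the first branch is the one that can apply, which is the point of co-locating them.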

Running Spark on a YARN cluster in cluster mode means that the Spark driver runs inside the YARN application master on one of the cluster's worker nodes, rather than on the machine from which the job was submitted.

Hence, using hdfs:// paths benefits the Spark job, because a Spark executor can read each block from the nearest DataNode.
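Put together, the two settings live in different places of a job submission. The sketch below only builds a spark-submit command line and prints it; it does not run anything. The script name `job.py`, the NameNode address, and the paths are placeholders. `--master yarn` and `--deploy-mode cluster` say where the job runs, while the hdfs:// arguments say where its data lives.

```python
import shlex


def build_spark_submit(app, input_path, output_path, deploy_mode="cluster"):
    """Return the argv list for submitting `app` to YARN."""
    return [
        "spark-submit",
        "--master", "yarn",            # who schedules the job: YARN
        "--deploy-mode", deploy_mode,  # where the driver runs
        app,
        input_path,                    # where the data is read from: HDFS
        output_path,                   # where the results are written: HDFS
    ]


# Hypothetical application and HDFS locations.
cmd = build_spark_submit(
    "job.py",
    "hdfs://namenode.example.com:8020/data/in",
    "hdfs://namenode.example.com:8020/data/out",
)
print(shlex.join(cmd))
```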

The YARN and HDFS configurations are read from HADOOP_CONF_DIR on the client machine, that is, the machine from which the job is submitted. Spark then distributes that configuration to the YARN containers, so the driver and executors use the same settings in both client mode and cluster mode.
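If you want to check which HDFS your client configuration points at, a small standard-library sketch like the following can help. It assumes that HADOOP_CONF_DIR contains a `core-site.xml` in the usual Hadoop layout and looks up the `fs.defaultFS` property; the helper name `default_filesystem` is my own. Paths given without a scheme are resolved against that default filesystem.

```python
import os
import xml.etree.ElementTree as ET


def default_filesystem(conf_dir):
    """Return the fs.defaultFS value from core-site.xml, or None if absent."""
    core_site = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(core_site):
        return None
    root = ET.parse(core_site).getroot()
    # Hadoop config files are a flat list of <property><name/><value/></property>.
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "fs.defaultFS":
            value = prop.findtext("value")
            return value.strip() if value else None
    return None


conf_dir = os.environ.get("HADOOP_CONF_DIR")
if conf_dir is None:
    print("HADOOP_CONF_DIR is not set")
else:
    print(default_filesystem(conf_dir))
```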

