
Difference between using hdfs:// and yarn in spark

In cluster mode, what is the difference between using hdfs:// and yarn when saving and loading files in Spark?

From your question, it seems your understanding of HDFS and YARN is not quite right.

YARN is a generic job-scheduling framework, and HDFS is a storage framework.

YARN, in a nutshell, has a master (Resource Manager) and workers (Node Managers).

The Resource Manager creates containers on the workers to execute MapReduce jobs, Spark jobs, etc.

HDFS, on the other hand, has a master (Name Node) and workers (Data Nodes) to persist and retrieve files.

You don't need YARN to communicate with HDFS; it is an independent entity.

In a production environment, the HDFS worker (Data Node) and the YARN worker (Node Manager) are installed on the same machine, so that the processing framework can consume data from the nearest local Data Node (data locality).
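To make data locality concrete, here is a minimal sketch (the NameNode address and file path are placeholders) that uses the Hadoop FileSystem API to ask which Data Nodes hold the blocks of a file; this is the information a scheduler can use to place tasks close to the data:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Placeholder NameNode address and file path, for illustration only.
    val conf = new Configuration()
    val fs   = FileSystem.get(new java.net.URI("hdfs://namenode:8020"), conf)

    // Ask the Name Node which Data Nodes store each block of the file.
    val status    = fs.getFileStatus(new Path("/data/input/part-00000"))
    val locations = fs.getFileBlockLocations(status, 0, status.getLen)
    locations.foreach(loc => println(loc.getHosts.mkString(", ")))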

Using Spark on a YARN cluster in cluster mode means that the Spark driver runs on one of the worker nodes within the YARN cluster, rather than on the machine that submitted the job.

Hence, using hdfs:// clearly benefits the Spark job, as the Spark executors can read the data from the nearest Data Node.
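As a minimal sketch (the NameNode host, port, and paths below are placeholders), saving and loading with explicit hdfs:// URIs looks like this:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hdfs-read-write-example")
      .getOrCreate()

    // Executors read the file blocks, preferring the closest Data Node.
    val df = spark.read.parquet("hdfs://namenode:8020/data/input")

    // Save the result back to HDFS.
    df.write.mode("overwrite").parquet("hdfs://namenode:8020/data/output")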

The YARN and HDFS configurations are read from HADOOP_CONF_DIR on the client machine (your local machine in client mode, or one of the worker nodes in cluster mode).
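Continuing the sketch above: if the core-site.xml under HADOOP_CONF_DIR sets fs.defaultFS to your HDFS Name Node (the hostname and path here are still placeholders), an unqualified path and a fully qualified hdfs:// URI resolve to the same location, which is why jobs on a properly configured cluster often omit the scheme:

    // With fs.defaultFS = hdfs://namenode:8020 in core-site.xml,
    // these two reads point at the same HDFS directory.
    val fromDefaultFs  = spark.read.parquet("/data/input")
    val fullyQualified = spark.read.parquet("hdfs://namenode:8020/data/input")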
