
Does Spark on YARN consider data locality when launching executors?

I am considering static allocation of Spark executors. Does Spark on YARN consider the data locality of the raw input datasets used in the application when launching executors?

If it does, how does it do so, given that executors are requested and allocated when the SparkContext is initialized? An application may use multiple raw input datasets that physically reside on many different data nodes, and we can't run an executor on every one of those nodes.
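For context, with static allocation the executor count and sizing are fixed at submit time via `spark-submit` flags; nothing in these flags names specific nodes (the cluster values below are illustrative, not a recommendation):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  my_app.py
```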

I understand that Spark takes data locality into account when scheduling tasks on executors (as mentioned at https://spark.apache.org/docs/latest/tuning.html#data-locality).
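That task-level behavior is tunable through the `spark.locality.wait` family of settings (e.g. in `spark-defaults.conf`): how long the scheduler waits for a free slot at each locality level before falling back to a less local one. The 3s default is from the Spark configuration documentation; the per-level overrides default to the top-level value:

```
spark.locality.wait          3s
spark.locality.wait.process  3s
spark.locality.wait.node     3s
spark.locality.wait.rack     3s
```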

You are correct in saying that

Spark takes care of data locality while scheduling tasks on executors.

When YARN launches an executor, it has no idea where your data is. So, in an ideal case, you would launch executors on all nodes of your cluster; more realistically, you launch them on only a subset of the nodes.

Now, this is not necessarily a bad thing, because HDFS inherently replicates data (three copies of each block by default), which means there is a good chance that a copy of the data already resides on one of the nodes where Spark is running an executor.
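A rough back-of-the-envelope sketch of why replication helps: if we simplify HDFS's rack-aware placement and assume each block's replicas land on distinct nodes chosen uniformly at random, the chance that at least one replica sits on an executor node is already high even when executors cover only half the cluster. The function name and cluster sizes below are illustrative, not part of any Spark API:

```python
from math import comb

def prob_local_replica(total_nodes: int, executor_nodes: int,
                       replication: int = 3) -> float:
    """Probability that at least one replica of a block lands on a node
    that also hosts an executor, assuming replicas go to distinct nodes
    chosen uniformly at random (a simplification of HDFS's actual
    rack-aware placement policy)."""
    # P(no replica on any executor node) = C(M - k, r) / C(M, r)
    p_none = (comb(total_nodes - executor_nodes, replication)
              / comb(total_nodes, replication))
    return 1 - p_none

# e.g. 20 datanodes, executors on 10 of them, default replication 3:
print(round(prob_local_replica(20, 10), 3))  # → 0.895
```

So under this toy model, covering half the datanodes still gives roughly a 90% chance of node-local (or better) access for any given block, and the `spark.locality.wait` mechanism handles the remainder by falling back to rack-local or `ANY` placement.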
