
Dataproc master node configuration

I am wondering how powerful the master node (machine type) should be for Spark. I have seen people discussing worker nodes and executor cores/instances, but I couldn't find any advice on the master node. I am running my applications in cluster mode. Any advice?

It actually depends on the cluster size. The namenode keeps the directory tree of all files in the file system and tracks where the file data is kept across the cluster.

So if you have a large cluster, you need a master with more memory.

For example, if you have around 500 i3.8xlarge machines in a cluster, you could use an i3.8xlarge box as the master. However, if you have 1000+ such boxes, you really need an R4 memory-optimized master node.

If you have a relatively small cluster, the master node type doesn't matter much. If you run a Spark job in cluster mode, the Spark driver starts on one of the core (worker) nodes rather than on the master node, so as far as Spark is concerned the master node doesn't really matter. However, for managing a large cluster the master node needs to be bigger.
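
For reference, on Dataproc the master machine type is just a cluster-creation setting and can be chosen independently of the workers. Below is a minimal sketch using the google-cloud-dataproc Python client; the project ID, region, cluster name, and machine types are placeholders, and the modest n1-standard-4 master reflects the point above that a small master is usually enough unless the cluster is very large.

    from google.cloud import dataproc_v1 as dataproc

    # Placeholder values -- replace with your own project, region and names.
    project_id = "my-project"
    region = "us-central1"
    cluster_name = "example-cluster"

    # The client must point at the regional Dataproc endpoint.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # A modest master is usually enough for a small or medium cluster;
            # move to a highmem type only when the cluster grows large.
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            # Worker sizing is where the Spark capacity comes from, and in
            # cluster mode the driver also runs on one of these nodes.
            "worker_config": {"num_instances": 10, "machine_type_uri": "n1-highmem-8"},
        },
    }

    # create_cluster returns a long-running operation; wait for it to finish.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()
    print(f"Cluster created: {result.cluster_name}")

The same choice can be made from the command line with gcloud dataproc clusters create using the --master-machine-type and --worker-machine-type flags.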
