
Dataproc master node configuration

I am wondering how powerful the master node (machine type) should be for Spark. I have seen people discussing worker nodes and executor cores/instances, but I couldn't find any advice on the master node. I am running my applications in cluster mode. Any advice?

It actually depends on the cluster size. The namenode keeps the directory tree of all files in the file system and tracks where the file data is kept across the cluster.

So if you have a large cluster, you need a master with more memory.

For example, if you have around 500 i3.8xlarge machines in a cluster, you could use an i3.8xlarge box as the master. However, if you have 1000+ such boxes, you really need an R4 memory-optimized master node.

If you have a relatively small cluster, the master node type doesn't matter much. If you run a Spark job in cluster mode, the Spark driver starts on one of the core (worker) nodes rather than on the master node, so as far as Spark is concerned the master node doesn't really matter. However, for managing a large cluster the master node needs to be bigger.
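
For reference, on Dataproc the master machine type is just a cluster-creation setting and can be chosen independently of the workers. Below is a minimal sketch using the google-cloud-dataproc Python client; the project ID, region, cluster name, and machine types are placeholders, and the modest n1-standard-4 master reflects the point above that a small master is usually enough unless the cluster is very large.

    from google.cloud import dataproc_v1 as dataproc

    # Placeholder values -- replace with your own project, region and names.
    project_id = "my-project"
    region = "us-central1"
    cluster_name = "example-cluster"

    # The client must point at the regional Dataproc endpoint.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # A modest master is usually enough for a small or medium cluster;
            # move to a highmem type only when the cluster grows large.
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            # Worker sizing is where the Spark capacity comes from, and in
            # cluster mode the driver also runs on one of these nodes.
            "worker_config": {"num_instances": 10, "machine_type_uri": "n1-highmem-8"},
        },
    }

    # create_cluster returns a long-running operation; wait for it to finish.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()
    print(f"Cluster created: {result.cluster_name}")

The same choice can be made from the command line with gcloud dataproc clusters create using the --master-machine-type and --worker-machine-type flags.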
