
YARN as resource manager in Spark for a Linux cluster - inside Kubernetes and outside Kubernetes

If I am using a Kubernetes cluster to run Spark, then I am using the Kubernetes resource manager in Spark.

If I am using a Hadoop cluster to run Spark, then I am using the YARN resource manager in Spark.

But my question is: if I am spawning multiple Linux nodes in Kubernetes, and using one of the nodes as the Spark master and the three others as workers, what resource manager should I use? Can I use YARN here?

Second question: in the case of any 4-node Linux Spark cluster (not in Kubernetes and not Hadoop, just simple connected Linux machines), even if I do not have HDFS, can I use YARN here as the resource manager? If not, then what resource manager should be used for Spark?

Thanks.

if I am spawning multiple Linux nodes in Kubernetes,

Then you'd obviously use Kubernetes, since it's already available; running YARN inside the pods would just mean layering a second resource manager on top of the one you already have. A sketch of pointing Spark at the Kubernetes API server follows.
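For example, a minimal PySpark sketch in client mode; every value below (API server address, image name, executor count) is a placeholder assumption, not something from the question:

    from pyspark.sql import SparkSession

    # Client mode: the driver runs here and asks the Kubernetes API server to
    # launch executor pods, so the driver must be network-reachable from them
    # (easiest when this script itself runs in a pod in the same cluster).
    spark = (
        SparkSession.builder
        .master("k8s://https://kube-apiserver.example.com:6443")  # placeholder API server
        .appName("spark-on-k8s")
        .config("spark.kubernetes.container.image", "apache/spark:3.5.0")  # placeholder image
        .config("spark.executor.instances", "3")  # one executor per worker node
        .getOrCreate()
    )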

in the case of any 4-node Linux Spark cluster (not in Kubernetes and not Hadoop, just simple connected Linux machines), even if I do not have HDFS, can I use YARN here

You can, or you can use the Spark Standalone scheduler instead. However, Spark requires a shared filesystem for reading and writing data, so while you could use NFS, or S3/GCS, for this, HDFS is faster. A minimal Standalone setup looks like the sketch below.
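As a rough sketch, assuming the master and workers have already been started with Spark's bundled sbin/start-master.sh and sbin/start-worker.sh scripts (Spark 3.x names), and where the hostname and data path are placeholders of my own, not details from the question:

    from pyspark.sql import SparkSession

    # 7077 is the Standalone master's default port; the hostname is hypothetical.
    spark = (
        SparkSession.builder
        .master("spark://master-node.example.com:7077")
        .appName("spark-standalone")
        .getOrCreate()
    )

    # With no HDFS, paths must resolve identically on every node, e.g. an NFS
    # mount shared by all four machines, or s3a:// / gs:// URIs.
    df = spark.read.csv("/mnt/shared/data.csv", header=True)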
