I understand the major differences between client and cluster mode for Spark applications on YARN.
Major differences include
In both cases for similarities
My question is: in real-world scenarios (production environments), where we do not need interactive mode and the client does not need to keep running for a long time, is cluster mode the obvious choice?
Are there any benefits for client mode like:
From the documentation,
A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (eg Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the client spark-submit process, with the input and output of the application attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (eg Spark shell).
Alternatively, if your application is submitted from a machine far from the worker machines (eg locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications.
It looks like the main reason to prefer cluster mode is to reduce latency between the driver and the executors when spark-submit is run from a remote machine.
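For reference, the two modes differ only in the `--deploy-mode` flag passed to spark-submit. Here is a minimal sketch, assuming a YARN cluster; the class name `com.example.MyApp`, the jar `my-app.jar`, and the helper `build_submit_cmd` are hypothetical placeholders, not anything from my actual setup. The helper only prints the command, so it can be inspected without a cluster:

```shell
# Hypothetical helper: prints (does not run) the spark-submit command
# for the given deploy mode. Class and jar names are placeholders.
build_submit_cmd() {
  mode="$1"   # "client" or "cluster"
  echo "spark-submit --master yarn --deploy-mode ${mode} --class com.example.MyApp my-app.jar"
}

# Client mode: the driver runs inside the spark-submit process on the submitting machine.
build_submit_cmd client

# Cluster mode: the driver runs inside the cluster, so the submitting machine can disconnect.
build_submit_cmd cluster
```

To actually submit, the printed command would be run on a machine that has the Spark client and the YARN configuration available.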
From my experience, in a production environment the only reasonable mode is cluster mode, with two exceptions:
ssh to a server that is not accessible from the Hadoop nodes
ssc.stop(stopGracefully = true)