
For Spark applications running on YARN, which deploy mode is better: client or cluster?

Original 2016-11-02 21:29:59 6 2 hadoop/ apache-spark/ yarn

I understand the major differences between client and cluster mode for Spark applications on YARN.

The major differences include:

Where the driver runs - locally in client mode, in the Application Master (AM) in cluster mode
Client running duration - in client mode, the client must run for the entire duration of the application; in cluster mode, the client need not keep running because the AM takes care of it
Interactive usage (spark-shell and pyspark) - cluster mode is not well suited, as these require the driver to run on the client
Scheduling work - in client mode, the client schedules the work by communicating directly with the containers; in cluster mode, the AM schedules the work by communicating directly with the containers

The two modes are similar in these respects:

Who handles the executor requests from YARN - the Application Master
Who starts the executor processes - the YARN NodeManager
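
To make the comparison concrete: the two modes differ only in one spark-submit flag. Below is a minimal Python sketch that builds the command line for either mode. The `--master yarn`, `--deploy-mode` and `--class` flags are standard spark-submit options; the helper name `spark_submit_args`, the jar name `my-app.jar` and the class `com.example.MyJob` are made up for illustration.

```python
from typing import List, Sequence

VALID_MODES = ("client", "cluster")


def spark_submit_args(
    deploy_mode: str,
    app_jar: str,
    main_class: str,
    app_args: Sequence[str] = (),
) -> List[str]:
    """Build the argv for a spark-submit call against YARN.

    Only the value passed to --deploy-mode changes between the two modes:
    "client" keeps the driver in the submitting process, "cluster" runs it
    inside the YARN Application Master.
    """
    if deploy_mode not in VALID_MODES:
        raise ValueError(f"deploy_mode must be one of {VALID_MODES}, got {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
        *app_args,
    ]


if __name__ == "__main__":
    # Hypothetical jar and class names, printed only; nothing is executed.
    for mode in VALID_MODES:
        print(" ".join(spark_submit_args(mode, "my-app.jar", "com.example.MyJob")))
```

The sketch only assembles the argument list; passing it to something like `subprocess.run` would be the caller's choice.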

My question is: in real-world scenarios (production environments), where we do not need interactive mode and the client does not need to run for a long duration, is cluster mode the obvious choice?

Are there any benefits to client mode, such as:

running the driver on the client machine rather than in the AM
allowing the client to schedule work, rather than the AM

2 answers

From the documentation:

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the client spark-submit process, with the input and output of the application attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications.

It looks like the main reason is latency: when spark-submit is run from a remote machine, cluster mode is preferred because it reduces the latency between the driver and the executors.
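
One way to read the quoted guidance is as a small decision rule. The sketch below is only an interpretation of that passage, not an official Spark rule; the function name `choose_deploy_mode` and its two parameters are made up for illustration.

```python
def choose_deploy_mode(interactive: bool, submitter_near_workers: bool) -> str:
    """Pick a deploy mode following the heuristic in the quoted documentation.

    interactive: the application needs a REPL or console attached
        (e.g. the Spark shell), so the driver must run in the client process.
    submitter_near_workers: spark-submit runs on a gateway machine that is
        co-located with the worker machines.
    """
    if interactive:
        # The REPL requires the driver in the submitting process.
        return "client"
    if submitter_near_workers:
        # Driver-to-executor latency is already low from a gateway machine.
        return "client"
    # Submitting from far away (e.g. a laptop): move the driver into the cluster.
    return "cluster"


if __name__ == "__main__":
    print(choose_deploy_mode(interactive=False, submitter_near_workers=False))
```

This rule covers only the latency and interactivity arguments from the documentation; other operational concerns are not modelled.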

From my experience, in a production environment the only reasonable mode is cluster mode, with two exceptions:

when the Hadoop nodes do not have resources needed by the application; for example, at the end of execution the Spark job performs an ssh to a server that is not accessible from the Hadoop nodes
when you use Spark Streaming and want to shut it down gracefully: killing a cluster-mode application forces the streaming job to close, whereas in client mode you can call ssc.stop(stopGracefully = true)
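
The second exception depends on the driver being a local process that can react to a shutdown request. A minimal Python sketch of that pattern follows, assuming the driver is stopped with SIGINT or SIGTERM. The helper names `make_graceful_handler` and `install_handler` are made up for illustration, and `stop_fn` stands in for whatever wraps the graceful-stop call mentioned above.

```python
import signal
from typing import Callable, Iterable


def make_graceful_handler(stop_fn: Callable[[], None]):
    """Return a signal handler that calls stop_fn once, on the first signal only."""
    state = {"called": False}

    def handler(signum, frame):
        if state["called"]:
            return  # Ignore repeated signals while the stop is in progress.
        state["called"] = True
        stop_fn()

    return handler


def install_handler(
    handler,
    signals: Iterable[int] = (signal.SIGINT, signal.SIGTERM),
) -> None:
    """Register the handler for the given signals (main thread only)."""
    for sig in signals:
        signal.signal(sig, handler)


if __name__ == "__main__":
    # Stand-in for a real graceful stop of the streaming context.
    install_handler(make_graceful_handler(lambda: print("stopping gracefully")))
```

In a client-mode driver, `stop_fn` would wrap the graceful-stop call so that in-flight batches finish before the process exits. A long-running stop inside a signal handler is a simplification; a real driver might set a flag and stop from its main loop.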


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
