Similar to Getting app run id for a Spark job , except from the command line or a script.
I am running spark-submit automatically from our continuous deployment system, and I need to track the application ID so that I can kill the application before running the job again (among other needs).
Specifically, this is a Python script that submits the job to a YARN cluster and can read spark-submit's standard output; I need to save the application ID for later use.
The best plan I have so far is to run spark-submit, watch its standard output, extract the application ID, and then detach from the process. This method is not ideal in my opinion.
Ideally, spark-submit would print only the application ID and then fork, but so far I don't see any way to do that short of modifying Spark itself.
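For what it's worth, the "watch the output and extract the ID" approach can be sketched in a few lines of Python. This is only a sketch under assumptions: YARN reports the ID in spark-submit's output in the form `application_<timestamp>_<sequence>` (depending on your log4j configuration it may arrive on stderr rather than stdout), and the regex and helper name below are illustrative, not from any library.

```python
import io
import re
import subprocess

# YARN application IDs look like application_1448925599375_0050
APP_ID_RE = re.compile(r"application_\d+_\d+")

def extract_app_id(lines):
    """Return the first YARN application ID found in an iterable of lines."""
    for line in lines:
        match = APP_ID_RE.search(line)
        if match:
            return match.group(0)
    return None

# In the real CI script the iterable would be the live output of
# subprocess.Popen(["spark-submit", ...], stdout=subprocess.PIPE,
#                  stderr=subprocess.STDOUT, text=True).stdout;
# here a sample YARN client log line stands in for it.
sample = io.StringIO(
    "INFO yarn.Client: Submitting application "
    "application_1448925599375_0050 to ResourceManager\n"
)
print(extract_app_id(sample))  # -> application_1448925599375_0050
```

Once `extract_app_id` returns, the script can record the ID and stop reading the child's output, which is the "detach" step described above.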
Is there a nicer, more obvious way of doing this?
I've created a wrapper script that extracts the application ID for you. It's hosted at: https://github.com/gak/spark-submit-app-id-wrapper
Example:
# pip install spark-submit-app-id-wrapper
# ssaiw spark-submit --master yarn-cluster --class etc etc > /dev/null
application_1448925599375_0050
Now the CI script can run spark-submit via ssaiw and grab the application ID as soon as it is available.
Note that it has only been tested with YARN.
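To close the loop on the "kill it before running the job again" requirement: once the CI script has captured the ID, the standard YARN CLI invocation is `yarn application -kill <id>`. A minimal sketch, assuming the `yarn` CLI is on PATH and the ID was saved by the previous run (the helper name is mine, not from ssaiw):

```python
import subprocess

def kill_command(app_id):
    # Builds the standard YARN CLI invocation to kill a running application.
    return ["yarn", "application", "-kill", app_id]

# In the CI script you would actually execute it, e.g.:
# subprocess.run(kill_command(saved_app_id), check=True)
print(kill_command("application_1448925599375_0050"))
```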