How to get the application ID / job ID of a job submitted to a Spark cluster using the spark-submit command?
I am submitting an Apache Spark job using the spark-submit command. I want to retrieve the application ID or job ID of the job submitted with spark-submit. What is the recommended way to do this?
The output of the spark-submit command can be parsed to get the application ID. This is the line you should be looking at:
2018-09-08 12:01:22 INFO StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20180908120122-0001
appId=`./bin/spark-submit <options> 2>&1 | tee /dev/tty | grep -i "Connected to Spark Cluster" | grep -o app-.*[0-9]`
echo $appId
app-20180908120122-0001
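To see how that pipeline behaves end to end, here is a self-contained simulation in which a shell function stands in for spark-submit (the function name `fake_spark_submit` is of course hypothetical; the log line is the one shown above, and `tee /dev/tty` is dropped since it only matters for interactive use):

```shell
# Hypothetical stand-in for spark-submit that just emits the relevant log line.
fake_spark_submit() {
  echo "2018-09-08 12:01:22 INFO StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20180908120122-0001"
}

# Same extraction idea as the answer's one-liner: case-insensitively find the
# "Connected to Spark cluster" line, then cut out the app ID token.
appId=$(fake_spark_submit 2>&1 | grep -i "Connected to Spark cluster" | grep -o "app-[0-9-]*")
echo "$appId"
```

Running this prints `app-20180908120122-0001`, confirming the grep pattern isolates just the ID.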
Your use case is not clear, but if you are looking for the application ID after the job has completed, then this could be helpful. Note that this log line may be different on YARN and other cluster managers.
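For instance, in YARN client mode the ID typically appears in a "Submitted application" line instead, and YARN IDs have the form `application_<cluster-timestamp>_<sequence>`. A hedged sketch of adapting the extraction (the exact log wording may vary by Spark version; the sample line below is illustrative):

```shell
# Illustrative YARN-mode log line; exact wording can differ across Spark versions.
log_line="2018-09-08 12:01:22 INFO Client:54 - Submitted application application_1536400882884_0001"

# Extract the YARN application ID (format: application_<cluster-timestamp>_<sequence>).
app_id=$(echo "$log_line" | grep -o "application_[0-9]*_[0-9]*")
echo "$app_id"
```

The same pipeline as before works once the grep pattern matches the YARN ID format.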
Since it's not clear whether you want it programmatically inside the application, I'll assume you do. You can get the YARN application ID (or the job ID in local mode) as follows:
import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = ??? // your existing session
val appID: String = sparkSession.sparkContext.applicationId
Hope this answers your question.
You can also look up a running streaming query by its UUID or query name, like this:

sparkSession.streams.get(uuid) // look up an active query by its UUID
sparkSession.streams.active.find(_.name == queryName) // or by its name

(Note that `streams.get` matches on the query's `id`; each run also has a separate `runId`.)