简体   繁体   中英

Spark 2.3.1 on YARN : how to monitor stages progress programatically?

I have a setup with Spark running on YARN, and my goal is to programmatically get updates of the progress of a Spark job by its application id.

My first idea was to parse HTML output of the YARN GUI. However, the problem of such GUI, is that the progress bar associated to a spark job don't get updated regularly and even don't change most of time : when the job start, the percent is something like 10%, and it stuck to this value until the job finish. So such YARN progress bar is just irrelevant for Spark Jobs.

When I click to the Application Master link corresponding to a Spark Job, I'm redirected to the Spark GUI that is temporarily binded during the job run. The stages page is very relevant about progress of the Spark job. However it is plain HTML, so it is a pain to parse. On the Spark documentation, they talk about a JSON API, however it seems that I can't access to it as I'm under YARN and I'm accessing Spark GUI trough YARN proxy pages.

May be a solution, in order to have access to more things, could be to access to the real Spark GUI ip:port, and not the YARN proxied one, but I don't know if I can get such source URL easily...

All of that sound complicated to just get Spark job progress... As of 2018, could you please tell us what are the preferred methods to get relevant stages progress of a Spark Job running on YARN ?

从应用程序内部,您可以使用spark.sparkContext.statusTracker获取有关舞台进度的信息,您可以查看Zeppelin Notebook如何为Spark 2.3实现进度条: https : //github.com/apache/zeppelin/blob/主/火花/火花-scala-parent / src / main / scala / org / apache / zeppelin / spark / JobProgressUtil.scala

No way of knowing the progress in percentage, as you can have any amount of Spark stages. However, there is a REST API for Spark History Server - Monitoring and Instrumentation with which you can ask for stages/tasks/jobs info. Assuming your app has predefined amount of Stages - you can calculate the progress.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM