
In Spark Java how can I programmatically determine the number of *active* cores and tasks?

The Spark web UI displays great information about the total and active numbers of cores and tasks. How can I get this information programmatically in Java Spark so that I can display job progress to end users?

I did read about the "append /json/" trick to extract JSON versions of web UI pages from the master, and I can get the total number of cores that way...
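For reference, here is a minimal sketch of that trick, assuming a standalone master whose web UI is reachable at http://localhost:8080 (the host, the port, and the exact field names in the returned JSON, such as "cores" and "coresused", are assumptions, not guarantees):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MasterJsonProbe {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    // Appending "/json/" to the master UI URL returns the page as JSON.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/json/"))  // assumed master address
        .build();
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    // The body describes the cluster; on standalone masters, total core counts
    // appear in fields such as "cores" and "coresused" (assumed names).
    System.out.println(response.body());
  }
}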

But all the information about active cores and tasks seems to be in the driver UI pages. I tried the "/json/" trick on the driver UI pages and it just redirects me back to the HTML pages.

Looks like we have discovered two different ways to reveal this information:

1) Retrieve the SparkStatusTracker from the SparkContext (thank you Sai):

import org.apache.spark.SparkStageInfo;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaSparkStatusTracker;

JavaSparkContext javaSparkContext = ...;  // your existing context
JavaSparkStatusTracker javaSparkStatusTracker = javaSparkContext.statusTracker();
for (int stageId : javaSparkStatusTracker.getActiveStageIds()) {
  SparkStageInfo sparkStageInfo = javaSparkStatusTracker.getStageInfo(stageId);
  if (sparkStageInfo == null) continue;  // stage info may already be garbage-collected
  // Per-stage task counts, matching what the web UI shows for active stages.
  int numTasks = sparkStageInfo.numTasks();
  int numActiveTasks = sparkStageInfo.numActiveTasks();
  int numFailedTasks = sparkStageInfo.numFailedTasks();
  int numCompletedTasks = sparkStageInfo.numCompletedTasks();
  ...
}
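To turn those per-stage counts into a user-facing progress figure, one possible aggregation (an illustration, not part of the original answer) is to sum completed and total tasks across all active stages:

// Illustrative sketch: collapse all active stages into a single percentage.
int totalTasks = 0;
int completedTasks = 0;
for (int stageId : javaSparkStatusTracker.getActiveStageIds()) {
  SparkStageInfo info = javaSparkStatusTracker.getStageInfo(stageId);
  if (info == null) continue;  // stage info may already be garbage-collected
  totalTasks += info.numTasks();
  completedTasks += info.numCompletedTasks();
}
double progressPercent = totalTasks == 0 ? 0.0 : 100.0 * completedTasks / totalTasks;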

2) Consult the REST API available from the driver JVM:

https://spark.apache.org/docs/latest/monitoring.html#rest-api
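A minimal sketch of calling that REST API from Java, assuming the driver UI is on its default port 4040; the /applications and /applications/[app-id]/stages?status=active endpoints are documented at the link above, but the host, port, and placeholder app id here are assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DriverRestProbe {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    String base = "http://localhost:4040/api/v1";  // assumed driver address
    // List running applications to discover the application id.
    System.out.println(get(client, base + "/applications"));
    // Then fetch only the active stages; each entry reports fields such as
    // numTasks, numActiveTasks, and numCompleteTasks.
    // "app-id" below is a placeholder for an id taken from the response above.
    System.out.println(get(client, base + "/applications/app-id/stages?status=active"));
  }

  private static String get(HttpClient client, String url) throws Exception {
    HttpRequest request = HttpRequest.newBuilder().uri(URI.create(url)).build();
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
  }
}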
