How to get number of executors and number of cores in Java spark

I am new to Spark, and we are currently coding in Java. The problem is that we are trying to figure out the number of executors and the number of cores. I googled around and saw some articles mentioning that the way to do this in Spark is as follows. But I don't see anything similar in Java (JavaSparkContext has neither getExecutorMemoryStatus nor getExecutorStorageStatus). Can anyone help?

 // for executor numbers
 def currentActiveExecutors(sc: SparkContext): Seq[String] = {
         val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
         val driverHost: String = sc.getConf.get("spark.driver.host")
         allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
       }

  // for executor core numbers
  sc.getConf.get("spark.executor.cores").toInt
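For reference, Spark 2.0 and later also expose a status tracker that can be queried directly from Java without touching Spark internals. A minimal sketch (the master and app name below are placeholders; note that getExecutorInfos counts the driver as well, and spark.executor.cores is only set if it was specified at submit time):

 import org.apache.spark.SparkExecutorInfo;
 import org.apache.spark.api.java.JavaSparkContext;

 public class ExecutorStats {
     public static void main(String[] args) {
         JavaSparkContext jsc = new JavaSparkContext("local[*]", "executor-stats");
         // one entry per executor, plus one for the driver
         SparkExecutorInfo[] infos = jsc.sc().statusTracker().getExecutorInfos();
         System.out.println("executors (driver included): " + infos.length);
         // the config key is only present if it was set on submission
         System.out.println("cores per executor: "
                 + jsc.getConf().get("spark.executor.cores", "not set"));
         jsc.stop();
     }
 }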

My answer is based mostly on this SO Answer. Recently, getExecutorStorageStatus was removed from SparkContext (in newer versions of Spark), so you can no longer call sc.getExecutorStorageStatus. Instead we can use SparkEnv's blockManager.master.getStorageStatus.length - 1 (the minus one is again for the driver). The normal way to get at SparkEnv is through SparkContext's env, but that is not accessible outside the org.apache.spark package. Therefore, we use an encapsulation violation pattern:

package org.apache.spark.util

import org.apache.spark.{SparkContext, SparkEnv}

/**
  * Below objects are not accessible outside of the org.apache.spark.util package.
  * Therefore, we use an encapsulation violation pattern.
  */
object SparkInternalUtils {

  def sparkEnv(sc: SparkContext): SparkEnv = sc.env
  def getThreadUtils: ThreadUtils.type = ThreadUtils

}

Now we can obtain the SparkEnv instance via SparkInternalUtils.sparkEnv(sc).
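The helper is callable from Java too. A minimal sketch (assuming the Scala object above has been compiled onto your classpath, and jsc is an existing JavaSparkContext):

 import org.apache.spark.SparkEnv;
 import org.apache.spark.util.SparkInternalUtils;

 SparkEnv env = SparkInternalUtils.sparkEnv(jsc.sc());
 // one storage status per block manager; subtract the driver's entry
 int executorCount = env.blockManager().master().getStorageStatus().length - 1;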

Define RichSparkContext as follows:


import org.apache.spark.SparkContext
import org.apache.spark.util.SparkInternalUtils

import scala.language.implicitConversions


class RichSparkContext(val sc: SparkContext) {

  def executorCount: Int =
    SparkInternalUtils.sparkEnv(sc).blockManager.master.getStorageStatus.length - 1 // one is the driver

  def coresPerExecutor: Int =
    RichSparkContext.coresPerExecutor(sc)

  def coreCount: Int =
    executorCount * coresPerExecutor

  def coreCount(coresPerExecutor: Int): Int =
    executorCount * coresPerExecutor

}


object RichSparkContext {

  trait Enrichment {
    implicit def enrichMetadata(sc: SparkContext): RichSparkContext =
      new RichSparkContext(sc)
  }

  object implicits extends Enrichment

  private var _coresPerExecutor: Int = 0

  def coresPerExecutor(sc: SparkContext): Int =
    synchronized {
      if (_coresPerExecutor == 0)
        // run a one-partition job on an executor and read its processor count,
        // caching the result so the job only runs once
        _coresPerExecutor = sc.range(0, 1).map(_ => java.lang.Runtime.getRuntime.availableProcessors).collect.head
      _coresPerExecutor
    }

}

In Scala, get the number of executors and the core count:

val sc = ... // SparkContext instance
import RichSparkContext.implicits._

val executorCount = sc.executorCount
val coresPerExecutor = sc.coresPerExecutor
val totalCoreCount = sc.coreCount

In Java, get the number of executors and the core count:

JavaSparkContext javaSparkContext = new JavaSparkContext(spark.sparkContext());
RichSparkContext richSparkContext = new RichSparkContext(javaSparkContext.sc());
System.out.println(richSparkContext.coresPerExecutor());
System.out.println(richSparkContext.coreCount());
System.out.println(richSparkContext.executorCount());
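Note that the Scala helpers above must be compiled into your build before they can be called from Java like this, and SparkInternalUtils has to keep the org.apache.spark.util package name: the whole point of the pattern is to sit inside Spark's package so it can reach sc.env.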
