

How can I assign a unique integer key to every Apache Spark Executor within an Apache Spark Java Application?

I need to assign a unique integer ID to each Spark executor in a Spark application, and I need to be able to retrieve that ID from within a task running on the executor. The executor ID will be combined with other data elements (timestamp, MAC address, etc.) to generate unique 64-bit keys. How can I assign such a unique integer key to every executor in an Apache Spark Java application?
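For illustration only: one common way to pack fields like these into a 64-bit key is a Snowflake-style layout. The sketch below assumes 41 bits of milliseconds since a custom epoch, a 10-bit worker (executor) ID and a 12-bit per-worker sequence number; these widths, and the names SnowflakeKey, pack and unpack, are hypothetical and not part of the question. It also shows why a small, dense integer ID matters here: a 48-bit MAC address would not fit next to a useful timestamp.

```scala
// Hypothetical Snowflake-style layout (an assumption, not taken from the question):
//   bits 22..62: milliseconds since a custom epoch (41 bits)
//   bits 12..21: worker / executor id, 0..1023    (10 bits)
//   bits  0..11: per-worker sequence, 0..4095      (12 bits)
// Bit 63 (the sign bit) stays 0, so every key is non-negative.
object SnowflakeKey {
  val TimestampBits = 41
  val WorkerBits = 10
  val SequenceBits = 12

  val MaxTimestamp: Long = (1L << TimestampBits) - 1
  val MaxWorker: Long = (1L << WorkerBits) - 1
  val MaxSequence: Long = (1L << SequenceBits) - 1

  // Shift each field into its slot and OR them together.
  def pack(millisSinceEpoch: Long, workerId: Int, sequence: Int): Long = {
    require(millisSinceEpoch >= 0 && millisSinceEpoch <= MaxTimestamp, "timestamp out of range")
    require(workerId >= 0 && workerId <= MaxWorker, "workerId out of range")
    require(sequence >= 0 && sequence <= MaxSequence, "sequence out of range")
    (millisSinceEpoch << (WorkerBits + SequenceBits)) |
      (workerId.toLong << SequenceBits) |
      sequence.toLong
  }

  // Reverse of pack: shift each field back down and mask off its width.
  def unpack(key: Long): (Long, Int, Int) = (
    key >>> (WorkerBits + SequenceBits),
    ((key >>> SequenceBits) & MaxWorker).toInt,
    (key & MaxSequence).toInt
  )
}

val key = SnowflakeKey.pack(1000L, 7, 42)
println(key)                      // 4194332714
println(SnowflakeKey.unpack(key)) // (1000,7,42)
```

The require checks make an out-of-range worker ID or sequence number fail loudly instead of silently producing a key that overlaps another worker's range.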

The partition index might be useful instead: all elements of a single partition are always processed by one task, which runs on one executor, and the index is unique within the RDD (it runs from 0 to numPartitions - 1).

mapPartitionsWithIndex can help, because it passes the partition index to your function:

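```scala
import org.apache.spark.sql.SparkSession

// Local SparkSession using all available cores
val spark = SparkSession.builder().master("local[*]").appName("partitionIndex").getOrCreate()
import spark.implicits._ // provides the Encoder[Int] needed by createDataset

// The numbers 1..20, spread over 4 partitions
val ds = spark.createDataset(Seq.range(1, 21)).repartition(4)
ds.rdd
  .mapPartitionsWithIndex((partitionIndex, it) => {
    // Runs once per partition, inside the task that processes that partition
    println("processing partition " + partitionIndex)
    it.map(i => "partition " + partitionIndex + " contains number " + i)
  })
  .foreach(println)
```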

prints something like the following (the order of the lines varies between runs):

```
processing partition 1
processing partition 0
processing partition 2
processing partition 3
partition 1 contains number 3
partition 2 contains number 4
partition 2 contains number 9
partition 2 contains number 14
partition 2 contains number 19
partition 0 contains number 2
...
partition 3 contains number 1
partition 3 contains number 5
...
```

If you can give every row within a partition a unique ID (for example, its position within the partition), then the combination of that ID and the partition index is unique across the whole dataset.
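A minimal sketch of that combination, written in plain Scala so it runs without a cluster. It assumes the bit layout that Spark documents for its own monotonically_increasing_id function (partition index in the upper 31 bits, position within the partition in the lower 33 bits); PartitionKeys, makeKey, assignKeys and the sample data are hypothetical names for illustration. assignKeys mimics mapPartitionsWithIndex by treating each inner Seq as one partition.

```scala
// Hypothetical helper: packs (partition index, row position) into one Long.
// Layout assumed from Spark's documented monotonically_increasing_id:
// partition index in the upper 31 bits, row position in the lower 33 bits.
object PartitionKeys {
  val RowBits = 33
  val RowMask: Long = (1L << RowBits) - 1

  def makeKey(partitionIndex: Int, rowInPartition: Long): Long = {
    require(partitionIndex >= 0, "partition index must be non-negative")
    require(rowInPartition >= 0 && rowInPartition <= RowMask, "row position must fit in 33 bits")
    (partitionIndex.toLong << RowBits) | rowInPartition
  }

  def partitionOf(key: Long): Int = (key >>> RowBits).toInt
  def rowOf(key: Long): Long = key & RowMask

  // Plain-Scala stand-in for rdd.mapPartitionsWithIndex: each inner Seq plays
  // the role of one partition, and its position in the outer Seq is its index.
  def assignKeys[T](partitions: Seq[Seq[T]]): Seq[(Long, T)] =
    partitions.zipWithIndex.flatMap { case (rows, partitionIndex) =>
      rows.zipWithIndex.map { case (row, i) => (makeKey(partitionIndex, i.toLong), row) }
    }
}

// With Spark, the same idea would look roughly like this (not run here):
//   rdd.mapPartitionsWithIndex { (partitionIndex, it) =>
//     it.zipWithIndex.map { case (row, i) => (PartitionKeys.makeKey(partitionIndex, i.toLong), row) }
//   }

val partitions = Seq(Seq("a", "b"), Seq("c"), Seq("d", "e", "f"))
val keyed = PartitionKeys.assignKeys(partitions)
keyed.foreach { case (key, row) =>
  println(s"$row -> key $key (partition ${PartitionKeys.partitionOf(key)}, row ${PartitionKeys.rowOf(key)})")
}
```

In a real job the commented-out mapPartitionsWithIndex call would replace assignKeys. If a task is retried, it regenerates the same keys only when the partition's contents and order are deterministic, so that is worth checking before relying on the keys being stable.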

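Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.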
