
How can I assign a unique integer key to every Apache Spark Executor within an Apache Spark Java Application?

I need to assign a unique integer id to each Spark executor in a Spark application, and I need to retrieve that id from within a task running on the executor. The executor id will be combined with other data elements (timestamp, MAC address, etc.) to generate unique 64-bit keys.

The id of the partition might be useful, as all elements of a single partition will always be on one executor.

mapPartitionsWithIndex can help:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("partitionIndex").getOrCreate()
import spark.implicits._

// 20 numbers spread across 4 partitions
val ds = spark.createDataset(Seq.range(1, 21)).repartition(4)
ds.rdd
  .mapPartitionsWithIndex((partitionIndex, it) => {
    println(s"processing partition $partitionIndex")
    it.map(i => s"partition $partitionIndex contains number $i")
  })
  .foreach(println)

prints:

processing partition 1
processing partition 0
processing partition 2
processing partition 3
partition 1 contains number 3
partition 2 contains number 4
partition 2 contains number 9
partition 2 contains number 14
partition 2 contains number 19
partition 0 contains number 2
...
partition 3 contains number 1
partition 3 contains number 5
...

If you can give each row an id that is unique within its partition, then the combination of that id and the partition index is unique across the whole application, as sketched below.
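
As a minimal sketch (building on the ds Dataset from the snippet above, and not part of the original answer), you could derive the per-partition id from zipWithIndex and pack it together with the partition index into one Long; the bit split used here (high bits for the partition index, 44 low bits for the counter) is an arbitrary assumption and should be sized to your data:

// Sketch only: pack the partition index (high bits) and a per-partition counter
// (low bits) into a single Long key, reusing the `ds` Dataset defined above.
val keyed = ds.rdd.mapPartitionsWithIndex((partitionIndex, it) =>
  // zipWithIndex numbers the rows of this partition 0, 1, 2, ...
  it.zipWithIndex.map { case (value, localIndex) =>
    // assumption: fewer than 2^20 partitions and fewer than 2^44 rows per partition
    val key = (partitionIndex.toLong << 44) | localIndex.toLong
    (key, value)
  }
)
keyed.foreach { case (key, value) => println(s"key $key -> value $value") }

Any scheme that keeps the two parts in disjoint bit ranges works; the point is that the partition index is unique across the application and the counter is unique within its partition.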
