
Apache Spark - Unable to understand Scala example

I am trying to understand the Scala code at this location. (I come from a Java background.)

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala

I am totally lost in the part below:

val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
  val ranGen = new Random
  var arr1 = new Array[(Int, Array[Byte])](numKVPairs)
  for (i <- 0 until numKVPairs) {
    val byteArr = new Array[Byte](valSize)
    ranGen.nextBytes(byteArr)
    arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
  }
  arr1
}.cache()

I know what parallelize and flatMap do. What I don't get is how arr1 is initialized. Is it of type Int, or something else, such as an array of bytes? Also, what is the for-loop logic doing?

var arr1 = new Array[(Int, Array[Byte])](numKVPairs)

simply creates an array of size numKVPairs whose elements are of type (Int, Array[Byte]) (a pair of an Int and a byte array).

Afterwards, arr1 is filled with random data.
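A minimal sketch of the same initialization in plain Scala (no SparkContext, with small illustrative values for numKVPairs and valSize, which are command-line arguments in the original example) may make the loop clearer:

```scala
import scala.util.Random

object PairArrayDemo {
  def main(args: Array[String]): Unit = {
    val numKVPairs = 3 // assumed small values for illustration
    val valSize    = 8

    val ranGen = new Random
    // An array with numKVPairs slots; each slot holds a (Int, Array[Byte]) pair.
    // After `new`, every slot is null until the loop assigns it.
    val arr1 = new Array[(Int, Array[Byte])](numKVPairs)
    for (i <- 0 until numKVPairs) {
      val byteArr = new Array[Byte](valSize) // valSize bytes, initially all zero
      ranGen.nextBytes(byteArr)              // overwrite them with random bytes
      // Pair a random non-negative Int key with the random byte array.
      arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
    }

    arr1.foreach { case (key, bytes) =>
      println(s"key=$key, value=${bytes.length} bytes")
    }
  }
}
```

So each iteration builds one key-value pair: a random Int key and a valSize-byte random value. In the Spark version, flatMap then flattens the arrays produced by all numMappers partitions into a single RDD of pairs.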

var arr1 = new Array[(Int, Array[Byte])](numKVPairs)

creates an array of pairs of type (Int, Array[Byte]); that is, the first element of each pair is of type Int and the second is of type Array[Byte].
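To see the pair types concretely, here is a small standalone sketch (not from the original example) that stores and reads back such pairs:

```scala
object PairTypeDemo {
  def main(args: Array[String]): Unit = {
    // Two slots, each holding a (Int, Array[Byte]) pair.
    val arr = new Array[(Int, Array[Byte])](2)
    arr(0) = (42, Array[Byte](1, 2, 3))
    arr(1) = (7, Array[Byte](9))

    val key: Int           = arr(0)._1 // first element of the pair is an Int
    val value: Array[Byte] = arr(0)._2 // second element is an Array[Byte]
    println(s"key=$key, value has ${value.length} bytes")
  }
}
```

Accessing `._1` and `._2` (or pattern matching with `case (k, v)`) is the standard way to take a Scala Tuple2 apart.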
