Apache Spark - Unable to understand Scala example
I am trying to understand the Scala code at this location (I come from a Java background):

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala

I am feeling totally lost in the part below:
val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
  val ranGen = new Random
  var arr1 = new Array[(Int, Array[Byte])](numKVPairs)
  for (i <- 0 until numKVPairs) {
    val byteArr = new Array[Byte](valSize)
    ranGen.nextBytes(byteArr)
    arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
  }
  arr1
}.cache()
I know what parallelize and flatMap do. What I am not getting is how arr1 is initialized. Is it of type Int, or something else, an array of bytes? Also, what is the logic inside the for loop doing?
var arr1 = new Array[(Int, Array[Byte])](numKVPairs)

simply creates an array of size numKVPairs whose element type is (Int, Array[Byte]), i.e. each slot holds a pair of an Int and an array of bytes. Afterwards, arr1 is filled with random data.
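If it helps to see the same loop outside Spark, here is a minimal standalone sketch of what each flatMap task does (the sizes numKVPairs = 4 and valSize = 8 are arbitrary values picked for illustration, not the ones the example uses):

```scala
import scala.util.Random

val numKVPairs = 4  // hypothetical: number of (key, value) pairs per task
val valSize    = 8  // hypothetical: size of each random value in bytes

val ranGen = new Random
// An array with numKVPairs slots; each slot starts out as null
// until the loop below assigns a tuple to it.
val arr1 = new Array[(Int, Array[Byte])](numKVPairs)
for (i <- 0 until numKVPairs) {
  val byteArr = new Array[Byte](valSize) // allocate valSize zero bytes
  ranGen.nextBytes(byteArr)              // overwrite them with random bytes
  // Each element is a pair: a random non-negative Int key and the random bytes.
  arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
}

println(arr1.length)        // prints 4 (numKVPairs)
println(arr1(0)._2.length)  // prints 8 (valSize)
```

So the loop does nothing Spark-specific: it just fills every slot of the pre-sized array with a freshly generated (Int, Array[Byte]) pair.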
var arr1 = new Array[(Int, Array[Byte])](numKVPairs)

creates an array of pairs of type (Int, Array[Byte]): the first element of each pair is of type Int and the second of type Array[Byte].
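To make the pair type concrete, here is a minimal sketch (the values 42 and the three bytes are made up for illustration) showing how the two components of such a tuple are accessed:

```scala
// One element of the array has type (Int, Array[Byte]): a two-field tuple.
val pair: (Int, Array[Byte]) = (42, Array[Byte](1, 2, 3))

val key: Int           = pair._1 // first component: the Int
val value: Array[Byte] = pair._2 // second component: the byte array

println(key)          // prints 42
println(value.length) // prints 3
```

Coming from Java, you can think of (Int, Array[Byte]) as a generic Pair<Integer, byte[]> class, except that Scala builds tuple types into the language and names the fields _1 and _2.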