Spark Scala: Generating a Random RDD of 1s and 0s?

How can I create an RDD filled with values from an array, say (0, 1), where 1000 randomly chosen positions are set to 1 and the remaining positions are 0?

I know I can filter to do this, but the result won't be random. I want it to be as random as possible.

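var populationMatrix = new IndexedRowMatrix(RandomRDDs.uniformVectorRDD(sc, populationSize, chromosomeLength)
  .zipWithIndex.map { case (v, i) => IndexedRow(i, v) }) // IndexedRowMatrix expects an RDD[IndexedRow]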

I was exploring the random RDDs in Spark but couldn't find anything that meets my needs.

Not really sure if this is what you are looking for, but with this code you can create an RDD containing a fixed number of 1s and 0s in random order:

import scala.util.Random

val arraySize = 15 // Total number of elements that you want
val numberOfOnes = 10 // From that total, how many do you want to be ones
val listOfOnes = List.fill(numberOfOnes)(1) // List of 1s
val listOfZeros = List.fill(arraySize - numberOfOnes)(0) // Rest list of 0s
val listOfOnesAndZeros = listOfOnes ::: listOfZeros // Merge lists
val randomList = Random.shuffle(listOfOnesAndZeros) // Random shuffle
val randomRDD = sc.parallelize(randomList) // RDD creation
randomRDD.collect() // Array[Int] = Array(1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1)
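If you need the same arrangement on every run (for tests, say), a seeded generator is one option. This is only a sketch in plain Scala: the helper name randomBits and the seed value are made up for illustration, and the final parallelize step is left as a comment because it assumes the SparkContext `sc` used above.

```scala
import scala.util.Random

// Hypothetical helper (not part of Spark): returns `size` values, exactly
// `ones` of them equal to 1, shuffled with a seeded generator.
def randomBits(size: Int, ones: Int, seed: Long): List[Int] = {
  require(ones >= 0 && ones <= size, "ones must be between 0 and size")
  val rng = new Random(seed)
  rng.shuffle(List.fill(ones)(1) ::: List.fill(size - ones)(0))
}

val bits = randomBits(size = 15, ones = 10, seed = 42L)

// With a SparkContext available as `sc` (as in spark-shell):
// val randomRDD = sc.parallelize(bits)
```

Because the shuffle happens on the driver, this only suits lists that fit comfortably in driver memory.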

Or, if you want to use only RDDs:

val arraySize = 15
val numberOfOnes = 10

val rddOfOnes = spark.range(numberOfOnes).map(_ => 1).rdd
val rddOfZeros = spark.range(arraySize - numberOfOnes).map(_ => 0).rdd
val rddOfOnesAndZeros = rddOfOnes ++ rddOfZeros
val shuffleResult = rddOfOnesAndZeros.mapPartitions(iter => {
  val rng = new scala.util.Random()
  iter.map((rng.nextInt, _))
}).partitionBy(new org.apache.spark.HashPartitioner(rddOfOnesAndZeros.partitions.size)).values

shuffleResult.collect() // Array[Int] = Array(0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1)
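One caveat with the RDD-only version: as far as I can tell, partitionBy only decides which partition each element lands in and does not sort the elements inside a partition, so the resulting order may not be a fully random permutation. A possible alternative is to sort globally by the random key instead. The sketch below assumes a local SparkSession, a made-up fixed seed, and three input partitions; none of these come from the code above.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

// Local session for illustration only; in spark-shell, `spark` and `sc` already exist.
val spark = SparkSession.builder().master("local[2]").appName("shuffle-sketch").getOrCreate()
val sc = spark.sparkContext

val arraySize = 15
val numberOfOnes = 10
val seed = 42L // hypothetical fixed seed, so a recomputed partition gets the same keys

// Ones first, then zeros, spread over three partitions.
val ordered = sc.parallelize(
  Seq.fill(numberOfOnes)(1) ++ Seq.fill(arraySize - numberOfOnes)(0), 3)

// Tag every element with a random key, then sort the whole RDD by that key.
val shuffled = ordered
  .mapPartitionsWithIndex { (idx, iter) =>
    val rng = new Random(seed + idx) // one generator per partition
    iter.map(x => (rng.nextLong(), x))
  }
  .sortByKey()
  .values

val result = shuffled.collect()
```

Seeding each partition's generator from the partition index keeps the keys stable when Spark evaluates the RDD more than once, which sortByKey does when it samples the data to pick range boundaries.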

Let me know if this is what you need.
