Spark RDD transformation issue
I have data in this format:
100 1 2 3 4 5
I use the following code to load it:
val data : RDD[(String, Array[Int])] = sc.textFile("data.txt").map(line => ((line.split("\t"))(0), (line.split("\t"))(1).split(" ").map(_.toInt)))
I want to generate pairs from the Array[Int] such that every array element with a value greater than a threshold (2 in the following code) gets paired up with all other elements of the array. I will then use that for generating further stats. For example, with the sample data, I should be able to generate this first:
100 (3,1), (3,2), (3,4), (3,5), (4,1), (4,2), (4,3), (4,5)
val test = merged_data.mapValues { case x =>
  for (element <- x) {
    val y = x.filter(_ != element)
    if (element > 2) {
      for (yelement <- y) {
        (element, yelement)
      }
    }
  }
}
Here is the output that I get: Array[(String, Unit)] = Array((100,())). I am not sure why it is empty.
Once I am able to resolve this, I will then sort the elements in each tuple and remove duplicates, if any, so that the above output
100 (3,1), (3,2), (3,4), (3,5), (4,1), (4,2), (4,3), (4,5)
becomes this:
100 (1,3), (2,3), (3,4), (3,5), (1,4), (2,4), (4,5)
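The sort-and-deduplicate step described above can be sketched on plain Scala collections. This is only an illustration under stated assumptions: `DedupDemo` and `normalize` are hypothetical names that do not appear in the post, and the pairs are assumed to be already available as a `Seq[(Int, Int)]`.

```scala
object DedupDemo {
  // Hypothetical helper: order each pair as (smaller, larger), then drop
  // repeated pairs while keeping the first occurrence of each.
  def normalize(pairs: Seq[(Int, Int)]): Seq[(Int, Int)] =
    pairs.map { case (a, b) => if (a <= b) (a, b) else (b, a) }.distinct

  def main(args: Array[String]): Unit = {
    val pairs = Seq((3, 1), (3, 2), (3, 4), (3, 5), (4, 1), (4, 2), (4, 3), (4, 5))
    println(normalize(pairs).mkString(", "))
  }
}
```

On an RDD of (key, pairs) the same function could presumably be applied per key through mapValues.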
I was able to resolve this as:
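val test = merged_data.mapValues { case x =>
  val sb = new StringBuilder
  for (element <- x) {
    val y = x.filter(_ != element)
    if (element > 2) {
      for (yelement <- y) {
        // Assumed fix: the posted snippet had a bare (element, yelement) here,
        // which is discarded; appending it is what makes sb non-empty.
        sb.append((element, yelement))
      }
    }
  }
  sb.toString()
}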
How about something like:
val test = data.mapValues { x =>
  for {
    element <- x.filter(_ > 2)
    yelement <- x.filter(_ != element)
  } yield (element, yelement)
}
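To try the comprehension without a Spark cluster, the same logic can be run on a plain Array. This is a minimal sketch, not part of the original answer: `PairDemo` and `pairsAbove` are hypothetical names, and the threshold is passed as a parameter instead of being hard-coded to 2.

```scala
object PairDemo {
  // Hypothetical helper: pairs every element greater than `threshold`
  // with every element of the array that has a different value.
  def pairsAbove(xs: Array[Int], threshold: Int): Array[(Int, Int)] =
    for {
      element  <- xs.filter(_ > threshold)
      yelement <- xs.filter(_ != element)
    } yield (element, yelement)

  def main(args: Array[String]): Unit = {
    val pairs = pairsAbove(Array(1, 2, 3, 4, 5), 2)
    println(pairs.mkString(", "))
  }
}
```

Note that with the sample values 1 to 5 and a threshold of 2, the element 5 also qualifies, so this yields pairs starting with 5 in addition to the ones listed in the question.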
Also, you might want to check out "Nested iteration in Scala", which explains why you got an empty result.
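In short, the empty result comes from using for without yield. A small sketch of the difference on a plain Array, where `YieldDemo` and its values are illustrative names only:

```scala
object YieldDemo {
  val xs = Array(1, 2, 3, 4, 5)

  // No `yield`: the loop is translated to foreach and runs only for side
  // effects, so the tuple is thrown away and the whole expression is Unit.
  val withoutYield: Unit = for (x <- xs) { (x, x * 2) }

  // With `yield`: the comprehension is translated to map and collects
  // each tuple into a new array.
  val withYield: Array[(Int, Int)] = for (x <- xs) yield (x, x * 2)
}
```

This is why mapValues in the question produced (100,()): the outer for evaluated to Unit, and that Unit became the value for the key.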