Loop inside Spark RDD filter
I am new to Spark and am trying to code in Scala. I have an RDD containing data of the form:
1: 2 3 5
2: 5 6 7
3: 1 8 9
4: 1 2 4
and another list of the form [1,4,8,9].
I need to filter the RDD so that it keeps those lines in which either the value before ':' is present in the list, or any of the values after ':' are present in the list.
I have written the following code:
val links = linksFile.filter(t => {
  val l = t.split(": ")
  root.contains(l(0).toInt) ||
  for (x <- l(0).split(" ")) {
    root.contains(x.toInt)
  }
})
linksFile is the RDD and root is the list.
But this doesn't work. Any suggestions?
You're close: the for-loop just doesn't actually use the value computed inside it. You should use the exists method instead. Also, I think you want l(1), not l(0), for the second check:
val links = linksFile.filter(t => {
  val l = t.split(": ")
  root.contains(l(0).toInt) ||
  l(1).split(" ").exists { x =>
    root.contains(x.toInt)
  }
})
A for-comprehension without a yield doesn't ... well ... yield :) But you don't really need a for-comprehension (or any "loop", for that matter) here.
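To see the difference, here is a small standalone sketch (independent of Spark, using a plain List):

```scala
// A for-comprehension without `yield` desugars to foreach and returns Unit,
// so the values computed in its body are discarded.
val noYield = for (x <- List(1, 2, 3)) { x * 2 }      // noYield: Unit

// With `yield` it desugars to map and produces a new collection.
val withYield = for (x <- List(1, 2, 3)) yield x * 2  // List(2, 4, 6)
```

This is why the filter predicate in the question never returned a Boolean: the for-loop's result was Unit, not the result of root.contains.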
Something like this:
linksFile.filter(
  // split on ':' and spaces, parse to Int, keep lines with any value in root
  _.split("[: ]+").map(_.toInt).exists(root.toSet)
)
should do it.
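As a sketch of the same logic on a plain Scala collection (substituting a local Seq for the RDD, with the example data and list from the question; the filter API is the same):

```scala
// Local stand-ins for the RDD and the list from the question.
val lines = Seq("1: 2 3 5", "2: 5 6 7", "3: 1 8 9", "4: 1 2 4")
val root  = List(1, 4, 8, 9)

// A Set[Int] is also an Int => Boolean, so it can be passed to exists directly.
val rootSet = root.toSet

// Split each line on ':' and spaces, parse to Int, and keep the line
// if any of its numbers appears in the set.
val kept = lines.filter(_.split("[: ]+").map(_.toInt).exists(rootSet))
// kept == Seq("1: 2 3 5", "3: 1 8 9", "4: 1 2 4")
```

Line "2: 5 6 7" is dropped because none of 2, 5, 6, 7 is in [1,4,8,9]; the other three lines each contain at least one matching value.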