
loop inside spark RDD filter

I am new to Spark and am trying to write code in Scala. I have an RDD containing data of the form:

1: 2 3 5
2: 5 6 7 
3: 1 8 9
4: 1 2 4

and another list of the form [1,4,8,9].

I need to filter the RDD so that it keeps only those lines in which either the value before the ':' is present in the list, or any of the values after the ':' are present in the list.

I have written the following code:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        for(x<-l(0).split(" ")){
                            root.contains(x.toInt)
                        }
                    })

linksFile is the RDD and root is the list.

But this doesn't work. Any suggestions?

You're close: the for-loop just doesn't actually use the value computed inside it. You should use the exists method instead. Also, I think you want l(1), not l(0), for the second check:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        l(1).split(" ").exists { x =>
                            root.contains(x.toInt)
                        }
                    })
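Since RDD.filter takes an ordinary String => Boolean function, the corrected predicate can be sanity-checked on plain strings without a SparkContext. The sample lines below are hypothetical, taken from the question's data:

```scala
object PredicateCheck extends App {
  val root = List(1, 4, 8, 9)

  // The same predicate that would be passed to linksFile.filter:
  val keep: String => Boolean = { t =>
    val l = t.split(": ")
    root.contains(l(0).toInt) ||
    l(1).split(" ").exists(x => root.contains(x.toInt))
  }

  println(keep("2: 5 6 7")) // false: neither 2 nor 5, 6, 7 is in root
  println(keep("3: 1 8 9")) // true: 1, 8 and 9 are in root
}
```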

A for-comprehension without a yield doesn't ... well ... yield :) But you don't really need a for-comprehension (or any "loop", for that matter) here.
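To make that concrete, a for-comprehension without yield desugars to foreach and has type Unit, so the Boolean computed in its body is simply discarded and the whole expression cannot serve as an operand of ||. A minimal illustration, using the question's root list and a couple of sample tokens:

```scala
val root = List(1, 4, 8, 9)
val tokens = Array("1", "8")

// Desugars to tokens.foreach(...): the Boolean computed inside the body
// is thrown away, and the expression as a whole has type Unit.
val r1: Unit = for (x <- tokens) { root.contains(x.toInt) }

// exists keeps the Boolean and short-circuits at the first match.
val r2: Boolean = tokens.exists(x => root.contains(x.toInt))
println(r2) // true
```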

Something like this:

linksFile.map(
   _.split("[: ]+").map(_.toInt)  // split on both ':' and ' ' so every token parses as an Int
 ).filter(_.exists(list.toSet))   // a Set[Int] is itself a predicate Int => Boolean
  .map(_.mkString(" "))

should do it.
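The same pipeline can be exercised on a plain Scala List, since the map/filter logic is identical to the RDD version; only the input (hard-coded here with the question's sample lines) differs. Note that this variant reconstructs each kept line with spaces only, dropping the original ':':

```scala
// Hypothetical stand-in for the contents of linksFile.
val lines = List("1: 2 3 5", "2: 5 6 7", "3: 1 8 9", "4: 1 2 4")
val list  = List(1, 4, 8, 9)

val kept = lines.map(
    _.split("[: ]+").map(_.toInt) // e.g. "1: 2 3 5" -> Array(1, 2, 3, 5)
  ).filter(_.exists(list.toSet))  // keep lines with at least one value in the list
   .map(_.mkString(" "))

println(kept) // List(1 2 3 5, 3 1 8 9, 4 1 2 4)
```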
