
loop inside spark RDD filter

I am new to Spark and am trying to write code in Scala. I have an RDD whose lines look like this:

1: 2 3 5
2: 5 6 7 
3: 1 8 9
4: 1 2 4

and a list of the form [1,4,8,9].

I need to filter the RDD so that it keeps only the lines where either the value before ':' is in the list or any of the values after ':' is in the list.

I have written the following code:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        for(x<-l(0).split(" ")){
                            root.contains(x.toInt)
                        }
                    })

linksFile is the RDD and root is the list.

But this doesn't work. Any suggestions?

You're close. The for-loop discards the value computed inside it (a for without a yield evaluates to Unit), so it can't serve as the right-hand side of ||. Use the exists method instead. Also, I think you want l(1), not l(0), for the second check:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        l(1).split(" ").exists { x =>
                            root.contains(x.toInt)
                        }
                    })
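As a quick check of the predicate above, here is a minimal sketch that runs it on the sample lines from the question, using a plain Scala Seq in place of the RDD (RDD.filter takes the same kind of String => Boolean function). The names FilterSketch, keep, sample and kept are made up for illustration, and the trim is an added assumption to tolerate trailing spaces.

```scala
object FilterSketch {
  // The list from the question.
  val root: List[Int] = List(1, 4, 8, 9)

  // Sample lines from the question, standing in for the contents of linksFile.
  val sample: Seq[String] = Seq("1: 2 3 5", "2: 5 6 7", "3: 1 8 9", "4: 1 2 4")

  // Hypothetical helper: true if the key before ':' or any value after it is in root.
  def keep(line: String): Boolean = {
    val l = line.split(": ")
    root.contains(l(0).toInt) ||
      l(1).trim.split(" ").exists(x => root.contains(x.toInt))
  }

  // Lines that pass the predicate.
  val kept: Seq[String] = sample.filter(keep)

  def main(args: Array[String]): Unit = kept.foreach(println)
}
```

Line 2 is dropped because neither its key nor any of its values is in the list. With Spark, the same function should be usable as linksFile.filter(FilterSketch.keep).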

A for-comprehension without a yield doesn't ... well ... yield anything :) But you don't really need a for-comprehension (or any "loop" for that matter) here.

Something like this:

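val rootSet = root.toSet   // root is the list from the question; a Set is also an Int => Boolean
linksFile.filter(
   // split on ':' and spaces so the key and all values are checked together
   _.split("[: ]+").exists(x => rootSet(x.toInt))
 )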

should do it.
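To illustrate the point about yield, here is a small sketch in plain Scala (no Spark needed) comparing the three forms. The object name ForVsExists and the sample values are assumptions chosen for the example.

```scala
object ForVsExists {
  val root: List[Int] = List(1, 4, 8, 9)
  val values: Array[String] = Array("2", "8", "5")

  // No yield: the compiler rewrites this to foreach, so the result is Unit
  // and the Boolean computed in the body is thrown away.
  val noYield: Unit = for (x <- values) { root.contains(x.toInt) }

  // With yield: rewritten to map, giving one Boolean per element.
  val withYield: Array[Boolean] = for (x <- values) yield root.contains(x.toInt)

  // exists: a single Boolean, true as soon as any element matches.
  val anyMatch: Boolean = values.exists(x => root.contains(x.toInt))
}
```

Only the last form produces the single Boolean that filter expects, which is why the loop in the question fails to compile as an operand of ||.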
