loop inside spark RDD filter

Question

I am new to Spark and am trying to write code in Scala. I have an RDD which consists of data in the form:

and another list in the form [1,4,8,9]

I need to filter the RDD so that it keeps the lines in which either the value before ':' or any of the values after ':' is present in the list.

I have written the following code:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        for(x<-l(0).split(" ")){
                            root.contains(x.toInt)
                        }
                    })

linksFile is the RDD and root is the list.

But this doesn't work. Any suggestions?

Answer 1

You're close: a for-loop without yield evaluates to Unit, so the Boolean computed inside it is discarded and the || has no Boolean on its right-hand side. You should use the exists method instead. Also, I think you want l(1), not l(0), for the second check:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        l(1).split(" ").exists { x =>
                            root.contains(x.toInt)
                        }
                    })
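The same predicate can be tried on plain Scala collections before involving Spark. The following is only a sketch: the object name LinkFilter, the helper keepLine, and the sample lines are hypothetical, and it assumes each line looks like "1: 2 3" (an id, a colon, then space-separated ids), which is inferred from the question rather than confirmed by it.

```scala
object LinkFilter {
  // Hypothetical helper: true when the id before ':' or any id after it is in `root`.
  def keepLine(line: String, root: Set[Int]): Boolean = {
    val parts = line.split(":")
    val head = parts(0).trim.toInt
    // A line with nothing after ':' simply has no targets.
    val targets =
      if (parts.length > 1) parts(1).trim.split("\\s+").filter(_.nonEmpty).map(_.toInt)
      else Array.empty[Int]
    // exists returns a Boolean, unlike a for-loop without yield.
    root.contains(head) || targets.exists(root.contains)
  }

  def main(args: Array[String]): Unit = {
    val root = Set(1, 4, 8, 9)                      // sample list from the question
    val lines = Seq("1: 2 3", "2: 5 9", "3: 6 7")   // assumed sample data
    lines.filter(keepLine(_, root)).foreach(println)
  }
}
```

With an RDD, the same helper could presumably be used as linksFile.filter(line => LinkFilter.keepLine(line, root.toSet)). Converting root to a Set makes each membership check a hash lookup instead of a scan of the list.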

Answer 2

For-comprehension without a yield doesn't ... well ... yield :) But you don't really need for-comprehension (or any "loop" for that matter) here.

Something like this:

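val rootSet = root.toSet   // a Set[Int] can be used directly as an Int => Boolean predicate
linksFile.map(
   _.split("[: ]+").map(_.toInt)   // split on ':' and spaces so that every token is a number
 ).filter(_.exists(rootSet))
  .map(_.mkString(" "))            // note: the ':' is not restored in the output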

should do it.
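Both points in this answer, that a for without yield produces nothing and that a Set can stand in for a predicate, can be checked without Spark. This is a small illustrative sketch; the object name YieldDemo and the sample values are made up for the example.

```scala
object YieldDemo {
  val root: Set[Int] = Set(1, 4, 8, 9)
  val ids: Array[Int] = Array(2, 5, 9)

  // Without yield the loop is translated to foreach and evaluates to Unit.
  val noYield: Unit = for (x <- ids) { root.contains(x) }

  // With yield it is translated to map and returns one Boolean per element.
  val withYield: Array[Boolean] = for (x <- ids) yield root.contains(x)

  // A Set[Int] is itself a function Int => Boolean, so it can be passed to exists.
  val anyInRoot: Boolean = ids.exists(root)
}
```

Here withYield holds false, false, true and anyInRoot is true, because only 9 is in the set.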

solution1
2 ACCEPTED 2017-09-25 22:51:05

solution2
0 2017-09-25 23:35:22
