
loop inside spark RDD filter

I am new to Spark and am trying to write code in Scala. I have an RDD containing data of the form:

1: 2 3 5
2: 5 6 7 
3: 1 8 9
4: 1 2 4

and another list of the form [1,4,8,9].

I need to filter the RDD so that it keeps only those lines in which either the value before the ':' is present in the list, or any of the values after the ':' are present in the list.

I have written the following code:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        for(x<-l(0).split(" ")){
                            root.contains(x.toInt)
                        }
                    })

linksFile is the RDD and root is the list.

But this doesn't work. Any suggestions?

You're close: the for-loop just doesn't actually use the value computed inside it. You should use the exists method instead. Also, I think you want l(1), not l(0), for the second check:

val links = linksFile.filter(t => {
                        val l = t.split(": ")
                        root.contains(l(0).toInt) ||
                        l(1).split(" ").exists { x =>
                            root.contains(x.toInt)
                        }
                    })
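Since RDD.filter takes an ordinary String => Boolean function, the corrected predicate can be sanity-checked on plain strings without a SparkContext. The sample lines below are hypothetical, taken from the question's data:

```scala
object PredicateCheck extends App {
  val root = List(1, 4, 8, 9)

  // The same predicate that would be passed to linksFile.filter:
  val keep: String => Boolean = { t =>
    val l = t.split(": ")
    root.contains(l(0).toInt) ||
    l(1).split(" ").exists(x => root.contains(x.toInt))
  }

  println(keep("2: 5 6 7")) // false: neither 2 nor 5, 6, 7 is in root
  println(keep("3: 1 8 9")) // true: 1, 8 and 9 are in root
}
```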

A for-comprehension without a yield doesn't ... well ... yield :) But you don't really need a for-comprehension (or any "loop", for that matter) here.
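To make that concrete, a for-comprehension without yield desugars to foreach and has type Unit, so the Boolean computed in its body is simply discarded and the whole expression cannot serve as an operand of ||. A minimal illustration, using the question's root list and a couple of sample tokens:

```scala
val root = List(1, 4, 8, 9)
val tokens = Array("1", "8")

// Desugars to tokens.foreach(...): the Boolean computed inside the body
// is thrown away, and the expression as a whole has type Unit.
val r1: Unit = for (x <- tokens) { root.contains(x.toInt) }

// exists keeps the Boolean and short-circuits at the first match.
val r2: Boolean = tokens.exists(x => root.contains(x.toInt))
println(r2) // true
```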

Something like this:

linksFile.map(
   _.split("[: ]+").map(_.toInt)  // split on both ':' and ' ' so every token parses as an Int
 ).filter(_.exists(list.toSet))   // a Set[Int] is itself a predicate Int => Boolean
  .map(_.mkString(" "))

should do it.
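The same pipeline can be exercised on a plain Scala List, since the map/filter logic is identical to the RDD version; only the input (hard-coded here with the question's sample lines) differs. Note that this variant reconstructs each kept line with spaces only, dropping the original ':':

```scala
// Hypothetical stand-in for the contents of linksFile.
val lines = List("1: 2 3 5", "2: 5 6 7", "3: 1 8 9", "4: 1 2 4")
val list  = List(1, 4, 8, 9)

val kept = lines.map(
    _.split("[: ]+").map(_.toInt) // e.g. "1: 2 3 5" -> Array(1, 2, 3, 5)
  ).filter(_.exists(list.toSet))  // keep lines with at least one value in the list
   .map(_.mkString(" "))

println(kept) // List(1 2 3 5, 3 1 8 9, 4 1 2 4)
```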
