Multiple filter condition in Spark Filter method
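How do I write multiple conditions in the filter() method
in Spark using Scala? I have an RDD produced by cogroup, and I want to handle three cases: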
(1,(CompactBuffer(1,john,23),CompactBuffer(1,john,24)).filter(x => (x._2._1 != x._2._2))//value not equal
(2,(CompactBuffer(),CompactBuffer(2,Arun,24)).filter(x => (x._2._1==null))//Second tuple first value is null
(3,(CompactBuffer(3,kumar,25),CompactBuffer()).filter(x => (x._2._2==null))//Second tuple second value is null
val a = source_primary_key.cogroup(destination_primary_key).filter(x => (x._2._1 != x._2._2))
val c= a.map { y =>
val key = y._1
val value = y._2
srcs = value._1.mkString(",")
destt = value._2.mkString(",")
if (srcs.equalsIgnoreCase(destt) == false) {
srcmis :+= srcs
destmis :+= destt
}
if (srcs == "") {
extraindest :+= destt.mkString("")
}
if (destt == "") {
extrainsrc :+= srcs.mkString("")
}
}
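How can I store the results of each condition in three different Array[String] values?
I tried it as shown above, but it looks naive. Is there a more efficient way to do this?
For testing, I created the following RDDs: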
val source_primary_key = sc.parallelize(Seq((1,(1,"john",23)),(3,(3,"kumar",25))))
val destination_primary_key = sc.parallelize(Seq((1,(1,"john",24)),(2,(2,"arun",24))))
Then I cogrouped them the same way you did:
val coGrouped = source_primary_key.cogroup(destination_primary_key)
The next step is to filter the cogrouped RDD into three separate RDDs:
val a = coGrouped.filter(x => !x._2._1.isEmpty && !x._2._2.isEmpty)
val b = coGrouped.filter(x => x._2._1.isEmpty && !x._2._2.isEmpty)
val c = coGrouped.filter(x => !x._2._1.isEmpty && x._2._2.isEmpty)
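To get from these three filters to the arrays the question asks for, something like the following might work. It is only a sketch under several assumptions: the object name CogroupSplit, the helper split, the type aliases, and the local SparkSession settings are all made up for illustration, and the extra value comparison in the first filter is my reading of the "value not equal" case in the question rather than part of the filters above. Note that collect() brings all matching rows to the driver, so this only suits results that fit in driver memory.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object CogroupSplit {
  // Hypothetical aliases matching the shape of the test data above.
  type Record = (Int, String, Int)
  type Grouped = RDD[(Int, (Iterable[Record], Iterable[Record]))]

  // Returns (mismatched source/destination pairs, extra in destination, extra in source).
  def split(coGrouped: Grouped): (Array[(String, String)], Array[String], Array[String]) = {
    // The cogrouped RDD is scanned three times, so cache it once.
    coGrouped.cache()

    // Key present on both sides, but the values differ.
    val mismatched = coGrouped.filter { case (_, (src, dest)) =>
      src.nonEmpty && dest.nonEmpty && src.toSeq != dest.toSeq
    }
    // Key present only in the destination.
    val onlyInDest = coGrouped.filter { case (_, (src, dest)) => src.isEmpty && dest.nonEmpty }
    // Key present only in the source.
    val onlyInSrc = coGrouped.filter { case (_, (src, dest)) => src.nonEmpty && dest.isEmpty }

    val mismatches = mismatched
      .map { case (_, (src, dest)) => (src.mkString(","), dest.mkString(",")) }
      .collect()
    val extraInDest = onlyInDest.map { case (_, (_, dest)) => dest.mkString(",") }.collect()
    val extraInSrc = onlyInSrc.map { case (_, (src, _)) => src.mkString(",") }.collect()

    (mismatches, extraInDest, extraInSrc)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder().appName("cogroup-split").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val source_primary_key = sc.parallelize(Seq((1, (1, "john", 23)), (3, (3, "kumar", 25))))
    val destination_primary_key = sc.parallelize(Seq((1, (1, "john", 24)), (2, (2, "arun", 24))))

    val (mismatches, extraInDest, extraInSrc) =
      split(source_primary_key.cogroup(destination_primary_key))

    mismatches.foreach { case (s, d) => println(s"mismatch: $s vs $d") }
    extraInDest.foreach(d => println(s"extra in destination: $d"))
    extraInSrc.foreach(s => println(s"extra in source: $s"))

    spark.stop()
  }
}
```

Doing the string conversion with map on the RDD and collecting afterwards avoids appending to driver-side variables from inside a transformation, which generally does not behave as expected on a cluster because the closure runs on the executors.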
I hope the answer is helpful.
You can use collect on the RDD and then toList. Example:
(1,(CompactBuffer(1,john,23),CompactBuffer(1,john,24)).filter(x => (x._2._1 != x._2._2)).collect().toList