Filter with multiple conditions in Apache Spark
I need to check a few conditions, so I filtered my RDD this way:
scala> file.filter(r => r(38)=="0").filter(r => r(2)=="0").filter(r => r(3)=="0").count
Is this correct as an alternative to "&&"?
Yes, in your case a series of filters is semantically equivalent to a single filter with "&&":
file.filter(r => r(38) == "0" && r(2) == "0" && r(3) == "0")
However, the variant above is guaranteed to be faster than the earlier version. This can be established as follows:
"&&" is a short-circuit operator: the next comparison happens only if the previous one evaluates to true. So the number of comparisons in both cases will be the same (yes!).
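The short-circuit behavior of "&&" can be observed directly in plain Scala (the variable and method names below are illustrative, not from the original question):

```scala
// Demonstrate that && short-circuits: the right-hand side is never
// evaluated when the left-hand side is already false.
var rhsEvaluated = false

def rhs(): Boolean = {
  rhsEvaluated = true // side effect records that this side actually ran
  true
}

val left = false
val result = left && rhs() // rhs() is skipped because left is false

println(rhsEvaluated) // false: the second comparison never ran
println(result)       // false
```

This is why combining the three comparisons with "&&" performs no more comparisons than the three chained filters: as soon as one condition fails, the remaining ones are skipped.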
The multiple-filter version involves three passes over the RDD, versus one pass for a single filter with "&&".
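The semantic equivalence of the two variants can be checked with a minimal sketch on plain Scala collections, whose `filter` has the same per-row semantics as `RDD.filter` (the sample rows here are made up for illustration; no Spark cluster is needed):

```scala
// Hypothetical sample data: each row is an array of 40 string fields,
// mimicking the r(38), r(2), r(3) accesses in the question.
val rows: List[Array[String]] = List(
  Array.fill(40)("0"),                                  // passes all conditions
  Array.tabulate(40)(i => if (i == 2) "1" else "0"),    // fails on r(2)
  Array.fill(40)("1")                                   // fails on r(38)
)

// Variant 1: three chained filters, one predicate each.
val chained = rows
  .filter(r => r(38) == "0")
  .filter(r => r(2) == "0")
  .filter(r => r(3) == "0")

// Variant 2: one filter combining the predicates with &&.
val combined = rows.filter(r => r(38) == "0" && r(2) == "0" && r(3) == "0")

println(chained.size)                  // 1
println(chained.size == combined.size) // true: both keep the same rows
```

On an actual RDD the results are the same; the difference the answer describes is only in how many times the data is traversed, not in which rows survive.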